Important database upgrade for tracker performance

Hi DHIS2 implementers!

We are making a change in the database schema for DHIS2. This change is triggered by the need to improve the performance of tracked entity attribute value lookup. Our changes consist of three parts:

  1. The column value of the table trackedentityattributevalue will be changed from TEXT to VARCHAR(1200).
  2. A new btree index, in_trackedentity_attribute_value, will be added to the same table.
  3. The tracker API will now validate that all values are less than 1200 characters.

The upgrade will attempt to make these changes, but if the changes fail, we will log it, but proceed to start up DHIS2. This means your instance can end up in a state where the data type varchar(1200) of ‘value’ and the index is not in place. Postgres seemingly treats TEXT and varchar data types very similar, so there should be no immediate problems with this. However, we urge everyone to make sure that these changes are in effect after upgrading, and manually changing it if the upgrade fails.

If the upgrade fails an entry will be made in the logs, including the reason why it failed. Most likely the ‘value’ column contained values larger than 1200 characters. To fix this, these values needs to be removed manually.

So far, all values we have seen in production over 1200 character long, are invalid data, entered by mistake. If your data is longer than 1200 characters and should not be removed, please let us know.

The benefits of this change will be seen in all operations requiring lookup of tracked entity attribute values. This includes importing data with unique tracked entity attributes and searching by tracked entity attributes.

Summary

If your instance fails to perform the database upgrade and is unable to start, manual intervention is required. Remove any values longer than 1200 characters, and start DHIS2 again.

Important: Ensure you make a backup of your database prior to making these changes.

Useful SQL statements

The following SQL snippets are relevant for this issue.

-- List all values that exceeds 1200 characters
SELECT value FROM trackedentityattributevalue WHERE length(value) > 1200;

-- Delete all values that exceeds 1200 characters (be careful)
DELETE FROM trackedentityattributevalue WHERE length(value) > 1200;

-- Change the data type (done by DHIS2 during upgrade)
ALTER TABLE trackedentityattributevalue 
ALTER COLUMN value SET DATA TYPE VARCHAR(1200);

-- Adding the btree index (Done by DHIS2 during upgrade)
CREATE INDEX in_trackedentity_attribute_value 
ON trackedentityattributevalue 
USING btree (trackedentityattributeid, LOWER(value));

These SQL snippets can also be found on Github.

If you have any questions, feel free to reply to this thread.

4 Likes

Just seen this @Stian . For which DHIS2 version does this performance upgrade apply?

Hi @banga

I will update this thread with the versions, when they are backported. We are planning backport this to future versions of 2.32, 2.33 and 2.34 for now.

1 Like