DHIS 2.11 upgrade tips for server admins

Hi,

this update is intended for server administrators looking to upgrade to DHIS 2.11.

In version 2.11, DHIS performs on-the-fly data aggregation as part of the new analytics API. This is in contrast to the datamart API, where data is pre-aggregated for a fixed set of dimensions. The analytics tables contain indexed columns for each data dimension together with a column for the “raw” data values. The tables are partitioned by year of data: a partition is a part of a table which has been split horizontally on rows, so you will find tables like analytics_2011, analytics_2012 and analytics_2013 in the database. This has some consequences for server configuration:
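A quick way to see which partitions exist in your own database (assuming the tables live in the default public schema) is to query the catalog:

select table_name from information_schema.tables
where table_name like 'analytics%' order by table_name;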

  1. It is strongly recommended to set up a cache server to reduce the pressure on your web servers. First, caching must be enabled in DHIS from “System general settings” by setting “Cache strategy” to “Cache until 6 am tomorrow”. The simplest way to set up a server-side cache, if you already have a reverse proxy like nginx or apache running, is to configure caching directly in the proxy. Enabling caching for nginx is documented here:

http://dhis2.org/doc/snapshot/en/implementer/html/ch08s02.html#d5e580

Apache caching can be done through the mod_cache module: http://httpd.apache.org/docs/2.2/mod/mod_cache.html

Otherwise, dedicated caches like varnish (https://www.varnish-cache.org/) and squid (http://www.squid-cache.org/) could be set up in front of your web servers.
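As a rough sketch (the cache path, zone size and expiry here are placeholders you should adapt), a minimal nginx setup amounts to defining a cache zone and referencing it from the proxied location:

# in the http block: define a cache zone named "dhis"
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=dhis:250m inactive=1d;

# in the server block: proxy to DHIS and use the zone
location / {
      proxy_pass http://localhost:8080/;
      proxy_cache dhis;
}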

  2. Run the latest version of PostgreSQL, which is currently 9.2. It has some nice performance improvements over earlier versions.

  3. Make sure plenty of memory is available to the database / PostgreSQL. If the analytics tables for the last 2-3 years can fit in the memory cache, on-the-fly aggregation will perform much better. You can see the size of an analytics table and its indexes using this SQL on PostgreSQL:

select pg_size_pretty( pg_total_relation_size( 'analytics_2013' ) );
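To get the combined size of all analytics partitions in one go, a query along these lines should work (pg_size_pretty takes a bigint, hence the cast):

select pg_size_pretty( sum( pg_total_relation_size( c.oid ) )::bigint )
from pg_class c where c.relname like 'analytics%' and c.relkind = 'r';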

The more memory you have the better. The documentation has been updated; have a look at the values for the shared_buffers and effective_cache_size settings and adjust your configuration for your available memory:

http://dhis2.org/doc/snapshot/en/implementer/html/ch08.html#d5e413
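Purely as an illustration (the right numbers depend on your hardware - see the documentation above), a server with 16 GB of RAM dedicated to PostgreSQL might use something like this in postgresql.conf:

shared_buffers = 4GB            # often suggested at around 25% of system memory
effective_cache_size = 12GB     # an estimate of memory available for disk caching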

  4. Make sure you have enough disk space, as the size of the database will increase with the analytics tables. Also, when doing database backups it’s not required to include the analytics or the datamart tables. An efficient command for creating a backup which omits these tables and compresses the output in one go (replace db-name and db-user) is:

pg_dump db-name -U db-user -T aggregated* -T analytics* -T completeness* | gzip > dhis-pgdump.gz
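To restore such a backup later (after which you would regenerate the analytics tables from within DHIS), something like this should do, again replacing db-name and db-user:

gunzip -c dhis-pgdump.gz | psql -U db-user db-name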

  5. Optional: use a fast disk drive, ideally an SSD. It will help performance of both analytics table generation and read queries.

regards,

Lars

···

Hi Lars (and any other nginx gurus out there)

I have generally favoured apache2 in the past but I am working on a
standardized install deb so it makes sense to do this with nginx. So
you can welcome me into the nginx fold :)

Our user manual currently describes the case of a single dhis instance
proxied behind an nginx frontend. I am looking for an opinion on the
best way to go about this for the very common scenario of multiple
dhis instances being proxied behind a single nginx server. Setting
up proxy caching is not a problem, though users should be made aware
that the normal application security is being bypassed for these
cached resources (reports, charts etc). What I am a bit puzzled about
is the best way to set about serving static content.

Take the following, where there are 2 locations being proxied:

location /dhims {
      proxy_pass http://localhost:8080;
      proxy_redirect off;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto https;
      proxy_cache dhis;
}

location /training {
      proxy_pass http://localhost:8081;
      proxy_redirect off;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto https;
      proxy_cache dhis;
}

I can't in this case just set up a simple document root for the static
requests because I have two webapps.

I guess serving static documents directly is out of the question
anyway in some cases, e.g. Rwanda, where the tomcat backends are not all
necessarily on the same filesystem as the reverse proxy.

In most cases I can think of where there are multiple tomcat
instances, there is usually one main production instance and others
are used for training, demo, staging etc. Approaches I am
considering:
1. One approach would be to just optimize for the production server.
2. Another would be to set up a document root and symlink the
different webapps as virtual directories below that (potentially
complicated).
3. Create multiple location blocks, one for the static content of
each backend (at least the ones on the same filesystem) - see the
sketch after this list.
4. Try and treat serving static files as just a special case of caching.
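For what it's worth, a sketch of option 3, with made-up paths for where the webapps happen to be unpacked (dhis-web-commons is just an example of a directory holding static resources):

location /dhims/dhis-web-commons/ {
      alias /var/lib/tomcat-dhims/webapps/dhims/dhis-web-commons/;
}
location /training/dhis-web-commons/ {
      alias /var/lib/tomcat-training/webapps/training/dhis-web-commons/;
}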

The last option seems the most attractive to me (zero config) if we
are going to enable caching anyway. But how can it be made to work? On
apache2 with the mod_cache module I think this would happen out of the
box, i.e. apache will default to caching all cacheable content, which would
include the js, css etc.

I wonder if anything needs to be done at all with nginx. Will test
and see. A puzzle I foresee is how to get the "Cache-Control: public"
headers onto these static responses so the browsers can cache them
over SSL. Is this something we should be addressing at the level of
Struts interceptors rather than getting the reverse proxy to add the
headers?

Bob

···


Hi Bob,

thanks for bringing this up.

Personally I have been using your alternative 3) when having multiple instances. It works well and is easy to set up. I see your point about different proxy/container servers, and I agree it might be a bit more elegant to let the nginx cache handle static files. I think it should be a matter of setting up a location block matching static content with directives something like

add_header Cache-Control public;
expires 7d;
proxy_cache dhis;
proxy_pass http://localhost:8080;
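Wrapped in a location matching typical static file extensions, the whole thing might look like this (the regex and port are just an example for a single backend):

location ~* \.(css|js|gif|png|jpg|ico)$ {
      add_header Cache-Control public;
      expires 7d;
      proxy_cache dhis;
      proxy_pass http://localhost:8080;
}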

We could add these headers in DHIS as well - potentially with a system setting (“cache strategy for static content”) to cater for special requirements.

I guess one just has to test it…

regards,

Lars
