Info on GIS development

Hi Johan,

I would beg to differ. One only needs to look at most of the data
contained in some spatial data sets to see why we need to worry about
topology. For instance, polygons should not overlap each other, gaps
should not exist between polygons, and so on. Although these are not
"advanced topological" issues, they are critical for presenting data
to users in a way that makes sense. The FAO GAUL project has invested
significant effort in the creation of a topologically intact dataset,
specifically to allow spatial analyses to take place. Routing
applications in some countries, such as the one that I am in at the
moment, are increasingly being considered. This implies that points
(health facilities) must have some type of topological relationship
with another layer (such as roads) for routing and optimization
analyses to take place. It would not make sense to have DHIS altering
data in which someone has spent a lot of time establishing topology,
unless it is required for presentation purposes. Again, my point
is this is perfectly valid, as long as we are not using DHIS as a
spatial repository, but rather as a presentation mechanism for
thematic maps.

The answer to your question of course is going to depend on how the
data (which is imported into DHIS) is processed. If we allow the
application to start truncating digits, we run the risk of perturbing
the answer. I recall performing an analysis some years ago related to
refugee camps. It happens to be that refugee camps are often located
very close to borders. Due to the imprecise topology (and precision)
of the dataset that we were using, there were differences between the
number of refugee camps located in a particular country when
determined through a data field, as compared to when they were
determined spatially. My point is that topology matters a lot, and any
simplification performed by DHIS should not be done without
the users actually knowing what is happening. My recommendation, then,
is that we (meaning the DHIS2 community) should provide a clear
workflow for how to prepare data suitable for DHIS2, keeping all of
these points and other considerations (such as browser payload) in
mind.

However, the initial subject of this email really has to do with
GDAL/ogr2ogr appending additional decimals to coordinates that do not
exist in the original file. After having chatted with Bob, this seems
to be a potential issue with ogr2ogr itself. I totally agree that in
most cases the removal of unnecessary decimal places is not going to
be an issue for the vast majority of the data that we are dealing
with. However, I also feel that users should not be compelled to
perform these manipulations of the data themselves. We should instead
document a way to produce the data required by DHIS2 with a set of
open source tools, and then allow people to choose other methods if
this suits their needs, which I think is possible but needs further
investigation.
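Until such a documented workflow exists, trimming the extra decimals back off is straightforward. Here is a minimal, purely illustrative Python sketch (not part of DHIS2 or any proposed tooling) that rewrites every coordinate in a GeoJSON-like structure to six decimal places using only the standard library:

```python
import json

def round_coords(obj, ndigits=6):
    """Recursively round every float in a GeoJSON-like structure."""
    if isinstance(obj, float):
        return round(obj, ndigits)
    if isinstance(obj, list):
        return [round_coords(item, ndigits) for item in obj]
    if isinstance(obj, dict):
        return {key: round_coords(value, ndigits) for key, value in obj.items()}
    return obj  # strings, ints, None are left untouched

geometry = {"type": "Point",
            "coordinates": [38.415411724082148, 1.750212388592194]}
print(json.dumps(round_coords(geometry)))
# {"type": "Point", "coordinates": [38.415412, 1.750212]}
```

The same idea applies to whole FeatureCollections, since the function simply walks any nested lists and dicts it is given.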

Best regards,
Jason

···

On Mon, Jul 26, 2010 at 5:13 PM, Lemarchand, Johan <lemarchandjo@who.int> wrote:

Hi Jason,

I am not aware of very advanced topological relationships when GIS is used for public health. I have seen much more advanced implementations in GIS solutions for telecommunications or utilities.
In the context of DHIS, the main spatial relationship I can think of is to answer the question: how many health facilities are contained within a given district?

The generalization which is currently performed simplifies the data a lot, and I don't think it will be possible to preserve the topology using the same parameters. If you use a better generalization algorithm which keeps relationships among spatial objects, the size of the GeoJSON will be much bigger. At this point, I don't think it is relevant to talk about topological relationships or scale when the original layers are generalized with the currently specified tolerance.

Johan

-----Original Message-----
From: Jason Pickering [mailto:jason.p.pickering@gmail.com]
Sent: 26 July 2010 11:53
To: Lemarchand, Johan
Cc: Knut Staring; Bob Jolliffe; dhis2-devs@lists.launchpad.net
Subject: Re: [Dhis2-devs] Info on GIS development

Hi Johan and Bob,

Johan, you are indeed correct that the generalization process may
remove the "cartographic intricacies", but this is very likely because
the generalization is performed either on geographical data where
there are no topological relationships between objects, or the
generalization process does not respect the topology when it is
performed.

It would be possible to generalize a given set of polygons without
affecting their intrinsic topological relationships, but much more
care needs to be exercised when the generalization is performed. This
generalization could take place by removing unnecessary points
(simplification) and/or by reducing the precision of the data.
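To make the simplification half of this concrete, here is a small, self-contained Python sketch of the classic Douglas-Peucker algorithm, given purely as an illustration (it is not DHIS code). Note that it has no topology awareness at all: applied to each polygon independently, exactly this kind of routine can open gaps along shared boundaries.

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    if a == b:
        return math.dist(p, a)
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    return num / math.dist(a, b)

def douglas_peucker(points, tolerance):
    """Drop vertices whose removal changes the line by less than tolerance."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord between the endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]  # the whole run collapses to a chord
    # Otherwise keep that vertex and recurse on the two halves.
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(line, 1.0))
```

The tolerance plays the same role as the tolerance parameter being discussed in this thread: the larger it is, the more vertices disappear, and the more likely two adjacent polygons simplified separately will no longer share an edge.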

Ultimately the point in doing this is to decrease the "bulk" of the
data that is presented to the client. I can imagine that a data set of
100 points with 15 decimals would behave more or less the same as a
dataset of 1000 points with 6 decimals (just guessing here). My point
is that there is a certain payload associated with each dataset.
Typically, server-side processing, in the form of rendering the GIS
layer to an image, would be employed. However, since we are using
vector data on the client side, the data should be preprocessed in
order to preserve the cartographic details that are important, as
automated simplification routines normally do not handle this. The
result is that the payload of the layer is decreased to a
point that is "acceptable" to users. I am sceptical about whether this
step will be possible to automate at all for the reasonably complex
polygon layers (i.e., districts) that DHIS typically deals with.

I want to come back to the use of DHIS as a repository. At this
point, IMHO, DHIS does not seem appropriate as a health facility
repository. There is no easy way to adjust the metadata of a given
organizational unit object. I suppose we could use things like
orgunit groups to provide some type of metadata, but, for instance, we
may want each orgunit to have a property such as "Address", "Fax" or
"Elevation". Additionally, the proposed clipping of precision further
complicates matters in this regard. Ultimately, we want a quick,
responsive map for users as the first priority, and we should set our
sights on this.

In summary, I think that the current approach that we have, namely a
recommended workflow for how to preprocess a given set of data, should
not be supplanted by the system itself truncating the precision of
coordinates. There are many different generalization algorithms, each
with their pros and cons. Additionally, the generalization is highly
dependent on the scale of the map, and ultimately the pixel size of
the user's screen, implying that different datasets may need to be
generalized in different ways depending on their scale. The gory
details of how this is done by Geoserver (using GeoTools) are here:
http://docs.geoserver.org/stable/en/user/tutorials/feature-pregeneralized/feature-pregeneralized_tutorial.html
We certainly do not need to recreate GeoTools or Geoserver, as they
are very good already at what they do. I would say that we
should consider leveraging these tools, and letting them
decide how to generalize or not generalize features, depending on the
scale of the map that is requested by the users. I guess I am
expressing some fundamental gripe, that we should not baby users too
much. If people want to have 15 decimals, well let them. They may have
reasons for this. It obviously does not make much sense, any more than
using 50,000 points to represent a simple polygon that could be
represented with four vertices. In both cases, the GIS guys need to do
their work and understand what type of data is required by the client.
Providing clear recommendations for a workflow coupled with
guidelines on what a "reasonable" payload to the browser would be,
e.g. 30kb versus 30MB for a given layer, would be the best way to go I
think.

Regards,
Jason

On Mon, Jul 26, 2010 at 2:11 PM, Lemarchand, Johan <lemarchandjo@who.int> wrote:

Hi all,

The number of decimals is not really the issue. Six decimals is already enough for the type of GIS application we are interested in. Using 15 decimals will not change the precision of your map much and is not really necessary.
0 decimal places = approx. 112 km (70 miles) (precision depending on the latitude)
3 decimal places = approx. 111 m (365 feet)
6 decimal places = < 0.3 m (< 1 foot)
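As a rough sanity check on the table above: one degree of latitude spans roughly 111.32 km (longitude shrinks with latitude), so the values follow from a couple of lines of Python:

```python
# One degree of latitude is roughly 111.32 km; each decimal place
# divides the ground distance represented by that digit by ten.
KM_PER_DEGREE = 111.32

for decimals in (0, 3, 6):
    meters = KM_PER_DEGREE * 1000 * 10 ** -decimals
    print(f"{decimals} decimal places ~ {meters:g} m")
```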

The maps used by the system are not accurate enough anyway to warrant more than 6 decimal places, because they are not very large-scale maps (1:1 000 or 1:500); they are medium-scale maps (1:50 000 or 1:100 000) or small-scale maps.

The issue is more the cartographic generalization and the fact that it is not preserving all intricate geographical or other cartographic details. It is necessary to run the generalization process in order to use the GeoJSON format, but it removes a lot of data and simplifies it as well. As a significant amount of data is lost in the process, the output files are not relevant regarding purpose and scale and the simplified GeoJSON files can't really be used in a GIS.

Johan

-----Original Message-----
From: dhis2-devs-bounces+lemarchandjo=who.int@lists.launchpad.net [mailto:dhis2-devs-bounces+lemarchandjo=who.int@lists.launchpad.net] On Behalf Of Knut Staring
Sent: 26 July 2010 10:17
To: Bob Jolliffe
Cc: dhis2-devs@lists.launchpad.net
Subject: Re: [Dhis2-devs] Info on GIS development

On Mon, Jul 26, 2010 at 9:38 AM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

Hi Jason

On 26 July 2010 04:49, Jason Pickering <jason.p.pickering@gmail.com> wrote:

Hi Knut,

It may be that we want to use DHIS as both a repository with full
precision (though not ridiculously artificial ones like 15-decimal
lat/lon) and have a faster way of rendering. But for a repo, I think
something like PostGIS is in order. Or we could just store things as
GML...

Well, this is really the issue. If DHIS is going to be a repository,
any self-respecting GIS geek would not use it if the application
clipped precision. Although a few meters is not significant in terms
of rendering a map, it may cause havoc on certain datasets,
particularly if there are topological relationships between different
layers. If a facility is related topologically to a road network, and
the point is shifted a few meters, this may result in disturbance of
the topology between these layers, rendering DHIS useless as a
repository. ogr2ogr is perfectly OK as long as we are not dealing with
these types of layers, but as soon as we start to think about
relationships to other layers, we need to be very careful about how
the data is preprocessed.

Would you suggest then that the best place to clip precision would be
when the data is retrieved from the database for the specific view/map
rendering, rather than prior to it being stored?

This would render the current convenience of storing as a geojson
string redundant as we would need to process the string on checkout
anyway.

Can anyone say what the precision is on the shapefiles prior to
ogr2ogr conversion, i.e. are we introducing a new level of precision
here or is that 15 digit precision the precision of the source
shapefiles?

Quoting myself:

"Here is a comparison of what I get in GeoJSON vs GML (converting from the same
shapefile):
GeoJSON: 38.415412, 1.750212
GML: 38.415411724082148,1.750212388592194"

Both using ogr2ogr. So 6 vs 15 decimals.
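A quick check in Python confirms that the two representations agree once the GML values are rounded to six decimals, i.e. the extra nine digits carry only sub-metre noise:

```python
gml = (38.415411724082148, 1.750212388592194)   # 15-decimal GML values
geojson = (38.415412, 1.750212)                 # 6-decimal GeoJSON values

rounded = tuple(round(v, 6) for v in gml)
assert rounded == geojson
print("rounded to 6 decimals:", rounded)
```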

Knut

Bob

We should be very conscious of not pushing the new, very simple
solution too far; for more complex functionality we should rather
employ Geoserver and PostGIS - and I still think this is the best
solution for a national repository. Our new way of storing orgunit
boundaries is a very small subset of such a full blown GIS solution,
but has the advantage of being simple, lightweight and portable.

Agreed on both points, namely that the solution is lightweight and
aimed at thematic mapping but other solutions would be more
appropriate for use as a repository of GIS data.

Regards,
Jason

--
Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+17069260025

--
Cheers,
Knut Staring

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp


--
Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+17069260025



Hi Jason,

I think we agree on most of the points. I also consider that topology matters a lot since spatial relations are the basis of many queries that GIS perform. My point is to say that the current generalization process is not adapted for this type of spatial queries. We will need to find a better solution which should be a compromise between browser payload and the topology integrity. It can be tested and verified with a sample dataset and then documented as you suggest.

Regards,
Johan


Hi good people,

Have you guys noticed that dhis-web doesn't build unless you change the packaging of dhis-web-excel-reporting from jar to war?
Or was it my environment misbehaving?
Well, I changed it and I moved on...

Ime

There has been a problem with people accidentally checking in this change, for some reason related to how they deploy when developing.

Most people won't notice, because there will be an older war version of this module in their maven repo, so I guess it's been a while since your last build :)

The guilty party seems to have been

revno: 1868
committer: Tran Thanh Tri <Tran Thanh Tri@compaq>

Please, check what you commit.

Jo


lol...
I see... I'm on a new system, so it's a fresh build.
I didn't check in that one-letter change (so someone else, please, should).

I change systems a lot these days, so I'm wondering what's the best way of handling commits with all this authentication stuff (I'm tired of setting up my SSH so often... unlike in other repos where all you need is your password :))

Any alternatives? :)

thanks

Ime
