dhis14 import

Just some throwaway code testing out jackcess for reading dhis14 (and
potentially modulo basico files):

http://pastebin.com/wMv1SZqq

I'm pretty impressed. It works well and I suspect also much faster
than accessing via odbc/ibatis or whatever it is. Never mind the
nonsense of what this code actually does - the point is that it can
iterate over access tables using java (on ubuntu). Kind of nice.

Cheers
Bob

That sounds really great - it has been problematic to require Windows for this.

k

···

On Tue, Jul 20, 2010 at 3:23 PM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

Just some throwaway code testing out jackcess for reading dhis14 (and
potentially modulo basico files):

http://pastebin.com/wMv1SZqq

I'm pretty impressed. It works well and I suspect also much faster
than accessing via odbc/ibatis or whatever it is. Never mind the
nonsense of what this code actually does - the point is that it can
iterate over access tables using java (on ubuntu). Kind of nice.

Cheers
Bob

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

--
Cheers,
Knut Staring

I am really impressed with that news.

Caveman

···

On Tue, Jul 20, 2010 at 3:23 PM, Bob Jolliffe bobjolliffe@gmail.com wrote:

Just some throwaway code testing out jackcess for reading dhis14 (and

potentially modulo basico files):

http://pastebin.com/wMv1SZqq

I’m pretty impressed. It works well and I suspect also much faster

than accessing via odbc/ibatis or whatever it is. Never mind the

nonsense of what this code actually does - the point is that it can

iterate over access tables using java (on ubuntu). Kind of nice.

Cheers

Bob


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : http

Impressive!
First time seeing that clean functionality!
I see potential there to move data between both systems :slight_smile:

Ime

···

--- On Tue, 7/20/10, Knut Staring <knutst@gmail.com> wrote:

From: Knut Staring <knutst@gmail.com>
Subject: Re: [Dhis2-devs] dhis14 import
To: "Bob Jolliffe" <bobjolliffe@gmail.com>
Cc: "dhis2-devs" <dhis2-devs@lists.launchpad.net>
Date: Tuesday, July 20, 2010, 3:32 PM
That sounds really great - it has
been problematic to require Windows for this.

k

On Tue, Jul 20, 2010 at 3:23 PM, Bob Jolliffe <bobjolliffe@gmail.com> > wrote:
> Just some throwaway code testing out jackcess for
reading dhis14 (and
> potentially modulo basico files):
>
> http://pastebin.com/wMv1SZqq
>
> I'm pretty impressed. It works well and I suspect
also much faster
> than accessing via odbc/ibatis or whatever it is.
Never mind the
> nonsense of what this code actually does - the point
is that it can
> iterate over access tables using java (on ubuntu).
Kind of nice.
>
> Cheers
> Bob
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to : dhis2-devs@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help : https://help.launchpad.net/ListHelp
>

--
Cheers,
Knut Staring

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

No doubt this looks much simpler.

Would be interesting to do a test with a large table (>10 mill) and see how it performs in terms of memory usage.

Lars

···

On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi asangansi@yahoo.com wrote:

Impressive!

First time seeing that clean functionality!

I see potential there to move data between both systems :slight_smile:

Ime

— On Tue, 7/20/10, Knut Staring knutst@gmail.com wrote:

From: Knut Staring knutst@gmail.com

Subject: Re: [Dhis2-devs] dhis14 import

To: “Bob Jolliffe” bobjolliffe@gmail.com

Cc: “dhis2-devs” dhis2-devs@lists.launchpad.net

Date: Tuesday, July 20, 2010, 3:32 PM

That sounds really great - it has

been problematic to require Windows for this.

k

On Tue, Jul 20, 2010 at 3:23 PM, Bob Jolliffe bobjolliffe@gmail.com > > > wrote:

Just some throwaway code testing out jackcess for

reading dhis14 (and

potentially modulo basico files):

http://pastebin.com/wMv1SZqq

I’m pretty impressed. It works well and I suspect

also much faster

than accessing via odbc/ibatis or whatever it is.

Never mind the

nonsense of what this code actually does - the point

is that it can

iterate over access tables using java (on ubuntu).

Kind of nice.

Cheers

Bob


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Cheers,

Knut Staring


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Only tried it on 330000 records. Should be easy enough to test
something bigger.

Is the use case with dhis14 is primarily bulk import or routine? I
guess both ..

I'm guessing the challenge would be to graft a jackcess backend onto
the existing dhis14 file import stuff as a replacement for the
hibernate provider.

···

2010/7/22 Lars Helge Øverland <larshelge@gmail.com>:

No doubt this looks much simpler.
Would be interesting to do a test with a large table (>10 mill) and see how
it performs in terms of memory usage.
Lars
On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi <asangansi@yahoo.com> wrote:

Impressive!
First time seeing that clean functionality!
I see potential there to move data between both systems :slight_smile:

Ime

--- On Tue, 7/20/10, Knut Staring <knutst@gmail.com> wrote:

> From: Knut Staring <knutst@gmail.com>
> Subject: Re: [Dhis2-devs] dhis14 import
> To: "Bob Jolliffe" <bobjolliffe@gmail.com>
> Cc: "dhis2-devs" <dhis2-devs@lists.launchpad.net>
> Date: Tuesday, July 20, 2010, 3:32 PM
> That sounds really great - it has
> been problematic to require Windows for this.
>
> k
>
> On Tue, Jul 20, 2010 at 3:23 PM, Bob Jolliffe <bobjolliffe@gmail.com> >> > wrote:
> > Just some throwaway code testing out jackcess for
> reading dhis14 (and
> > potentially modulo basico files):
> >
> > http://pastebin.com/wMv1SZqq
> >
> > I'm pretty impressed. It works well and I suspect
> also much faster
> > than accessing via odbc/ibatis or whatever it is.
> Never mind the
> > nonsense of what this code actually does - the point
> is that it can
> > iterate over access tables using java (on ubuntu).
> Kind of nice.
> >
> > Cheers
> > Bob
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~dhis2-devs
> > Post to : dhis2-devs@lists.launchpad.net
> > Unsubscribe : https://launchpad.net/~dhis2-devs
> > More help : https://help.launchpad.net/ListHelp
> >
>
>
>
> --
> Cheers,
> Knut Staring
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to : dhis2-devs@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help : https://help.launchpad.net/ListHelp
>

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Only tried it on 330000 records. Should be easy enough to test

something bigger.

Nice.

Is the use case with dhis14 is primarily bulk import or routine? I

guess both …

Its mainly the bulk initial import which is done with the file import. Routine imports should be done with the XML import.

···

2010/7/22 Bob Jolliffe bobjolliffe@gmail.com

I’m guessing the challenge would be to graft a jackcess backend onto

the existing dhis14 file import stuff as a replacement for the

hibernate provider.

2010/7/22 Lars Helge Øverland larshelge@gmail.com:

No doubt this looks much simpler.

Would be interesting to do a test with a large table (>10 mill) and see how

it performs in terms of memory usage.

Lars

On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi asangansi@yahoo.com wrote:

Impressive!

First time seeing that clean functionality!

I see potential there to move data between both systems :slight_smile:

Ime

— On Tue, 7/20/10, Knut Staring knutst@gmail.com wrote:

From: Knut Staring knutst@gmail.com

Subject: Re: [Dhis2-devs] dhis14 import

To: “Bob Jolliffe” bobjolliffe@gmail.com

Cc: “dhis2-devs” dhis2-devs@lists.launchpad.net

Date: Tuesday, July 20, 2010, 3:32 PM

That sounds really great - it has

been problematic to require Windows for this.

k

On Tue, Jul 20, 2010 at 3:23 PM, Bob Jolliffe bobjolliffe@gmail.com > > >> > wrote:

Just some throwaway code testing out jackcess for

reading dhis14 (and

potentially modulo basico files):

http://pastebin.com/wMv1SZqq

I’m pretty impressed. It works well and I suspect

also much faster

than accessing via odbc/ibatis or whatever it is.

Never mind the

nonsense of what this code actually does - the point

is that it can

iterate over access tables using java (on ubuntu).

Kind of nice.

Cheers

Bob


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Cheers,

Knut Staring


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

No doubt this looks much simpler.
Would be interesting to do a test with a large table (>10 mill) and see how
it performs in terms of memory usage.

10 million records is a lot of records! I have it whirring away in
the background as I get on with other stuff. Started 1.5 hours ago
and still writing records ... db file up to 250M ... hasn't started
reading back yet but thus far memory usage is low and constant. I'll
let you know when/if it finishes :slight_smile:

···

2010/7/22 Lars Helge Øverland <larshelge@gmail.com>:

Lars
On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi <asangansi@yahoo.com> wrote:

Impressive!
First time seeing that clean functionality!
I see potential there to move data between both systems :slight_smile:

Ime

--- On Tue, 7/20/10, Knut Staring <knutst@gmail.com> wrote:

> From: Knut Staring <knutst@gmail.com>
> Subject: Re: [Dhis2-devs] dhis14 import
> To: "Bob Jolliffe" <bobjolliffe@gmail.com>
> Cc: "dhis2-devs" <dhis2-devs@lists.launchpad.net>
> Date: Tuesday, July 20, 2010, 3:32 PM
> That sounds really great - it has
> been problematic to require Windows for this.
>
> k
>
> On Tue, Jul 20, 2010 at 3:23 PM, Bob Jolliffe <bobjolliffe@gmail.com> >> > wrote:
> > Just some throwaway code testing out jackcess for
> reading dhis14 (and
> > potentially modulo basico files):
> >
> > http://pastebin.com/wMv1SZqq
> >
> > I'm pretty impressed. It works well and I suspect
> also much faster
> > than accessing via odbc/ibatis or whatever it is.
> Never mind the
> > nonsense of what this code actually does - the point
> is that it can
> > iterate over access tables using java (on ubuntu).
> Kind of nice.
> >
> > Cheers
> > Bob
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~dhis2-devs
> > Post to : dhis2-devs@lists.launchpad.net
> > Unsubscribe : https://launchpad.net/~dhis2-devs
> > More help : https://help.launchpad.net/ListHelp
> >
>
>
>
> --
> Cheers,
> Knut Staring
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to : dhis2-devs@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help : https://help.launchpad.net/ListHelp
>

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Alright writing is pretty slow. Writing 10million records take two hours.

But reading is pretty darn quick. Iterating through 10million rows
and dumping each row to stdout (redirected to /dev/null) takes about 1
minute +- 5 seconds. Throughout which memory use is constant and
low: less than 1.5%. That's got to be quicker than anything using
jdbc which is obliged to push everything up and down the tcp/ip stack.

Not looking any more at this now - just wanted to see the numbers.
Looks like a good addition to Knut's student project list. Refactor
dhis14 file import to inject a jackcess backend in place of the
hibernate one.

Cheers
Bob

···

2010/7/23 Bob Jolliffe <bobjolliffe@gmail.com>:

2010/7/22 Lars Helge Øverland <larshelge@gmail.com>:

No doubt this looks much simpler.
Would be interesting to do a test with a large table (>10 mill) and see how
it performs in terms of memory usage.

10 million records is a lot of records! I have it whirring away in
the background as I get on with other stuff. Started 1.5 hours ago
and still writing records ... db file up to 250M ... hasn't started
reading back yet but thus far memory usage is low and constant. I'll
let you know when/if it finishes :slight_smile:

Lars
On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi <asangansi@yahoo.com> wrote:

Impressive!
First time seeing that clean functionality!
I see potential there to move data between both systems :slight_smile:

Ime

--- On Tue, 7/20/10, Knut Staring <knutst@gmail.com> wrote:

> From: Knut Staring <knutst@gmail.com>
> Subject: Re: [Dhis2-devs] dhis14 import
> To: "Bob Jolliffe" <bobjolliffe@gmail.com>
> Cc: "dhis2-devs" <dhis2-devs@lists.launchpad.net>
> Date: Tuesday, July 20, 2010, 3:32 PM
> That sounds really great - it has
> been problematic to require Windows for this.
>
> k
>
> On Tue, Jul 20, 2010 at 3:23 PM, Bob Jolliffe <bobjolliffe@gmail.com> >>> > wrote:
> > Just some throwaway code testing out jackcess for
> reading dhis14 (and
> > potentially modulo basico files):
> >
> > http://pastebin.com/wMv1SZqq
> >
> > I'm pretty impressed. It works well and I suspect
> also much faster
> > than accessing via odbc/ibatis or whatever it is.
> Never mind the
> > nonsense of what this code actually does - the point
> is that it can
> > iterate over access tables using java (on ubuntu).
> Kind of nice.
> >
> > Cheers
> > Bob
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~dhis2-devs
> > Post to : dhis2-devs@lists.launchpad.net
> > Unsubscribe : https://launchpad.net/~dhis2-devs
> > More help : https://help.launchpad.net/ListHelp
> >
>
>
>
> --
> Cheers,
> Knut Staring
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to : dhis2-devs@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help : https://help.launchpad.net/ListHelp
>

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

Hi Bob,

Mmmm..interesting.

Just to be sure.
Were you reading from 1.4 and writing to 2.0? mysql or postgres?

Were you writing the other way (to access)?

Thanks.

Ime

···

--- On Fri, 7/23/10, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

From: Bob Jolliffe <bobjolliffe@gmail.com>
Subject: Re: [Dhis2-devs] dhis14 import
To: "Lars Helge Øverland" <larshelge@gmail.com>
Cc: "Ime Asangansi" <asangansi@yahoo.com>, "Knut Staring" <knutst@gmail.com>, "dhis2-devs" <dhis2-devs@lists.launchpad.net>
Date: Friday, July 23, 2010, 3:53 PM
Alright writing is pretty slow.
Writing 10million records take two hours.

But reading is pretty darn quick. Iterating through
10million rows
and dumping each row to stdout (redirected to /dev/null)
takes about 1
minute +- 5 seconds. Throughout which
memory use is constant and
low: less than 1.5%. That's got to be quicker than
anything using
jdbc which is obliged to push everything up and down the
tcp/ip stack.

Not looking any more at this now - just wanted to see the
numbers.
Looks like a good addition to Knut's student project
list. Refactor
dhis14 file import to inject a jackcess backend in place of
the
hibernate one.

Cheers
Bob

2010/7/23 Bob Jolliffe <bobjolliffe@gmail.com>:
> 2010/7/22 Lars Helge Øverland <larshelge@gmail.com>:
>>
>> No doubt this looks much simpler.
>> Would be interesting to do a test with a large
table (>10 mill) and see how
>> it performs in terms of memory usage.
>
> 10 million records is a lot of records! I have it
whirring away in
> the background as I get on with other stuff. Started
1.5 hours ago
> and still writing records ... db file up to 250M ...
hasn't started
> reading back yet but thus far memory usage is low and
constant. I'll
> let you know when/if it finishes :slight_smile:
>
>> Lars
>> On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi > <asangansi@yahoo.com> > wrote:
>>>
>>> Impressive!
>>> First time seeing that clean functionality!
>>> I see potential there to move data between
both systems :slight_smile:
>>>
>>> Ime
>>>
>>>
>>> --- On Tue, 7/20/10, Knut Staring <knutst@gmail.com> > wrote:
>>>
>>> > From: Knut Staring <knutst@gmail.com>
>>> > Subject: Re: [Dhis2-devs] dhis14 import
>>> > To: "Bob Jolliffe" <bobjolliffe@gmail.com>
>>> > Cc: "dhis2-devs" <dhis2-devs@lists.launchpad.net>
>>> > Date: Tuesday, July 20, 2010, 3:32 PM
>>> > That sounds really great - it has
>>> > been problematic to require Windows for
this.
>>> >
>>> > k
>>> >
>>> > On Tue, Jul 20, 2010 at 3:23 PM, Bob > Jolliffe <bobjolliffe@gmail.com> > >>> > wrote:
>>> > > Just some throwaway code testing out
jackcess for
>>> > reading dhis14 (and
>>> > > potentially modulo basico files):
>>> > >
>>> > > http://pastebin.com/wMv1SZqq
>>> > >
>>> > > I'm pretty impressed. It works
well and I suspect
>>> > also much faster
>>> > > than accessing via odbc/ibatis or
whatever it is.
>>> > Never mind the
>>> > > nonsense of what this code actually
does - the point
>>> > is that it can
>>> > > iterate over access tables using
java (on ubuntu).
>>> > Kind of nice.
>>> > >
>>> > > Cheers
>>> > > Bob
>>> > >
>>> > >
_______________________________________________
>>> > > Mailing list: https://launchpad.net/~dhis2-devs
>>> > > Post to : dhis2-devs@lists.launchpad.net
>>> > > Unsubscribe : https://launchpad.net/~dhis2-devs
>>> > > More help : https://help.launchpad.net/ListHelp
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Cheers,
>>> > Knut Staring
>>> >
>>> >
_______________________________________________
>>> > Mailing list: https://launchpad.net/~dhis2-devs
>>> > Post to : dhis2-devs@lists.launchpad.net
>>> > Unsubscribe : https://launchpad.net/~dhis2-devs
>>> > More help : https://help.launchpad.net/ListHelp
>>> >
>>>
>>>
_______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs
>>> Post to : dhis2-devs@lists.launchpad.net
>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>> More help : https://help.launchpad.net/ListHelp
>>
>>
>

Hi Ime

Hi Bob,

Mmmm..interesting.

Just to be sure.
Were you reading from 1.4 and writing to 2.0? mysql or postgres?

Were you writing the other way (to access)?

Nothing so grand as any of the above. Though I am reading and writing
access. Reading/writing mysql and postgres will be much more of a
bottleneck than reading jackcess 'coz of the tcp requirement of jdbc.

In this scenario I'm just running scratch code in the background to
keep my laptop warm while I try to write :slight_smile: See attached. jackcess
is very maven friendly.

mvn assembly:assembly
java -jar target/dhaccess-jar-with-dendencies.jar

Drink lots of coffee, take a walk, have lunch .... etc

Cheers.
Bob

package org.hisp.dhis;

import com.healthmarketscience.jackcess.ColumnBuilder;
import com.healthmarketscience.jackcess.DataType;
import java.io.File;
import java.io.IOException;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.Table;
import com.healthmarketscience.jackcess.TableBuilder;

/**
* CPU warmer

dhaccess.tgz (2.4 KB)

···

On 23 July 2010 19:23, Ime Asangansi <asangansi@yahoo.com> wrote:
*
*/
public class App
{

    private static final Log log = LogFactory.getLog( App.class );

    private static long MAXROWS = 10000000;

    public static void main( String args )
    {
        try
        {
            File dbFile = new File( "/home/bobj/src/dhaccess/test.mdb" );
            Database db = Database.create( dbFile );
            log.info( "Opened " + dbFile.getName() );

            Map<String, Object> row = null;
            Table table = new TableBuilder( "test" ).addColumn( new
ColumnBuilder( "id", DataType.LONG ).toColumn() ).addColumn( new
ColumnBuilder( "value", DataType.TEXT ).toColumn() ).toTable( db );

            long i = 0;
            for ( i = 0; i < MAXROWS; i++ )
            {
                table.addRow( i, "some text" );
            }

            log.info( "Reading data" );
            int count = 0;
            while ( ( row = table.getNextRow() ) != null )
            {
                count++;
                log.debug( row );
                Integer id = (Integer) row.get( "id" );
                String value = (String) row.get( "value" );

                System.err.println( id + " : " + value );
            }
            log.info( count + " datavalues" );
        } catch ( IOException ex )
        {
            log.info("Ouch: "+ex);
        }

    }
}

Thanks.

Ime

--- On Fri, 7/23/10, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

From: Bob Jolliffe <bobjolliffe@gmail.com>
Subject: Re: [Dhis2-devs] dhis14 import
To: "Lars Helge Øverland" <larshelge@gmail.com>
Cc: "Ime Asangansi" <asangansi@yahoo.com>, "Knut Staring" <knutst@gmail.com>, "dhis2-devs" <dhis2-devs@lists.launchpad.net>
Date: Friday, July 23, 2010, 3:53 PM
Alright writing is pretty slow.
Writing 10million records take two hours.

But reading is pretty darn quick. Iterating through
10million rows
and dumping each row to stdout (redirected to /dev/null)
takes about 1
minute +- 5 seconds. Throughout which
memory use is constant and
low: less than 1.5%. That's got to be quicker than
anything using
jdbc which is obliged to push everything up and down the
tcp/ip stack.

Not looking any more at this now - just wanted to see the
numbers.
Looks like a good addition to Knut's student project
list. Refactor
dhis14 file import to inject a jackcess backend in place of
the
hibernate one.

Cheers
Bob

2010/7/23 Bob Jolliffe <bobjolliffe@gmail.com>:
> 2010/7/22 Lars Helge Øverland <larshelge@gmail.com>:
>>
>> No doubt this looks much simpler.
>> Would be interesting to do a test with a large
table (>10 mill) and see how
>> it performs in terms of memory usage.
>
> 10 million records is a lot of records! I have it
whirring away in
> the background as I get on with other stuff. Started
1.5 hours ago
> and still writing records ... db file up to 250M ...
hasn't started
> reading back yet but thus far memory usage is low and
constant. I'll
> let you know when/if it finishes :slight_smile:
>
>> Lars
>> On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi >> <asangansi@yahoo.com> >> wrote:
>>>
>>> Impressive!
>>> First time seeing that clean functionality!
>>> I see potential there to move data between
both systems :slight_smile:
>>>
>>> Ime
>>>
>>>
>>> --- On Tue, 7/20/10, Knut Staring <knutst@gmail.com> >> wrote:
>>>
>>> > From: Knut Staring <knutst@gmail.com>
>>> > Subject: Re: [Dhis2-devs] dhis14 import
>>> > To: "Bob Jolliffe" <bobjolliffe@gmail.com>
>>> > Cc: "dhis2-devs" <dhis2-devs@lists.launchpad.net>
>>> > Date: Tuesday, July 20, 2010, 3:32 PM
>>> > That sounds really great - it has
>>> > been problematic to require Windows for
this.
>>> >
>>> > k
>>> >
>>> > On Tue, Jul 20, 2010 at 3:23 PM, Bob >> Jolliffe <bobjolliffe@gmail.com> >> >>> > wrote:
>>> > > Just some throwaway code testing out
jackcess for
>>> > reading dhis14 (and
>>> > > potentially modulo basico files):
>>> > >
>>> > > http://pastebin.com/wMv1SZqq
>>> > >
>>> > > I'm pretty impressed. It works
well and I suspect
>>> > also much faster
>>> > > than accessing via odbc/ibatis or
whatever it is.
>>> > Never mind the
>>> > > nonsense of what this code actually
does - the point
>>> > is that it can
>>> > > iterate over access tables using
java (on ubuntu).
>>> > Kind of nice.
>>> > >
>>> > > Cheers
>>> > > Bob
>>> > >
>>> > >
_______________________________________________
>>> > > Mailing list: https://launchpad.net/~dhis2-devs
>>> > > Post to : dhis2-devs@lists.launchpad.net
>>> > > Unsubscribe : https://launchpad.net/~dhis2-devs
>>> > > More help : https://help.launchpad.net/ListHelp
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Cheers,
>>> > Knut Staring
>>> >
>>> >
_______________________________________________
>>> > Mailing list: https://launchpad.net/~dhis2-devs
>>> > Post to : dhis2-devs@lists.launchpad.net
>>> > Unsubscribe : https://launchpad.net/~dhis2-devs
>>> > More help : https://help.launchpad.net/ListHelp
>>> >
>>>
>>>
_______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs
>>> Post to : dhis2-devs@lists.launchpad.net
>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>> More help : https://help.launchpad.net/ListHelp
>>
>>
>

Hi Bob,

I think this is indeed interesting, but is it really worth spending
any time on? We currently have something that works, and for our 10
million row database in Zambia it took a few hours to import. OK,
fine, it took time, but it was a one off deal. Is there much to be
gained from a full refactor? I think the typical use case would be a
one time bulk import, which more or less works. There are still some
hiccups, but these are more a function of the differences in the
metadata models between 1.4 and 2.0.

IMHO, what does need to work, is bidirectional syncronization with XML
between the two systems. Although it is theoretically possible to
export data from 2.0 into 1.4, the last time i checked a few weeks
back, it does not work. This funtionality, for me at least, is much
much more important than simply being able to import DHIS 1.4 data
without windows (which I would need in the first place to create 1.4
data!).

Not to be a party pooper or anything, but just trying to give some perspective.

Regards,
Jason

···

On Fri, Jul 23, 2010 at 10:05 PM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

Hi Ime

On 23 July 2010 19:23, Ime Asangansi <asangansi@yahoo.com> wrote:

Hi Bob,

Mmmm..interesting.

Just to be sure.
Were you reading from 1.4 and writing to 2.0? mysql or postgres?

Were you writing the other way (to access)?

Nothing so grand as any of the above. Though I am reading and writing
access. Reading/writing mysql and postgres will be much more of a
bottleneck than reading jackcess 'coz of the tcp requirement of jdbc.

In this scenario I'm just running scratch code in the background to
keep my laptop warm while I try to write :slight_smile: See attached. jackcess
is very maven friendly.

mvn assembly:assembly
java -jar target/dhaccess-jar-with-dendencies.jar

Drink lots of coffee, take a walk, have lunch .... etc

Cheers.
Bob

package org.hisp.dhis;

import com.healthmarketscience.jackcess.ColumnBuilder;
import com.healthmarketscience.jackcess.DataType;
import java.io.File;
import java.io.IOException;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.Table;
import com.healthmarketscience.jackcess.TableBuilder;

/**
* CPU warmer
*
*/
public class App
{

private static final Log log = LogFactory.getLog( App.class );

private static long MAXROWS = 10000000;

public static void main( String args )
{
try
{
File dbFile = new File( "/home/bobj/src/dhaccess/test.mdb" );
Database db = Database.create( dbFile );
log.info( "Opened " + dbFile.getName() );

       Map&lt;String, Object&gt; row = null;
       Table table = new TableBuilder\( &quot;test&quot; \)\.addColumn\( new

ColumnBuilder( "id", DataType.LONG ).toColumn() ).addColumn( new
ColumnBuilder( "value", DataType.TEXT ).toColumn() ).toTable( db );

       long i = 0;
       for \( i = 0; i &lt; MAXROWS; i\+\+ \)
       \{
           table\.addRow\( i, &quot;some text&quot; \);
       \}

       log\.info\( &quot;Reading data&quot; \);
       int count = 0;
       while \( \( row = table\.getNextRow\(\) \) \!= null \)
       \{
           count\+\+;
           log\.debug\( row \);
           Integer id = \(Integer\) row\.get\( &quot;id&quot; \);
           String value = \(String\) row\.get\( &quot;value&quot; \);

           System\.err\.println\( id \+ &quot; : &quot; \+ value \);
       \}
       log\.info\( count \+ &quot; datavalues&quot; \);
   \} catch \( IOException ex \)
   \{
       log\.info\(&quot;Ouch: &quot;\+ex\);
   \}

}
}

Thanks.

Ime

--- On Fri, 7/23/10, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

From: Bob Jolliffe <bobjolliffe@gmail.com>
Subject: Re: [Dhis2-devs] dhis14 import
To: "Lars Helge Øverland" <larshelge@gmail.com>
Cc: "Ime Asangansi" <asangansi@yahoo.com>, "Knut Staring" <knutst@gmail.com>, "dhis2-devs" <dhis2-devs@lists.launchpad.net>
Date: Friday, July 23, 2010, 3:53 PM
Alright writing is pretty slow.
Writing 10million records take two hours.

But reading is pretty darn quick. Iterating through
10million rows
and dumping each row to stdout (redirected to /dev/null)
takes about 1
minute +- 5 seconds. Throughout which
memory use is constant and
low: less than 1.5%. That's got to be quicker than
anything using
jdbc which is obliged to push everything up and down the
tcp/ip stack.

Not looking any more at this now - just wanted to see the
numbers.
Looks like a good addition to Knut's student project
list. Refactor
dhis14 file import to inject a jackcess backend in place of
the
hibernate one.

Cheers
Bob

2010/7/23 Bob Jolliffe <bobjolliffe@gmail.com>:
> 2010/7/22 Lars Helge Øverland <larshelge@gmail.com>:
>>
>> No doubt this looks much simpler.
>> Would be interesting to do a test with a large
table (>10 mill) and see how
>> it performs in terms of memory usage.
>
> 10 million records is a lot of records! I have it
whirring away in
> the background as I get on with other stuff. Started
1.5 hours ago
> and still writing records ... db file up to 250M ...
hasn't started
> reading back yet but thus far memory usage is low and
constant. I'll
> let you know when/if it finishes :slight_smile:
>
>> Lars
>> On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi >>> <asangansi@yahoo.com> >>> wrote:
>>>
>>> Impressive!
>>> First time seeing that clean functionality!
>>> I see potential there to move data between
both systems :slight_smile:
>>>
>>> Ime
>>>
>>>
>>> --- On Tue, 7/20/10, Knut Staring <knutst@gmail.com> >>> wrote:
>>>
>>> > From: Knut Staring <knutst@gmail.com>
>>> > Subject: Re: [Dhis2-devs] dhis14 import
>>> > To: "Bob Jolliffe" <bobjolliffe@gmail.com>
>>> > Cc: "dhis2-devs" <dhis2-devs@lists.launchpad.net>
>>> > Date: Tuesday, July 20, 2010, 3:32 PM
>>> > That sounds really great - it has
>>> > been problematic to require Windows for
this.
>>> >
>>> > k
>>> >
>>> > On Tue, Jul 20, 2010 at 3:23 PM, Bob >>> Jolliffe <bobjolliffe@gmail.com> >>> >>> > wrote:
>>> > > Just some throwaway code testing out
jackcess for
>>> > reading dhis14 (and
>>> > > potentially modulo basico files):
>>> > >
>>> > > http://pastebin.com/wMv1SZqq
>>> > >
>>> > > I'm pretty impressed. It works
well and I suspect
>>> > also much faster
>>> > > than accessing via odbc/ibatis or
whatever it is.
>>> > Never mind the
>>> > > nonsense of what this code actually
does - the point
>>> > is that it can
>>> > > iterate over access tables using
java (on ubuntu).
>>> > Kind of nice.
>>> > >
>>> > > Cheers
>>> > > Bob
>>> > >
>>> > >
_______________________________________________
>>> > > Mailing list: https://launchpad.net/~dhis2-devs
>>> > > Post to : dhis2-devs@lists.launchpad.net
>>> > > Unsubscribe : https://launchpad.net/~dhis2-devs
>>> > > More help : https://help.launchpad.net/ListHelp
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Cheers,
>>> > Knut Staring
>>> >
>>> >
_______________________________________________
>>> > Mailing list: https://launchpad.net/~dhis2-devs
>>> > Post to : dhis2-devs@lists.launchpad.net
>>> > Unsubscribe : https://launchpad.net/~dhis2-devs
>>> > More help : https://help.launchpad.net/ListHelp
>>> >
>>>
>>>
_______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs
>>> Post to : dhis2-devs@lists.launchpad.net
>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>> More help : https://help.launchpad.net/ListHelp
>>
>>
>

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

--
Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+17069260025

Alright writing is pretty slow. Writing 10million records take two hours.

But reading is pretty darn quick. Iterating through 10million rows

and dumping each row to stdout (redirected to /dev/null) takes about 1

minute ± 5 seconds. Throughout which memory use is constant and

low: less than 1.5%. That’s got to be quicker than anything using

jdbc which is obliged to push everything up and down the tcp/ip stack.

Not looking any more at this now - just wanted to see the numbers.

Looks like a good addition to Knut’s student project list. Refactor

dhis14 file import to inject a jackcess backend in place of the

hibernate one.

Cheers

Bob

Thanks for testing this. Looks promising.

Actually what I was most curious about was how data is fetched from such large tables. By doing a “select * from DataValue” on a 10 mill table we will run out of memory one way or the other. Doing a “select * from DataValue order by DataValueID limit 0, 10000” and then increase the limit takes a long time due to the ordering. Ibatis (what we currenty use) has a RowHandler interface which deals with this and lets you handle each row individually with apparently quite effective memory management (I don’t know how this is implemented though.)

···

2010/7/23 Bob Jolliffe bobjolliffe@gmail.com

2010/7/23 Bob Jolliffe bobjolliffe@gmail.com:

2010/7/22 Lars Helge Øverland larshelge@gmail.com:

No doubt this looks much simpler.

Would be interesting to do a test with a large table (>10 mill) and see how

it performs in terms of memory usage.

10 million records is a lot of records! I have it whirring away in

the background as I get on with other stuff. Started 1.5 hours ago

and still writing records … db file up to 250M … hasn’t started

reading back yet but thus far memory usage is low and constant. I’ll

let you know when/if it finishes :slight_smile:

Lars

On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi asangansi@yahoo.com wrote:

Impressive!

First time seeing that clean functionality!

I see potential there to move data between both systems :slight_smile:

Ime

— On Tue, 7/20/10, Knut Staring knutst@gmail.com wrote:

From: Knut Staring knutst@gmail.com

Subject: Re: [Dhis2-devs] dhis14 import

To: “Bob Jolliffe” bobjolliffe@gmail.com

Cc: “dhis2-devs” dhis2-devs@lists.launchpad.net

Date: Tuesday, July 20, 2010, 3:32 PM

That sounds really great - it has

been problematic to require Windows for this.

k

On Tue, Jul 20, 2010 at 3:23 PM, Bob Jolliffe bobjolliffe@gmail.com > > >>> > wrote:

Just some throwaway code testing out jackcess for

reading dhis14 (and

potentially modulo basico files):

http://pastebin.com/wMv1SZqq

I’m pretty impressed. It works well and I suspect

also much faster

than accessing via odbc/ibatis or whatever it is.

Never mind the

nonsense of what this code actually does - the point

is that it can

iterate over access tables using java (on ubuntu).

Kind of nice.

Cheers

Bob


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Cheers,

Knut Staring


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp


Mailing list: https://launchpad.net/~dhis2-devs

Post to : dhis2-devs@lists.launchpad.net

Unsubscribe : https://launchpad.net/~dhis2-devs

More help : https://help.launchpad.net/ListHelp

Hi Jason

Hi Bob,

I think this is indeed interesting, but is it really worth spending
any time on? We currently have something that works, and for our 10
million row database in Zambia it took a few hours to import. OK,
fine, it took time, but it was a one off deal. Is there much to be
gained from a full refactor? I think the typical use case would be a
one time bulk import, which more or less works. There are still some
hiccups, but these are more a function of the differences in the
metadata models between 1.4 and 2.0.

IMHO, what does need to work, is bidirectional syncronization with XML
between the two systems. Although it is theoretically possible to
export data from 2.0 into 1.4, the last time i checked a few weeks
back, it does not work. This funtionality, for me at least, is much
much more important than simply being able to import DHIS 1.4 data
without windows (which I would need in the first place to create 1.4
data!).

Not to be a party pooper or anything, but just trying to give some perspective.

I do agree I don't think its a priority. Which is why I don't intend
to spend much time on it :slight_smile:

In general I'm not in a position to do much with dhis14 .. at least
now I do have a way of looking at dhis14 databases. And of course
there is the famous modulo basico access database.

But I think the biggest challenge to get right is synchronization
(data and metadata exchange, versioning, governance etc) rather than
bulk import. As I think you've mentioned elsewhere, this can always
be done by hook or by crook with the right tools and skills in hand.

So poop away .. fine by me :slight_smile:

Bob

···

On 24 July 2010 15:25, Jason Pickering <jason.p.pickering@gmail.com> wrote:

Regards,
Jason

On Fri, Jul 23, 2010 at 10:05 PM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

Hi Ime

On 23 July 2010 19:23, Ime Asangansi <asangansi@yahoo.com> wrote:

Hi Bob,

Mmmm..interesting.

Just to be sure.
Were you reading from 1.4 and writing to 2.0? mysql or postgres?

Were you writing the other way (to access)?

Nothing so grand as any of the above. Though I am reading and writing
access. Reading/writing mysql and postgres will be much more of a
bottleneck than reading jackcess 'coz of the tcp requirement of jdbc.

In this scenario I'm just running scratch code in the background to
keep my laptop warm while I try to write :slight_smile: See attached. jackcess
is very maven friendly.

mvn assembly:assembly
java -jar target/dhaccess-jar-with-dendencies.jar

Drink lots of coffee, take a walk, have lunch .... etc

Cheers.
Bob

package org.hisp.dhis;

import com.healthmarketscience.jackcess.ColumnBuilder;
import com.healthmarketscience.jackcess.DataType;
import java.io.File;
import java.io.IOException;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.Table;
import com.healthmarketscience.jackcess.TableBuilder;

/**
* CPU warmer
*
*/
public class App
{

private static final Log log = LogFactory.getLog( App.class );

private static long MAXROWS = 10000000;

public static void main( String args )
{
try
{
File dbFile = new File( "/home/bobj/src/dhaccess/test.mdb" );
Database db = Database.create( dbFile );
log.info( "Opened " + dbFile.getName() );

       Map&lt;String, Object&gt; row = null;
       Table table = new TableBuilder\( &quot;test&quot; \)\.addColumn\( new

ColumnBuilder( "id", DataType.LONG ).toColumn() ).addColumn( new
ColumnBuilder( "value", DataType.TEXT ).toColumn() ).toTable( db );

       long i = 0;
       for \( i = 0; i &lt; MAXROWS; i\+\+ \)
       \{
           table\.addRow\( i, &quot;some text&quot; \);
       \}

       log\.info\( &quot;Reading data&quot; \);
       int count = 0;
       while \( \( row = table\.getNextRow\(\) \) \!= null \)
       \{
           count\+\+;
           log\.debug\( row \);
           Integer id = \(Integer\) row\.get\( &quot;id&quot; \);
           String value = \(String\) row\.get\( &quot;value&quot; \);

           System\.err\.println\( id \+ &quot; : &quot; \+ value \);
       \}
       log\.info\( count \+ &quot; datavalues&quot; \);
   \} catch \( IOException ex \)
   \{
       log\.info\(&quot;Ouch: &quot;\+ex\);
   \}

}
}

Thanks.

Ime

--- On Fri, 7/23/10, Bob Jolliffe <bobjolliffe@gmail.com> wrote:

From: Bob Jolliffe <bobjolliffe@gmail.com>
Subject: Re: [Dhis2-devs] dhis14 import
To: "Lars Helge Øverland" <larshelge@gmail.com>
Cc: "Ime Asangansi" <asangansi@yahoo.com>, "Knut Staring" <knutst@gmail.com>, "dhis2-devs" <dhis2-devs@lists.launchpad.net>
Date: Friday, July 23, 2010, 3:53 PM
Alright writing is pretty slow.
Writing 10million records take two hours.

But reading is pretty darn quick. Iterating through
10million rows
and dumping each row to stdout (redirected to /dev/null)
takes about 1
minute +- 5 seconds. Throughout which
memory use is constant and
low: less than 1.5%. That's got to be quicker than
anything using
jdbc which is obliged to push everything up and down the
tcp/ip stack.

Not looking any more at this now - just wanted to see the
numbers.
Looks like a good addition to Knut's student project
list. Refactor
dhis14 file import to inject a jackcess backend in place of
the
hibernate one.

Cheers
Bob

2010/7/23 Bob Jolliffe <bobjolliffe@gmail.com>:
> 2010/7/22 Lars Helge Øverland <larshelge@gmail.com>:
>>
>> No doubt this looks much simpler.
>> Would be interesting to do a test with a large
table (>10 mill) and see how
>> it performs in terms of memory usage.
>
> 10 million records is a lot of records! I have it
whirring away in
> the background as I get on with other stuff. Started
1.5 hours ago
> and still writing records ... db file up to 250M ...
hasn't started
> reading back yet but thus far memory usage is low and
constant. I'll
> let you know when/if it finishes :slight_smile:
>
>> Lars
>> On Wed, Jul 21, 2010 at 3:42 AM, Ime Asangansi >>>> <asangansi@yahoo.com> >>>> wrote:
>>>
>>> Impressive!
>>> First time seeing that clean functionality!
>>> I see potential there to move data between
both systems :slight_smile:
>>>
>>> Ime
>>>
>>>
>>> --- On Tue, 7/20/10, Knut Staring <knutst@gmail.com> >>>> wrote:
>>>
>>> > From: Knut Staring <knutst@gmail.com>
>>> > Subject: Re: [Dhis2-devs] dhis14 import
>>> > To: "Bob Jolliffe" <bobjolliffe@gmail.com>
>>> > Cc: "dhis2-devs" <dhis2-devs@lists.launchpad.net>
>>> > Date: Tuesday, July 20, 2010, 3:32 PM
>>> > That sounds really great - it has
>>> > been problematic to require Windows for
this.
>>> >
>>> > k
>>> >
>>> > On Tue, Jul 20, 2010 at 3:23 PM, Bob >>>> Jolliffe <bobjolliffe@gmail.com> >>>> >>> > wrote:
>>> > > Just some throwaway code testing out
jackcess for
>>> > reading dhis14 (and
>>> > > potentially modulo basico files):
>>> > >
>>> > > http://pastebin.com/wMv1SZqq
>>> > >
>>> > > I'm pretty impressed. It works
well and I suspect
>>> > also much faster
>>> > > than accessing via odbc/ibatis or
whatever it is.
>>> > Never mind the
>>> > > nonsense of what this code actually
does - the point
>>> > is that it can
>>> > > iterate over access tables using
java (on ubuntu).
>>> > Kind of nice.
>>> > >
>>> > > Cheers
>>> > > Bob
>>> > >
>>> > >
_______________________________________________
>>> > > Mailing list: https://launchpad.net/~dhis2-devs
>>> > > Post to : dhis2-devs@lists.launchpad.net
>>> > > Unsubscribe : https://launchpad.net/~dhis2-devs
>>> > > More help : https://help.launchpad.net/ListHelp
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Cheers,
>>> > Knut Staring
>>> >
>>> >
_______________________________________________
>>> > Mailing list: https://launchpad.net/~dhis2-devs
>>> > Post to : dhis2-devs@lists.launchpad.net
>>> > Unsubscribe : https://launchpad.net/~dhis2-devs
>>> > More help : https://help.launchpad.net/ListHelp
>>> >
>>>
>>>
_______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs
>>> Post to : dhis2-devs@lists.launchpad.net
>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>> More help : https://help.launchpad.net/ListHelp
>>
>>
>

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help : https://help.launchpad.net/ListHelp

--
Jason P. Pickering
email: jason.p.pickering@gmail.com
tel:+17069260025