Louis Kessler’s Behold Blog

Success! - Mon, 24 Nov 2008

I was able to get Behold to load the 317 MB GEDCOM file! Instead of storing each GEDCOM line separately, I stored each level 0 structure (a whole record) as a single line.

Fewer lines mean less accumulated Delphi overhead, since every stored line carries its own internal bookkeeping (pointers, counts, etc.)
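To illustrate the idea of storing one entry per level 0 structure, here is a minimal Python sketch. It is not Behold's Delphi code; the function name `split_level0_records` and the sample data are made up for the example, and it assumes every GEDCOM line begins with its level number.

```python
def split_level0_records(gedcom_text):
    """Group GEDCOM lines into one string per level 0 record (illustrative only)."""
    records = []
    current = []
    for line in gedcom_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # A line whose first token is "0" starts a new top-level record.
        if line.split(None, 1)[0] == "0" and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


# Hypothetical sample data, not taken from the 317 MB file.
SAMPLE = """0 HEAD
1 CHAR ANSI
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1900
0 TRLR"""

records = split_level0_records(SAMPLE)
print(len(SAMPLE.splitlines()), "lines stored as", len(records), "records")
```

The text is unchanged; only the number of separately stored strings drops, which is where the per-line overhead is saved.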

The bottom line is that it now loads in about 900 MB of RAM, which is good considering that the 317 MB file doubles to 634 MB just from the conversion to Unicode.

It also takes 10.4 seconds to load, which is excellent. I still have a bit of processing and indexing to do, so that will increase a bit. But it still gives me lots of room to get to my objective of a million people loaded in 30 seconds.

So now that this is done, I can complete the implementation of this new internal data structure, and the vast speed improvement of Behold should help everyone, even if you don’t have a million people in your tree.

Reaching the Limit - Sat, 22 Nov 2008

I tried loading a 317 MB GEDCOM file. (Go to: http://www.prpletr.com/Gedcoms.htm and download Good, Engle, Hanks Family Gedcom). It ran out of memory during the ANSI to Unicode conversion. So I dissected that routine and did the conversion 4 MB at a time. That allowed the read and conversion to complete, and it took only 11 seconds.
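The chunked conversion can be sketched roughly as follows. This is a Python illustration, not the Delphi routine from Behold. It assumes "ANSI" means the single-byte Windows-1252 code page (an assumption; the actual code page depends on the system), and `convert_in_chunks` is a hypothetical name.

```python
import io

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the chunk size mentioned in the post


def convert_in_chunks(src, dst, chunk_size=CHUNK_SIZE, encoding="cp1252"):
    """Read single-byte text from src in chunks and write UTF-16LE to dst.

    Returns the number of input bytes processed.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # cp1252 is a single-byte encoding (assumed here), so a chunk
        # boundary can never split a character in the middle.
        text = chunk.decode(encoding, errors="replace")
        dst.write(text.encode("utf-16-le"))
        total += len(chunk)
    return total


# Tiny in-memory demonstration with a deliberately small chunk size.
source = io.BytesIO(b"0 HEAD\n1 NAME Ren\xe9\n")
target = io.BytesIO()
processed = convert_in_chunks(source, target, chunk_size=4)
print(processed, "bytes in,", len(target.getvalue()), "bytes out")
```

Only one chunk and its converted copy are held at a time, so the peak memory of the conversion step no longer depends on the file size.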

But it then ran out of memory trying to load it into my data structure. Well, 320 MB of ANSI turns into 640 MB of Unicode. With a 2 GB address space, I should still be able to load 640 MB of character strings into my structures.

I’m still not sure why, but Delphi 2009 uses over 6 times the original size of the ANSI file to store its character strings. Doubling is expected because Delphi 2009’s Unicode strings use 2-byte characters, but six times seems excessive. I’ve put up a question on this at StackOverflow and asked a few experts as well.
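One way to reason about where memory beyond the expected doubling could go is a simple model: 2 bytes per character plus a fixed cost for every separately stored string. The Python sketch below is only that model. The overhead value and the counts in the example are hypothetical placeholders, not measurements from Delphi 2009 or from the 317 MB file.

```python
def estimated_bytes(total_chars, string_count, per_string_overhead, bytes_per_char=2):
    """Character data plus a fixed bookkeeping cost per stored string."""
    return total_chars * bytes_per_char + string_count * per_string_overhead


def expansion_factor(ansi_bytes, string_count, per_string_overhead):
    """Estimated in-memory size divided by the size of the ANSI file."""
    return estimated_bytes(ansi_bytes, string_count, per_string_overhead) / ansi_bytes


# Hypothetical numbers: a 1000-byte file, 40 bytes of overhead per string.
many_short = expansion_factor(1000, string_count=50, per_string_overhead=40)
few_long = expansion_factor(1000, string_count=5, per_string_overhead=40)
print(many_short, few_long)
```

In this model the factor grows as the same text is split into more and shorter strings, so it can rise well above 2 when the average string is short.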

Barring some answer that would cut this overhead down, I’ve reached the physical limit of what Behold can handle in memory with 32-bit processing. I am a bit disappointed, since GenViewer loads this huge file in 11 seconds and only uses about 340 MB of memory in doing so. I was hoping I could get Behold to do so as well.

I’m still going to try mapping the file to memory, so I might not even have to load the whole file into memory. This may turn out to be too slow, especially since Delphi will have to convert the data to Unicode every time it accesses it, instead of just once. But it’s worth a shot.
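As a rough picture of the file-mapping idea, the following Python sketch maps a file read-only and decodes a line only when it is requested. It uses Python's standard `mmap` module rather than the Windows API calls a Delphi program would use. The helper names are hypothetical, and the cp1252 code page is an assumption.

```python
import mmap
import os
import tempfile


def read_line_at(mapped, offset, encoding="cp1252"):
    """Decode the line starting at offset; return (text, offset of next line)."""
    end = mapped.find(b"\n", offset)
    if end == -1:
        end = len(mapped)
    raw = mapped[offset:end]  # copies only this one line out of the mapping
    return raw.rstrip(b"\r").decode(encoding, errors="replace"), end + 1


def demo():
    # Write a small hypothetical GEDCOM file to disk so there is something to map.
    data = b"0 HEAD\r\n1 CHAR ANSI\r\n0 TRLR\r\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    lines = []
    try:
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            offset = 0
            while offset < len(mapped):
                text, offset = read_line_at(mapped, offset)
                lines.append(text)
    finally:
        os.remove(path)
    return lines


mapped_lines = demo()
print(mapped_lines)
```

The trade-off mentioned above is visible here: every call to `read_line_at` repeats the decode, so a line that is read many times is converted many times.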

Even so, the hard limit of 2 to the power of 31 bytes (2 GB) of address space is closing in. Memory mapping might get me a bit further, but that limit is the next one in the way. Getting past it will have to wait until Delphi’s 64-bit compiler is ready.

p.s. If you want some amusement, try that huge GEDCOM file on your favorite genealogy software, and see if it can handle it. Even try loading it into a text editor. Even try downloading it!

Follow the Yellow Brick Road - Thu, 20 Nov 2008

Now where will this path lead me?

Two posts ago, I planned to switch to XML as my internal data structure, since it’s equivalent to GEDCOM and I had an XML library available. The library wasn’t bad, but when I looked closer at it, I saw it was effectively using a simple tree structure.

So then I thought to load the GEDCOM directly into a simple tree structure. I found a nice little InMemoryTree routine from a post on Experts-Exchange.
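For readers unfamiliar with the approach, loading GEDCOM into a simple tree can be sketched like this in Python. This is not the InMemoryTree routine from Experts-Exchange and not Behold's code. The `Node` class and `build_tree` function are hypothetical, and the sketch assumes each line starts with an integer level that is at most one deeper than the line before it.

```python
class Node:
    """One GEDCOM line: its level, the rest of the line, and its child lines."""

    def __init__(self, level, text):
        self.level = level
        self.text = text
        self.children = []


def build_tree(gedcom_text):
    """Build a tree where each line hangs under the nearest line of lower level."""
    root = Node(-1, "ROOT")  # artificial parent of all level 0 records
    stack = [root]           # path from the root to the most recent node
    for line in gedcom_text.splitlines():
        line = line.strip()
        if not line:
            continue
        level_str, _, rest = line.partition(" ")
        level = int(level_str)
        # Pop back up until the top of the stack is this line's parent.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(level, rest)
        stack[-1].children.append(node)
        stack.append(node)
    return root


# Hypothetical sample data.
SAMPLE = """0 HEAD
1 CHAR ANSI
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1900
0 TRLR"""

tree = build_tree(SAMPLE)
person = tree.children[1]
print(person.text, "has", len(person.children), "child lines")
```

The stack holds only the current path, so the whole file is processed in one pass with no lookups.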

I implemented that and started customizing. It was a bit slow.

At this point, I decided to check out various profilers and use them to help me speed up the code. I downloaded AQTime and ProDelphi.

With AQTime’s line-by-line timings, I decided it was worthwhile to see how much I could speed up the parsing of the input line.

I started pulling more and more extraneous code out of my input, parsing and data loading routines and …

Unbelievable.

Behold can now read a 25 MB GEDCOM file with 95000 names and load its GEDCOM-like internal database in 1.19 seconds. A 48 MB GEDCOM loads in 2.25 seconds. This is about 20 times faster than the current alpha version of Behold! It also gives approximately the same speed as GenViewer, which has the fastest GEDCOM loader I know of.

So this is a keeper. It evolved by accident. I originally thought XML would be an excellent way I could export the GEDCOM again. It evolved into a complete rewrite of Behold’s internals that will make it much zippier.

This is an excellent time to make these major changes. The next release will be the beta, so I will be able to get everyone’s help to test that all is okay and the reinstalled functionality is all still there.

Stay tuned. Much more TBA along these lines.