Louis Kessler’s Behold Blog

Follow the Yellow Brick Road - Thu, 20 Nov 2008

Now where will this path lead me?

Two posts ago, I thought to convert to using XML as my internal data structure, since it’s equivalent to GEDCOM and I had an XML library available. The library wasn’t bad, but when I looked closer at it, I saw it was effectively using a simple tree structure.

So then I thought to load the GEDCOM directly into a simple tree structure. I found a nice little InMemoryTree routine from a post on Experts-Exchange.
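To illustrate what loading GEDCOM into a simple in-memory tree can look like, here is a minimal sketch in Python. It is not the InMemoryTree routine or Behold's Delphi code; the `Node` class and `load_gedcom` function are hypothetical names made up for this example, and it assumes well-formed input where the level never jumps by more than one.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One GEDCOM line: a tag, an optional value, and nested child lines."""
    tag: str
    value: str = ""
    xref: str = ""  # identifier such as "@I32@" on a record line
    children: list = field(default_factory=list)


def load_gedcom(lines):
    """Build a tree from GEDCOM lines, using a stack indexed by level.

    Assumes well-formed input: every line has a level and a tag, and
    levels never increase by more than one at a time.
    """
    root = Node("ROOT")  # hypothetical container for all level-0 records
    stack = [root]       # stack[n] is the parent of a level-n line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        if parts[1].startswith("@"):
            # "0 @I32@ INDI" style: the identifier comes before the tag
            xref = parts[1]
            rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
            tag = rest[0]
            value = rest[1] if len(rest) > 1 else ""
        else:
            xref = ""
            tag = parts[1]
            value = parts[2] if len(parts) > 2 else ""
        node = Node(tag, value, xref)
        del stack[level + 1:]  # close any deeper levels
        stack[level].children.append(node)
        stack.append(node)
    return root
```

The stack keeps the most recent node at each level, so every incoming line is attached to its parent without searching the tree.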

I implemented that and started customizing. It was a bit slow.

At this point, I decided to check out various Profilers and use them to help me speed up the code. I downloaded AQTime and ProDelphi.

With AQTime’s line-by-line timings, I decided it was worthwhile to see how much I could speed up the parsing of the input line.
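As a rough illustration of timing the parsing of a single input line, the sketch below compares two ways of splitting a GEDCOM line in Python using the standard `timeit` module. This is not AQTime and not Behold's Delphi code; `parse_regex` and `parse_split` are hypothetical names, and which one is faster will depend on the machine and Python version, so no timings are claimed here.

```python
import re
import timeit

# level, optional @XREF@ identifier, tag, optional value
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$")


def parse_regex(line):
    """Parse one GEDCOM line with a regular expression."""
    level, xref, tag, value = LINE_RE.match(line).groups()
    return int(level), xref or "", tag, value or ""


def parse_split(line):
    """Parse one GEDCOM line with plain string partitioning."""
    level, _, rest = line.partition(" ")
    xref = ""
    if rest.startswith("@"):
        xref, _, rest = rest.partition(" ")
    tag, _, value = rest.partition(" ")
    return int(level), xref, tag, value


if __name__ == "__main__":
    sample = "2 PLAC Paris, France"
    for fn in (parse_regex, parse_split):
        seconds = timeit.timeit(lambda: fn(sample), number=200_000)
        print(f"{fn.__name__}: {seconds:.3f} s for 200,000 calls")
```

Both functions return the same `(level, xref, tag, value)` tuple, so they can be swapped freely while measuring.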

I started pulling more and more extraneous code out of my input, parsing and data loading routines and …

Unbelievable.

Behold can now read a 25 MB GEDCOM file with 95000 names and load its GEDCOM-like internal database in 1.19 seconds. A 48 MB GEDCOM loads in 2.25 seconds. This is about 20 times faster than the current alpha version of Behold! It also gives approximately the same speed as GenViewer, which has the fastest GEDCOM loader I know of.

So this is a keeper. It evolved by accident. I originally thought XML would be an excellent way I could export the GEDCOM again. It evolved into a complete rewrite of Behold’s internals that will make it much zippier.

This is an excellent time to make these major changes. The next release will be the beta, so I will be able to get everyone’s help to test that all is okay and the reinstalled functionality is all still there.

Stay tuned. Much more TBA along these lines.

Unicode at Last! - Thu, 13 Nov 2008

With Unicode supposedly in place in Behold, I wanted a real example to test it. I found a small, simple GEDCOM file with Hebrew text mixed through it. Hebrew is tough because not only does it have special characters, but it is also written from right to left.

I found out a couple of things. First, you need a Unicode font. Behold by default uses the Arial font, and the one that comes with Windows XP Service Pack 2 is Unicode enabled and has some special characters from a few languages. If your language doesn’t show up properly, you’ll have to use a different font with more characters in it, such as Arial Unicode MS. It has 50,377 characters versus the 1,674 in plain Arial. But I really don’t like the way the Western letters look in it. They’re somewhat misshapen and ugly. If you’re using a non-Western language on your computer, that shouldn’t be a problem, because you will have an appropriate font available. I’ll have to add to Behold the ability for you to select the font you want to use, and possibly see if I can get Behold to detect what font your machine uses by default and use that.

Then I found out that you need a Language Pack installed on your Windows computer for a right-to-left language like Hebrew to show up correctly. If you don’t have that, the letters will be backwards.

But with all this put together, Behold can now display mixed fonts like this:

[Image: Behold with mixed fonts]

Don’t worry if the font looks too small. Behold has an option to set it to a larger size.

Also in the image above, you can see how the reimplemented table lines look. The generations are much easier to see with them.

XML is GEDCOM; GEDCOM is XML - Mon, 10 Nov 2008

My attempts to speed up the core of Behold led me to lean towards using the DISQLite3 database package. The expectation was that it would perform better than the b-tree structure I developed in prehistoric times for handling the internal data. But I may have discovered something even better.

XML (Extensible Markup Language) is a new technology developed to transport and store data. It contains start <TAGS> that are closed by ending </TAGS>. Between the start and end tags is the value associated with the tag. Tags can be nested.

Hmmm. GEDCOM looks like this:

0 @I32@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 04 MAY 1824
2 PLAC Paris, France

It has tags, values, and numbers indicating a level of nesting.

But here’s some XML:

<INDI ID="I32">
<NAME>John /Smith/</NAME>
<BIRT>
<DATE>04 MAY 1824</DATE>
<PLAC>Paris, France</PLAC>
</BIRT>
</INDI>

The XML is exactly the same thing as the GEDCOM. :-D
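The equivalence can be made concrete with a short conversion sketch. The Python below uses the standard library's `xml.etree.ElementTree`, not Delphi's XML support, and `gedcom_to_xml` is a hypothetical name. The `GEDCOM` wrapper element is an assumption added so that several level-0 records can share one document root, and the sketch assumes well-formed lines whose tags are also valid XML element names.

```python
import xml.etree.ElementTree as ET


def gedcom_to_xml(lines):
    """Turn GEDCOM lines into an ElementTree, one element per line."""
    root = ET.Element("GEDCOM")  # hypothetical wrapper, not part of GEDCOM
    stack = [root]               # stack[n] is the parent of a level-n line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        level_text, _, rest = line.partition(" ")
        level = int(level_text)
        xref = ""
        if rest.startswith("@"):
            # record identifier such as @I32@ comes before the tag
            xref, _, rest = rest.partition(" ")
        tag, _, value = rest.partition(" ")
        del stack[level + 1:]    # close any deeper levels
        elem = ET.SubElement(stack[level], tag)
        if xref:
            elem.set("ID", xref.strip("@"))
        if value:
            elem.text = value
        stack.append(elem)
    return root


if __name__ == "__main__":
    sample = [
        "0 @I32@ INDI",
        "1 NAME John /Smith/",
        "1 BIRT",
        "2 DATE 04 MAY 1824",
        "2 PLAC Paris, France",
    ]
    print(ET.tostring(gedcom_to_xml(sample), encoding="unicode"))
```

The level number does the same job as the closing tag: a line at level n closes everything deeper than n that came before it.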

New versions of Delphi have built-in support of XML, including support of the DOM (Document Object Model) that defines a standard way for accessing and manipulating XML documents.

What this means for Behold is that I can get it to read in the GEDCOM as XML and store it internally in memory that way for easy access while Behold is running. That will allow me to “virtualize” the ElTree treeview to use that data, and possibly also virtualize RichView to use that data (although I’m not sure if the last one is feasible yet).

Then it will be an easy matter to save the XML back to GEDCOM again. This is a great way to implement saving of GEDCOMs. It will be easy enough to do that it will now be added into the next release, rather than having to wait until Version 1.5 as I had earlier planned.
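Going the other way is a short recursive walk. The following Python sketch is only an illustration of the idea, not how Behold will do it: `xml_to_gedcom` is a hypothetical name, and it assumes the record identifier is stored in an `ID` attribute and that each element's text is the line's value.

```python
import xml.etree.ElementTree as ET


def xml_to_gedcom(elem, level=0):
    """Return GEDCOM lines for an XML element and everything nested in it."""
    parts = [str(level)]
    if elem.get("ID"):
        # restore the @...@ wrapping around the record identifier
        parts.append("@" + elem.get("ID") + "@")
    parts.append(elem.tag)
    text = (elem.text or "").strip()  # ignore whitespace between tags
    if text:
        parts.append(text)
    lines = [" ".join(parts)]
    for child in elem:
        lines.extend(xml_to_gedcom(child, level + 1))
    return lines


if __name__ == "__main__":
    record = ET.fromstring(
        '<INDI ID="I32">'
        "<NAME>John /Smith/</NAME>"
        "<BIRT><DATE>04 MAY 1824</DATE><PLAC>Paris, France</PLAC></BIRT>"
        "</INDI>"
    )
    print("\n".join(xml_to_gedcom(record)))
```

The nesting depth of each element becomes the GEDCOM level number, which is all the information the closing tags carried.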

Will XML be as fast as a database, or even as fast as my current b-tree data structure? I don’t know, but when looking around for XML implementations that may be faster than Delphi’s, I was surprised to find that LMD has an XML library, and surprise of surprises, it’s included with the LMD ElPack package that I just upgraded to.

So this is the next step, and I’ll start to implement it after a good night’s sleep.