Louis Kessler’s Behold Blog

Benchmarks - Tue, 25 Mar 2014

One thing I’ve tried to do with Behold is optimize its speed. I want it to load your GEDCOM and display the Everything Report for you as fast as possible. I’ve worked hard to do this and have made it a program that is notable for its loading speed.

Before releasing each version, I check both the speed and memory use of Behold to ensure that my changes have not degraded either significantly. Sometimes, when I add something major to Behold, it slows down and uses more memory. If I think the change is significant, I’ll profile the program with AQTime, which shows me which lines in Behold are taking a lot of time, and then I’ll do some work to optimize those lines and speed things up.

I have several test files I use for benchmarking the speed and memory. One is a large GEDCOM from a friend of mine. Let’s call it the Lynn file. This file has 1.4 million lines, is 24 MB in size, includes 94,000 people and 33,000 families.

The first benchmarks I had for the Lynn file were for version 0.99.2 of Behold, when it took 11 seconds to load and used 265 MB of RAM. Since then, even though I’ve added functionality to Behold, the load time has dropped to 8 seconds and the memory use to 164 MB as of version 1.0.5.1.

The ratio of the RAM used to the GEDCOM file size for the Lynn file is 164 MB to 24 MB, which is roughly a 7 to 1 ratio. I know that’s way too much. One of the reasons is that Behold loads everything into memory; it does not yet use a disk-based database. I’ll be adding that in Version 1.5 in preparation for editing, since Behold will need its own native format to save your data. Once I am doing that, I’ll optimize Behold’s memory use, and it should then need no more RAM than the size of the GEDCOM file, a 1 to 1 ratio.
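As a quick check of the arithmetic, here is a minimal Python sketch that computes the RAM to file size ratio from the figures quoted in this post. The function name ram_to_file_ratio is made up for illustration and is not part of Behold.

```python
def ram_to_file_ratio(ram_mb: float, gedcom_mb: float) -> float:
    """Return how many MB of RAM are used per MB of GEDCOM file."""
    if gedcom_mb <= 0:
        raise ValueError("GEDCOM size must be positive")
    return ram_mb / gedcom_mb


# Figures for the Lynn file (24 MB GEDCOM) as quoted in the post.
lynn_old = ram_to_file_ratio(265, 24)  # version 0.99.2
lynn_new = ram_to_file_ratio(164, 24)  # version 1.0.5.1

print(f"Lynn, 0.99.2:  {lynn_old:.1f} to 1")
print(f"Lynn, 1.0.5.1: {lynn_new:.1f} to 1")
```

The newer version comes out a little under 7 to 1, which is why the ratio is best read as approximate.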

The second file I use is a huge one known as Good-Engle-Hanks (let’s call it GEH). The copy of it that I have is a monster that is 333 MB in size, with 742,000 individuals and 267,000 families. This file challenged Behold. Version 1.0.1 loaded it and displayed the Everything Report after 94 seconds, but it used 1618 MB of RAM. That value of 1618 MB is near the limit of what any 32-bit program can use under a Windows operating system. In Version 1.0.2, I added some new features to Behold that increased its memory use, and since then GEH causes Behold to run out of memory just before completing. That’s disappointing. But I do expect that when I get to version 1.5 and reduce the RAM to file size ratio, Behold should once again handle GEH.

If you’re interested in seeing how your genealogy program handles a huge file, I’m afraid that when I went to get the link for you, I found that Mr. Beeston has taken down his Good-Engle-Hanks GEDCOM from his Gedcoms page. His other files aren’t nearly as large. So might I instead suggest an even larger one that I sometimes test with, which happens to be a very interesting file. It is the 2010 Catalog of Life Database converted to GEDCOM, and it contains 1,257,735 species that work out to 2.1 million individuals. The GEDCOM file is over 650 MB in size. You can get it at Paul Pruitt’s Famous Family Trees site. If you’re curious what program created this huge file, well, it was Personal Ancestral File (PAF).

The last file I test with is actually a set of files created by Tamura Jones’ GedFan utility. This utility generates a number of GEDCOM files, each with double the number of individuals of the previous one. By the time you’re up to fan value 15, you’ve got over 32,000 individuals in the file. The GEDCOM really contains nothing but individuals connected to their mothers and fathers, with very little other data. But it is an excellent test of program capability. A fan value 16 file is twice as large as a fan value 15 file. If your program can handle one fan value, then try one higher. Eventually it must fail.

Behold can currently handle files only up to fan value 19, which is about half a million individuals. Again, it is the memory limit that Behold runs into. I expect that when I reduce the memory requirements in version 1.5, I’ll be able to increase Behold’s fan value somewhat. And then when I build the 64-bit version of Behold, that should increase Behold’s fan value even more.
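To get a feel for how quickly these files grow, here is a small Python sketch of the doubling. It assumes a fan value n file is a complete ancestor fan of n generations, so it holds 2**n - 1 individuals. That formula is my assumption for illustration; it matches the figures quoted here (over 32,000 at fan value 15, about half a million at fan value 19) but is not taken from GedFan’s documentation.

```python
def fan_individuals(fan_value: int) -> int:
    """Individuals in a complete ancestor fan of `fan_value` generations.

    Assumption: 1 root person + 2 parents + 4 grandparents + ...,
    which sums to 2**fan_value - 1.
    """
    if fan_value < 1:
        raise ValueError("fan value must be at least 1")
    return 2 ** fan_value - 1


# Each step up in fan value roughly doubles the number of individuals.
for n in (15, 16, 19, 20):
    print(f"fan value {n}: {fan_individuals(n):,} individuals")
```

Under this assumption, going from fan value 19 to 20 means handling about a million individuals instead of half a million, so each extra fan value asks for about twice the memory.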

The times I gave you above, e.g. 8 seconds for Behold to load my Lynn file, are very much dependent on the computer that Behold is run on. A slower computer will take more time. A faster computer will take less time. So I was wondering how much faster my new development computer was than my old one.

My old machine had an AMD Phenom 9650 Quad-Core with 4 GB RAM running 64-bit Vista. It was five years old. My new machine has an Intel i7-4770 Quad-Core with 12 GB RAM running 64-bit Windows 8.1. Running Behold with my Lynn file on the new machine took only 2.5 seconds, over 3 times faster than on my old computer. Similarly, the fan 19 file loaded over 3 times faster, dropping from 38 seconds to 11.5 seconds. The memory used on both machines was, of course, the same, since memory is handled almost the same way in Vista and Windows 8.1. Now I have new benchmarks to compare the next versions of Behold to.
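The speedup figures can be checked with a few lines of Python. This sketch assumes the old-machine time for the Lynn file is the 8 seconds reported earlier for version 1.0.5.1; the function name speedup is just for illustration.

```python
def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the old run."""
    if new_seconds <= 0:
        raise ValueError("new time must be positive")
    return old_seconds / new_seconds


# Old machine vs. new machine, using the load times quoted in the post.
lynn = speedup(8, 2.5)      # Lynn file (old time assumed to be 8 seconds)
fan19 = speedup(38, 11.5)   # GedFan fan value 19 file

print(f"Lynn file:   {lynn:.1f}x faster")
print(f"Fan 19 file: {fan19:.1f}x faster")
```

Both ratios land a little above 3, consistent with the new machine being over 3 times faster on these two files.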

I’m almost through converting over to my new machine. It generally takes me about 2 weeks to get my web development and programming environment all set up again. I’m looking forward to the boost in productivity the new computer will offer me. That should help me achieve my resolutions to get the next few versions of Behold out this year.

3 Comments

1. deckie49 (deckie49)
United States
Joined: Fri, 20 Jul 2012
7 blog comments, 0 forum posts
Posted: Wed, 26 Mar 2014

Louis,
What practical benefit would be gained by increasing “fan value”? To allow enough memory for more than a half million names?? I’ve been doing serious genealogy for 34 years and I only come up a little shy of 5000 names.

2. Louis Kessler (lkessler)
Canada
Joined: Sun, 9 Mar 2003
135 blog comments, 199 forum posts
Posted: Thu, 27 Mar 2014

The Lynn file is a real file with over 90,000 people in it. This size can often happen when people merge other people’s research into theirs (something I don’t recommend). However, you can also achieve such sizes by loading multiple GEDCOMs at once, as Behold allows (but few others do). Multiple files will play together better once I implement Virtual merging.

Practically, I’m thinking in terms of ultimately being able to load all relatives from a large tree such as FamilySearch or Ancestry, or maybe to search the tree and download all sources that meet certain criteria. People who do one-name or one-place studies often load tens of thousands of people at once.

From a novelty point of view, I would get pleasure out of winning Tamura Jones’ Confucius Challenge.

3. deckie49 (deckie49)
United States
Joined: Fri, 20 Jul 2012
7 blog comments, 0 forum posts
Posted: Fri, 28 Mar 2014

Thanks for the info. I can certainly see how it could happen under those circumstances!
