Louis Kessler's Behold Blog

Why are there so many genealogy programs? - 3 days, 12 hrs ago

This post is total plagiarism of an article just written today by Matt Sherman on the StackExchange Blog titled “Why are there so many programming languages?” 

The article struck me as also being totally true when applied to genealogy software, so I am reproducing it here and just changing a few words. Here it is:

I was asked this question twice in the past week, by colleagues who work with our clients. It sounds like a dumb question, but it’s not.

The short answer is that we don’t need so many genealogy programs, but we want them. Let’s explore this further.

Aren’t they all the same?

In a sense, yes. You can record your family information using Family Tree Maker, Legacy Family Tree, RootsMagic, Brother’s Keeper, Reunion or Family Historian. You can use Family Tree Builder or Gramps or Ancestral Quest or Heredis. Or WikiTree or Clooz or Evidentia.

Underlying this fact is that all of these genealogy programs serve the same purpose: to help you record, keep track of, and make sense of your data. In highfalutin genealogy terms, they all help you with your “Genealogy Workflow”.

Genealogy Workflow, by Joe Martel, Ron Tanner, Grant Skousen

At their most foundational level, these programs are all the same. But on the surface – where humans interact with them – they vary a lot. This is where other concerns come into play.

Different tools for different jobs

Genealogy programs are tools, and we choose different tools for different jobs. A tractor trailer and a bicycle and a Tesla are all vehicles – they have wheels and steering and will get you from point A to point B – but obviously we use them for different things.

Genealogy programs are similar. RootsMagic and Reunion are great for recording your facts; Family Tree Builder and WikiTree are often used to work with online family trees; Clooz and Evidentia are the tools of choice for analyzing your reasoning.

Genealogy programs often make trade-offs in terms of convenience, safety, and speed – much like vehicles. The trade-off is dictated by the job at hand.

Genealogists have tastes

Beyond mere utility, genealogists choose tools based on personal tastes.

A genealogy program is a tool for humans to express ideas to computers. While we genealogists have many things in common, there is natural variety in the way our minds work.

Because we have many choices of good genealogy programs, we can select one that “works the way I think”. Some genealogists like WikiTree’s flexibility, while others prefer RootsMagic’s strictness. Where some programs feel complex, others look like they were made for you.

People first

Beyond utility, and beyond taste, businesses run on people. Often, you will choose a genealogy program based on what you, or the people around you, know.

Many genealogists chose Family Tree Maker mostly because that’s what they found on store shelves. In turn, it’s what their friends and colleagues then used, which made learning the software easier and allowed Family Tree Maker to dominate the market.

(Unfortunately, Family Tree Maker also happens to have terrible technical characteristics.)

Technologies are supported by “ecosystems” – communities and organizations that provide the tools and assistance that every genealogist needs. A good ecosystem – Family Historian has a great one, for example – can make the individual genealogist more successful.

Variety is strength

In summary, we have a variety of genealogy programs because there is a variety of jobs to be done and a variety of people who do those jobs. This diversity makes interesting programs – and interesting companies, and interesting research – possible.

By Louis Kessler, Genealogist Developer (in place of Matt Sherman, Engineering Manager)

Tagged in engineering software

For a list of over 900 different genealogy programs that you can rate and review and see other people’s ratings and reviews, visit GenSoftReviews.

Getting the Timing Right - 6 days, 23 hrs ago

One of my concerns about switching from in-memory data structures to an on-disk database has been with regard to speed. Memory is faster than disk. Writing to disk will slow things down.

But it better not slow things down too much. For normal-sized genealogy files (say up to 10,000 people), everything should be fast and smooth. Larger files of 100,000 or more people should degrade gracefully. But they should not take forever.

As I have learned through my implementation of Behold’s database during the past few weeks, writing databases to disk one record at a time is sloooooow.

In the past to get timings, I used to use AQtime, a third-party profiler that would count and time each line of your executable without adding much overhead at all. It worked like magic to me. But moving to Delphi XE8, I would need to upgrade to their new version. The cost plus a horrendous interface with the Delphi development environment just wasn’t worth it to me.

Instead I found a wonderfully simple new feature in Delphi called TStopwatch (in the System.Diagnostics unit) that gives accurate timings. You can create as many as you want and you just add them around your code, e.g.:

more code
even more code
and more code

Testing inside a routine that is called 10,000 times, I get a timing of 229 milliseconds (ms), and rerunning the routine doesn’t vary from that by more than about 5 ms. So it is very repeatable, and more than accurate enough to compare the speed of different implementations for optimization purposes.
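As a rough illustration of the same start/stop pattern, here is a minimal sketch in Python rather than Delphi. The `Stopwatch` class, `routine` and `time_calls` are hypothetical names made up for this example, and `time.perf_counter_ns` stands in for Delphi's stopwatch. It accumulates the time spent inside a routine over 10,000 calls and repeats the measurement to see how much it varies.

```python
import time


class Stopwatch:
    """Accumulating timer: start() and stop() may be called many times."""

    def __init__(self):
        self.elapsed_ns = 0
        self._started_at = None

    def start(self):
        self._started_at = time.perf_counter_ns()

    def stop(self):
        # Add the time since the last start() to the running total.
        if self._started_at is not None:
            self.elapsed_ns += time.perf_counter_ns() - self._started_at
            self._started_at = None

    @property
    def elapsed_ms(self):
        return self.elapsed_ns / 1_000_000


def routine(n):
    # Hypothetical stand-in for the code being measured.
    return sum(i * i for i in range(n))


def time_calls(calls=10_000, n=100):
    """Total milliseconds spent inside routine() over `calls` calls."""
    watch = Stopwatch()
    for _ in range(calls):
        watch.start()
        routine(n)
        watch.stop()
    return watch.elapsed_ms


if __name__ == "__main__":
    # Repeat the measurement to see how much it varies between runs.
    runs = [time_calls() for _ in range(3)]
    print("runs (ms):", [round(r, 1) for r in runs])
    print("spread (ms):", round(max(runs) - min(runs), 1))
```

The spread between runs is what tells you whether a difference between two implementations is real or just noise.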

So I used Stopwatches to see how quick I could get SQLite to be and determine how to best implement it. I used two files. In the first, I loaded 33,790 INDI (people) records from a GEDCOM file. In the second I loaded 198,522 INDI records.

I knew that what I was comparing against was an unattainable goal: the B* tree I created and super-optimized to index and store the INDI records in memory in Behold. A B* tree is a variant of a B-tree that keeps its nodes fuller by redistributing keys between neighbouring nodes before splitting (a B-tree is itself a balanced search tree that generalizes a binary search tree by allowing many children per node rather than just 2). For the two files, my B* tree took 53 ms and 459 ms.

Next I did a straight insert of the same information to a disk-based SQLite database. It took 1530 ms and 10985 ms. Well, that’s quite a bit longer, and 11 seconds is a bit too long for someone to wait when it took only half a second before.

The default way a database writes information is one record at a time. What was needed was to batch the transactions using a technique called Array DML. Once I did that, I was able to get times down to 176 ms and 1012 ms.
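To show the general idea of batching (not Array DML itself, which is a Delphi feature), here is a minimal sketch using Python's built-in `sqlite3` module. The table layout and function names are assumptions made up for the example. The first function commits every insert as its own transaction; the second sends all rows through `executemany` inside a single transaction.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical two-column table; the real INDI records have 16 fields.
SCHEMA = "CREATE TABLE indi (id INTEGER PRIMARY KEY, name TEXT)"
INSERT = "INSERT INTO indi (id, name) VALUES (?, ?)"


def insert_one_by_one(path, rows):
    # isolation_level=None puts the connection in autocommit mode,
    # so every INSERT is committed to disk as its own transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    for row in rows:
        conn.execute(INSERT, row)
    count = conn.execute("SELECT COUNT(*) FROM indi").fetchone()[0]
    conn.close()
    return count


def insert_batched(path, rows):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    with conn:  # one transaction for the whole batch, committed on exit
        conn.executemany(INSERT, rows)
    count = conn.execute("SELECT COUNT(*) FROM indi").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    rows = [(i, f"Person {i}") for i in range(500)]
    with tempfile.TemporaryDirectory() as tmp:
        for name, fn in (("one_by_one", insert_one_by_one),
                         ("batched", insert_batched)):
            start = time.perf_counter()
            fn(os.path.join(tmp, name + ".db"), rows)
            print(name, round((time.perf_counter() - start) * 1000), "ms")
```

Both functions store the same rows. The difference is only in how many times the database has to commit to disk.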

That was better. It is still not as good as my beautiful B* tree, but acceptable when disk writes are involved.
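To put the timings quoted above side by side, this small Python sketch computes how many times slower each SQLite approach was than the in-memory B* tree. It uses only the numbers already given; the dictionary keys and the `slowdown` helper are just names chosen for the example.

```python
# Timings quoted in the post, in milliseconds:
# (33,790 INDI file, 198,522 INDI file)
timings_ms = {
    "B* tree in memory": (53, 459),
    "SQLite, one insert at a time": (1530, 10985),
    "SQLite, batched": (176, 1012),
}


def slowdown(label, baseline="B* tree in memory"):
    """How many times slower `label` is than the baseline, per file."""
    return tuple(
        round(t / b, 1)
        for t, b in zip(timings_ms[label], timings_ms[baseline])
    )


if __name__ == "__main__":
    for label in timings_ms:
        print(label, slowdown(label))
```

Straight inserts come out roughly 29 and 24 times slower than the B* tree, while the batched version is only about 3.3 and 2.2 times slower.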

Still something bothered me. The amount of time needed to load the DML array was half of the total time. This was a simple assignment of all the data, one by one, into the array. There was no database work going on. In the two tests, it was loading a 20 MB array and a 100 MB array and it was all in memory. That should be much faster than it is, and shouldn’t take as long as writing the data to disk.

I wasn’t sure what was going on. It was using a Delphi object known as a TCollection. Technically it was a collection of a collection of variants. So it was two dimensional, one for each INDI and one for the 16 fields for each INDI.

I played around a bit but was not able to find a way to load that TCollection any faster. I even posted a question on StackOverflow (check it out if you want to get into some technical details with more timings).

By the way, all of the above timings are from code compiled to 64-bit. Remember when I blogged a few days ago that 64-bit was slower than 32-bit? Well, that’s not true for the database. My B* tree is 25% slower in 64-bit, but the database writes are 15% faster despite using 35% more memory.

I’m going to do a bit more research into this, but I’m not going to let it side-track me. Tomorrow I’ll start the work to get Behold writing to and using the database.

Blog and Forum Was Down For A Bit - Thu, 23 Jul 2015

I noticed a problem in the Behold Forum last night where a post was not always showing up. Last night I attempted to fix it with somewhat disastrous results.

Unbeknownst to me, I crashed both the Forum and Blog last night at 11:45 p.m., and it took me a few hours today to bring them back up.

I still don’t know exactly what happened, but I’m going to do some investigation and will attempt to ensure everything is working.

So, hopefully not, but there may be a few short periods of disruption with the blog and forum over the next couple of days.


Update 8 hours later:  I’ve cleaned everything up and the Blog and Forum should both be running nicely now.

Bonus: I found and fixed a problem where searches for exact strings in double quotes did not paginate properly. This had been bugging me for years and I finally found what it was and fixed it. PHP evaluates the string within double quotes, and when the search term is a double-quoted string, PHP eats it up. The fix was to wrap the query string in a call to the htmlspecialchars function to prevent the evaluation. GenSoftReviews searches had the same problem, and I’ve fixed it there as well.
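The post does not show the PHP involved, so the following is only a guess at the kind of situation where escaping a quoted search term helps. It is a minimal Python sketch in which `html.escape` stands in for PHP's htmlspecialchars, and the hidden form field and function names are assumptions made up for the example. When a search term that contains double quotes is written into an HTML attribute unescaped, the term's own quotes end the attribute early and the term is lost on the next page.

```python
from html import escape


def hidden_field_unescaped(term):
    # The term's own double quotes terminate the value="..." attribute early.
    return f'<input type="hidden" name="q" value="{term}">'


def hidden_field_escaped(term):
    # escape(..., quote=True) turns " into &quot; so the attribute stays intact.
    return f'<input type="hidden" name="q" value="{escape(term, quote=True)}">'


if __name__ == "__main__":
    term = '"John Smith"'
    print(hidden_field_unescaped(term))
    print(hidden_field_escaped(term))
```

With the escaped version, the browser decodes `&quot;` back into a double quote when the form is submitted, so the exact-phrase search carries over to the next page of results.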

PHP “evaluates” double-quoted strings: it expands variables and escape sequences inside them. It doesn’t evaluate single-quoted strings. Another unexpected, inconsistent feature. This is yet another example of why PHP is the world’s worst programming language.

PHP is a tool like this one - from Jeff Atwood's article