Louis Kessler’s Behold Blog

Raising the House - Wed, 5 Aug 2015

Installing a database into Behold is a major endeavour that I am doing carefully. It is, in many ways, not unlike the work that has been going on at a house I have been passing by for the past couple of months.

This is a beautiful home, about 100 years old, that supposedly was used for a movie set many years ago. But it had major foundation problems. The owner decided to replace the foundation.

If you look at the picture, you’ll see there are four huge steel beams holding the house up. I wish I knew how they got those steel beams there, but this was the first view of the house I had seen after the work started. It took me by complete surprise when I first saw it, and I just stood there for a while quite incredulous at the sight of it. If you look very closely in the black area under the house below the right side of the left window, very faintly you’ll see a white plastic chair. That chair is in the backyard behind the house.

With Behold, I’m removing its foundation: the internal data structures that hold all its genealogy data in memory while Behold is running, and I’m replacing that with a database on your hard drive that will hold the data. I’m ripping out the basement and replacing it. While I’m doing so, I’ve got to keep the rest of the house (Behold) safe and together, so that it can be laid back down on its new foundation.

It’s got to fit just right and I can’t break anything as I do this. There’s no room for any attempts to add something new to the rest of the house, or to make something in it better. It’s just a matter of getting this most important job done and getting the foundation right.

Basically, I am taking each data structure: the INDI record, the FAM record, parent-children connections, places, sources, all the references between them, and one by one I’m replacing that part of the foundation. I hope that in a day or two, I’ll have enough done to replace a few parts of Behold’s old foundation with the new ones. I’ll see Behold work in a hybrid form, with some of its data now in the new database and the rest still from the old data structures.

I’ll then slowly convert the remaining data structures over bit by bit, keeping Behold running and working the whole time.
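The hybrid arrangement described above can be sketched roughly as follows. This is Python rather than Delphi, and it is not Behold's actual code: the `HybridStore` class, its table layout, and the sample records are all hypothetical, made up only to illustrate moving one record type at a time while everything keeps working.

```python
import sqlite3


class HybridStore:
    """Routes each record type to the old in-memory store or the new database."""

    def __init__(self):
        self.memory = {}                        # old foundation: {(kind, rec_id): payload}
        self.db = sqlite3.connect(":memory:")   # stand-in for the on-disk database
        self.db.execute(
            "CREATE TABLE records (kind TEXT, rec_id TEXT, payload TEXT, "
            "PRIMARY KEY (kind, rec_id))"
        )
        self.migrated = set()                   # record types already moved over

    def migrate(self, kind):
        """Move every record of one type from memory into the database."""
        rows = [(k, r, p) for (k, r), p in self.memory.items() if k == kind]
        with self.db:                           # one transaction for the whole move
            self.db.executemany(
                "INSERT OR REPLACE INTO records VALUES (?, ?, ?)", rows
            )
        for k, r, _ in rows:
            del self.memory[(k, r)]
        self.migrated.add(kind)

    def put(self, kind, rec_id, payload):
        if kind in self.migrated:
            with self.db:
                self.db.execute(
                    "INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
                    (kind, rec_id, payload),
                )
        else:
            self.memory[(kind, rec_id)] = payload

    def get(self, kind, rec_id):
        if kind in self.migrated:
            row = self.db.execute(
                "SELECT payload FROM records WHERE kind = ? AND rec_id = ?",
                (kind, rec_id),
            ).fetchone()
            return row[0] if row else None
        return self.memory.get((kind, rec_id))


# Usage: INDI records move to the database while FAM records stay in memory.
store = HybridStore()
store.put("INDI", "I1", "John Smith")
store.put("FAM", "F1", "Smith family")
store.migrate("INDI")
print(store.get("INDI", "I1"), "/", store.get("FAM", "F1"))
```

Callers only ever use `put` and `get`, so the rest of the program does not need to know which parts of the foundation have already been replaced.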

Once it appears that the entire new foundation is ready and is made to fit perfectly, I’ll lower the house onto it. Then I’ll carefully remove the four steel beams, um…, old data structures that are no longer needed, ensuring that everything continues to work as I dismantle the old support.

Quite a feat! But this should be a foundation that will serve the house at least another hundred years.

And it will provide the base that Behold will need to save your data in Version 2.

Why are there so many genealogy programs? - Wed, 29 Jul 2015

This post is total plagiarism of an article just written today by Matt Sherman on the StackExchange Blog titled “Why are there so many programming languages?”

The article struck me as also being totally true when applied to genealogy software, so I am reproducing it here and just changing a few words. Here it is:

—

I was asked this question twice in the past week, by colleagues who work with our clients. It sounds like a dumb question, but it’s not.

The short answer is that we don’t need so many genealogy programs, but we want them. Let’s explore this further.

Aren’t they all the same?

In a sense, yes. You can record your family information using Family Tree Maker, Legacy Family Tree, RootsMagic, Brother’s Keeper, Reunion or Family Historian. You can use Family Tree Builder or Gramps or Ancestral Quest or Heredis. Or WikiTree or Clooz or Evidentia.

Underlying this fact is that all of these genealogy programs serve the same purpose: to help you record, keep track of, and make sense of your data. In highfalutin genealogy terms, they all help you with your “Genealogy Workflow”.

At their most foundational level, these programs are all the same. But on the surface – where humans interact with them – they vary a lot. This is where other concerns come into play.

Different tools for different jobs

Genealogy programs are tools, and we choose different tools for different jobs. A tractor trailer and a bicycle and a Tesla are all vehicles – they have wheels and steering and will get you from point A to point B – but obviously we use them for different things.

Genealogy programs are similar. RootsMagic and Reunion are great for recording your facts; Family Tree Builder and WikiTree are often used to work with online family trees; Clooz and Evidentia are the tools of choice for analyzing your reasoning.

Genealogy programs often make trade-offs in terms of convenience, safety, and speed – much like vehicles. The trade-off is dictated by the job at hand.

Genealogists have tastes

Beyond mere utility, genealogists choose tools based on personal tastes.

A genealogy program is a tool for humans to express ideas to computers. While we genealogists have many things in common, there is natural variety in the way our minds work.

Because we have many choices of good genealogy programs, we can select one that “works the way I think”. Some genealogists like WikiTree’s flexibility, while others prefer RootsMagic’s strictness. Where some programs feel complex, others look like they were made for you.

People first

Beyond utility, and beyond taste, businesses run on people. Often, you will choose a genealogy program based on what you, or the people around you, know.

Many genealogists chose Family Tree Maker mostly because that’s what they found on store shelves. In turn, it’s what their friends and colleagues then used. That made learning how to use the software easier, and allowed Family Tree Maker to dominate the market.

(Unfortunately, Family Tree Maker also happens to have terrible technical characteristics.)

Technologies are supported by “ecosystems” – communities and organizations that provide the tools and assistance that every genealogist needs. A good ecosystem – Family Historian has a great one, for example – can make the individual genealogist more successful.

Variety is strength

In summary, we have a variety of genealogy programs because there is a variety of jobs to be done and a variety of people who do those jobs. This diversity makes interesting programs – and interesting companies, and interesting research – possible.

By ~~Matt Sherman~~ Louis Kessler, ~~Engineering Manager~~ Genealogist Developer

Tagged in ~~engineering~~ software

—

For a list of over 900 different genealogy programs that you can rate and review and see other people’s ratings and reviews, visit GenSoftReviews.


Getting the Timing Right - Sun, 26 Jul 2015

One of my concerns about switching from in-memory data structures to an on-disk database has been speed. Memory is faster than disk. Writing to disk will slow things down.

But it better not slow things down too much. For normal-sized genealogy files (say up to 10,000 people), everything should be fast and smooth. Larger files of 100,000 or more people should degrade gracefully. But they should not take forever.

As I have learned through my implementation of Behold’s database during the past few weeks, writing databases to disk one record at a time is sloooooow.

In the past, to get timings, I used AQtime, a third-party profiler that would count and time each line of your executable without adding much overhead at all. It worked like magic to me. But moving to Delphi XE8, I would need to upgrade to their new version. The cost, plus a horrendous interface with the Delphi development environment, just wasn’t worth it to me.

Instead I found a wonderful, simple record type in Delphi called TStopwatch (in the System.Diagnostics unit) that gives accurate timings. You can create as many stopwatches as you want and you just add them around your code, e.g.:

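Stopwatch1 := TStopwatch.Create;  // times the whole routine
Stopwatch2 := TStopwatch.Create;  // accumulates only the parts of interest
Stopwatch1.Start;
Stopwatch2.Start;
// code
Stopwatch2.Stop;
// more code
Stopwatch2.Start;
// even more code
Stopwatch2.Stop;
// and more code
Stopwatch1.Stop;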

Testing inside a routine that is called 10,000 times, I get a timing of 229 milliseconds (ms). Rerunning the routine doesn’t vary from that by more than about 5 ms, so the timings are very repeatable and more than accurate enough to compare the speed of different implementations for optimization purposes.
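For readers who don't use Delphi, the same start/stop pattern can be sketched in Python. This is only an illustration of the idea, not Delphi's TStopwatch: the `Stopwatch` class here is a hypothetical helper built on `time.perf_counter`, and the work inside the loop is a placeholder.

```python
import time


class Stopwatch:
    """Accumulates elapsed time across repeated start/stop pairs."""

    def __init__(self):
        self.elapsed = 0.0       # total seconds accumulated so far
        self._started = None     # perf_counter() reading at the last start, if running

    def start(self):
        if self._started is None:
            self._started = time.perf_counter()

    def stop(self):
        if self._started is not None:
            self.elapsed += time.perf_counter() - self._started
            self._started = None

    @property
    def elapsed_ms(self):
        return self.elapsed * 1000.0


total = Stopwatch()   # plays the role of Stopwatch1
inner = Stopwatch()   # plays the role of Stopwatch2

total.start()
for _ in range(10_000):
    inner.start()
    sum(range(50))    # placeholder for the code being measured
    inner.stop()
    str(12345)        # placeholder for other work, counted only by `total`
total.stop()

print(f"inner: {inner.elapsed_ms:.1f} ms of {total.elapsed_ms:.1f} ms total")
```

Because the inner stopwatch keeps adding to its total each time it is restarted, it reports how much of the whole run was spent in just the section of interest.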

So I used Stopwatches to see how quick I could get SQLite to be and determine how to best implement it. I used two files. In the first, I loaded 33,790 INDI (people) records from a GEDCOM file. In the second I loaded 198,522 INDI records.

I knew that what I was comparing against was an unattainable goal: the B* tree I created and super-optimized to index and store the INDI records in memory in Behold. A B* tree is a variant of a B-tree that keeps its nodes fuller by sharing keys with neighbouring nodes before splitting (a B-tree is really a generalized binary search tree, but with n children per node rather than just 2). For the two files, my B* tree took 53 ms and 459 ms.

Next I did a straight insert of the same information to a disk-based SQLite database. It took 1530 ms and 10985 ms. Well that’s quite a bit longer, and 10 seconds is a bit too long for someone to wait when it only took a half a second before.

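By default, a database writes information one record at a time, with each insert handled as its own transaction. What was needed was to batch the inserts using a technique called Array DML, which hands the database a whole array of values to insert in one operation. Once I did that, I was able to get times down to 176 ms and 1012 ms.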
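The difference between writing one record at a time and batching can be sketched with Python's built-in sqlite3 module. This is not Array DML itself, which is a feature of Delphi's database layer. Here `executemany` inside a single transaction stands in as the closest Python analogue, and the table, row count, and names are made up for the illustration.

```python
import sqlite3
import time


def make_db():
    db = sqlite3.connect(":memory:")   # in-memory so the demo is self-contained
    db.execute("CREATE TABLE indi (id INTEGER PRIMARY KEY, name TEXT)")
    return db


rows = [(i, f"Person {i}") for i in range(20_000)]

# One record at a time: every insert is committed as its own transaction.
slow = make_db()
t0 = time.perf_counter()
for row in rows:
    slow.execute("INSERT INTO indi VALUES (?, ?)", row)
    slow.commit()
one_by_one = time.perf_counter() - t0

# Batched: all rows go through one statement inside a single transaction.
fast = make_db()
t0 = time.perf_counter()
with fast:                             # commits once, when the block ends
    fast.executemany("INSERT INTO indi VALUES (?, ?)", rows)
batched = time.perf_counter() - t0

print(f"one by one: {one_by_one * 1000:.0f} ms, batched: {batched * 1000:.0f} ms")
```

With an in-memory database the gap is modest. On a real disk file it is usually much wider, because each separate commit has to wait for the data to reach the disk.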

That was better, still not as good as my beautiful B* tree, but acceptable when disk writes are involved.
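To put the figures quoted so far side by side, here is a small Python calculation. It uses only the record counts and timings given in this post. The variable names are just for the illustration.

```python
records = (33_790, 198_522)              # INDI records in the two test files

timings_ms = {
    "B* tree in memory": (53, 459),
    "SQLite, one insert at a time": (1530, 10985),
    "SQLite with Array DML": (176, 1012),
}

# Cost per record in microseconds for each approach and each file.
for label, times in timings_ms.items():
    per_record = [1000 * t / n for t, n in zip(times, records)]
    print(f"{label}: {per_record[0]:.1f} and {per_record[1]:.1f} microseconds per record")

# How much batching helped, and how far it still is from the B* tree.
speedup = [
    a / b
    for a, b in zip(timings_ms["SQLite, one insert at a time"],
                    timings_ms["SQLite with Array DML"])
]
gap = [
    a / b
    for a, b in zip(timings_ms["SQLite with Array DML"],
                    timings_ms["B* tree in memory"])
]
print(f"batching speedup: {speedup[0]:.1f}x and {speedup[1]:.1f}x")
print(f"still slower than the B* tree by: {gap[0]:.1f}x and {gap[1]:.1f}x")
```

So batching made the inserts roughly 9 to 11 times faster, leaving the database about 2 to 3 times slower than the in-memory B* tree.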

Still something bothered me. The amount of time needed to load the DML array was half of the total time. This was a simple assignment of all the data, one by one, into the array. There was no database work going on. In the two tests, it was loading a 20 MB array and a 100 MB array and it was all in memory. That should be much faster than it is, and shouldn’t take as long as writing the data to disk.

I wasn’t sure what was going on. It was using a Delphi object known as a TCollection. Technically it was a collection of a collection of variants. So it was two-dimensional: one dimension for the INDI records and one for the 16 fields of each INDI.

I played around a bit but was not able to find a way to load that TCollection any faster. I even posted a question on StackOverflow (check it out if you want to get into some technical details with more timings).

By the way, all the above timings are from a 64-bit build. Remember when I blogged a few days ago that 64-bit was slower than 32-bit? Well, that’s not true for the database. My B* tree is 25% slower in 64-bit, but the database writes are 15% faster despite using 35% more memory.

I’m going to do a bit more research into this, but I’m not going to let it side-track me. Tomorrow I’ll start the work to get Behold writing to and using the database.
