Genealogy and Programming: Both Challenging and Fun - Mon, 27 Oct 2014
I’m very lucky to be both a genealogist and a programmer who programs genealogy software. The two tasks are similar in many respects. You run into problems that are difficult to solve, you need to prove whether something is correct or why it isn’t, and some days you make lots of progress whereas other days… well, you can get frustrated, or you can go optimistically forward.
If you’ve been following my Twitter account, you’ll have seen that since I got back from Gaenovium, I’ve committed myself (with the encouragement of a writer friend who is doing the same) to be able to say that I #amprogramming (or for my friend, #amwriting) every single day, hopefully for 365 days straight. This is an excellent motivational technique, and once you’re on a roll you really get rolling! You see the progress and you look forward to the next day’s work, without having forgotten what you did before. It’s not full time, but a minimum of an hour or two per day of committed effort. I’m up to day 15 and still going strong.
Coming from a statistics/mathematics/computer science background, some of my most challenging and fun problems in programming are dealing with data structures. It’s not often I talk about really technical programming details in this blog, but this one seems just right, so please bear with me.
Behold reads files written in GEDCOM format. Many years ago, too long ago to remember, I programmed a linked list to represent the family/individual connections in Behold. I created a data structure for this that I call my IndiFam list. Every “family” (father/mother) is connected to the oldest child, who is connected to the next oldest child, until you finally reach the youngest child. I had programmed this oldest to youngest scheme because GEDCOM stated that “the preferred order of the CHILdren pointers within a FAMily structure is chronological by birth.” Most developers followed this suggestion. So little did I expect that I’d have to address this very basic bit of programming years after I first programmed it.
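For readers who think in code, the family-to-oldest-child-to-next-child chain described above can be sketched roughly as follows. This is a minimal illustration in Python, not Behold's actual Object Pascal; the names `ChildNode`, `Family`, `append_child` and `children` are hypothetical, and the real IndiFam list certainly carries more than this.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ChildNode:
    indi_id: str                                # pointer of the child, e.g. "6"
    next_younger: Optional["ChildNode"] = None  # link to the next youngest sibling


@dataclass
class Family:
    fam_id: str
    oldest_child: Optional[ChildNode] = None    # head of the sibling chain

    def append_child(self, indi_id: str) -> None:
        """Add a child at the young end, assuming input arrives oldest first."""
        node = ChildNode(indi_id)
        if self.oldest_child is None:
            self.oldest_child = node
            return
        cur = self.oldest_child
        while cur.next_younger is not None:
            cur = cur.next_younger
        cur.next_younger = node

    def children(self) -> Iterator[str]:
        """Walk the chain from the oldest child to the youngest."""
        cur = self.oldest_child
        while cur is not None:
            yield cur.indi_id
            cur = cur.next_younger


family = Family("F1")
for child_id in ["6", "5"]:     # file order, oldest first
    family.append_child(child_id)
print(list(family.children()))  # ['6', '5']
```

A structure like this is simple as long as the file lists children oldest first, which is exactly the assumption that GEDCOM 2.0 breaks.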
Since the recent surprising discovery of GEDCOM 2.0 files, I’ve run across several dozen of these files to play with. I wanted to make sure Behold could read them properly. Some of these are especially challenging for a genealogy program to read. So on day 13 of #amprogramming, I took to this task.
Most of the programming for GEDCOM 2.0 was straightforward. But there was a real challenge in the way GEDCOM 2.0 links children.
In GEDCOM 2.0, the family linked to the youngest child:
0 @1@ FAMI
2 @5@ YOUN
and then each individual linked to their older sibling:
0 @5@ INDI
2 OLD @6@
The developers of the standard (the Family History Department of the LDS) realized this was a problem for programmers and, for GEDCOM versions 2.1 and later, changed it to simply list the children in order from oldest to youngest within the FAMily record, like this:
0 @F1@ FAM
1 CHIL @6@
1 CHIL @5@
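Both layouts carry the same information. Seen in isolation, turning the 2.0 links into the later oldest-first order amounts to following the older-sibling links up from the youngest child and reversing the result. Here is a minimal sketch in Python, assuming the links have already been read into a dictionary; the function name `children_oldest_first` and the `older_of` mapping are hypothetical, and this is not how Behold stores the data.

```python
from typing import Dict, List, Optional


def children_oldest_first(youngest: str, older_of: Dict[str, str]) -> List[str]:
    """Follow older-sibling links up from the youngest child, then reverse."""
    order: List[str] = []
    seen = set()
    current: Optional[str] = youngest
    while current is not None:
        if current in seen:
            # A malformed file could point back at an earlier sibling.
            raise ValueError(f"sibling loop at @{current}@")
        seen.add(current)
        order.append(current)
        current = older_of.get(current)  # None once the oldest child is reached
    order.reverse()
    return order


# The example above: family @1@ points at @5@, and @5@ points at older @6@.
older_of = {"5": "6"}
print(children_oldest_first("5", older_of))  # ['6', '5']
```

The catch is that this only works once every link in the file has been seen, whereas a reader that builds its structures as it goes meets the links one at a time.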
But I was still left with a challenge if I wanted Behold to read these GEDCOM 2.0 files.
Solving this sort of programming puzzle is never obvious. It often takes one or two stabs to first remember what I had put into my original data structures so many years ago, and then a few more attempts to figure out how to implement the change.
After several attempts at the code, I had a version that likely could have worked, but it would have been a mess of complicated code, difficult to maintain or understand. That didn’t sit too well with me, and my mind started mulling it over. While getting ready for bed, I came up with what seemed to be a logical, understandable method and ran to my office and scrawled the following diagram:
Okay, so maybe that’s logical and understandable to me, but not to you. Nevertheless, using this I was able to produce some mighty clean code that I had working after less than an hour of coding and testing.
I don’t think I’ve ever shown any of the actual code from Behold in my blog before, but it’s an interesting illustration of what programmer-speak is and what the Object Pascal language used in Delphi looks like:
I won’t subject you to an explanation of what all that means. All in all, Behold calls this routine once for every set of INDI.SIBL.OLD tags in a GEDCOM 2.0 file, and the code reorders the siblings in the linked list structure of children.
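For the curious, the general idea of a routine that is called once per older-sibling link and fixes up the order as it goes can be sketched like this. It is a Python illustration under its own assumptions, not a translation of the Delphi routine: the class `SiblingOrder` and its methods are hypothetical, and it keeps plain Python lists of "runs" (stretches of siblings whose order is already settled) instead of Behold's linked list.

```python
from typing import List, Optional


class SiblingOrder:
    def __init__(self, children_in_file_order: List[str]) -> None:
        # Start with every child in its own run; nothing is ordered yet.
        self.runs: List[List[str]] = [[c] for c in children_in_file_order]

    def _find(self, child: str, index: int) -> Optional[List[str]]:
        # index 0 finds the run that starts with child, -1 the run that ends with it.
        return next((r for r in self.runs if r[index] == child), None)

    def link_older(self, indi: str, older: str) -> None:
        """Called once per link: `older` is the next older sibling of `indi`."""
        older_run = self._find(older, -1)  # older must still be the young end of its run
        indi_run = self._find(indi, 0)     # indi must still be the old end of its run
        if older_run is None or indi_run is None or older_run is indi_run:
            raise ValueError(f"inconsistent sibling link @{indi}@ -> @{older}@")
        # Drop the older run from its old position and glue it in front of indi's run.
        self.runs = [r for r in self.runs if r is not older_run]
        indi_run[:0] = older_run

    def children(self) -> List[str]:
        return [c for run in self.runs for c in run]


order = SiblingOrder(["5", "6"])      # file order, youngest happened to come first
order.link_older(indi="5", older="6")
print(order.children())               # ['6', '5']
```

Moving whole runs rather than single children is the design choice that matters here: it means the links can arrive in any order without undoing work done by an earlier call.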
Maybe that’s not exciting to you, but for a programmer, getting a “eureka” with a few scrawls on a piece of paper and coding it up in less than an hour is like finding that clue that finally identifies who your great-great-grandmother was.
I’ve still got a few things left in the bucket to get Version 1.1 of Behold out. But with my newfound motivation to ensure I #amprogramming every day, each day will result in a bit of progress with the goal of allowing me to release 1.1 sometime in November.
Now this blog post is out and I’ve got to get back to #amprogramming.
Have you done your #amfamilyresearching for today?