Login to participate
  
Register   Lost ID/password?

Louis Kessler's Behold Blog

Always Learning New Programming Concepts and Methods - Mon, 20 Jul 2015

Most of the new concepts and tools for programming didn’t exist when I graduated from University with a Masters of Computer Science in 1980. The programming languages I worked with were FORTRAN, APL, PL/I and even Assembler. I picked up Pascal on the job and that led me to Turbo Pascal for personal use and then to Delphi with its object-oriented version of Pascal.

Delphi’s object-orientation introduced in 1995 was to me a totally new concept I had to pick up. I never really needed to learn it in detail since I never wrote low level objects other programmers would consume, but I had to learn enough about it to consumer other programmer’s objects from the packages they wrote.

Also during the 1990’s was the development of Unicode, which finally made every language available in one character set. Windows started supporting it around 2000 in Windows NT and XP. I have several books on it in my library (e.g. Unicode: A Primer) attesting to the time I spent learning about it. Unicode was a necessity for genealogy software and I had to buy 3rd party packages to support it in Behold until Delphi 2009 came out which finally included it.

Now with Delphi XE8, Generics have really taken hold. One of my specialties has always been in data structures and algorithms. Sometimes I just can’t help myself as I really enjoy getting down and dirty with a good algorithm, just as I did a couple of years ago for Ancestral Loops. But now with Generics, many of these methods have become much easier and with them, I will be able to replace my custom-developed b-tree and the 3rd party hash tables that I had been using in Behold.

As far as databases go, they were in their infancy when I was at University. I do recall taking a course on normalizing relational databases, which is to analyze data and to organize it into its primary records based on the common keys that define specific fields. Years later, I refreshed that knowledge when having to do that normalization for work-related projects.

I had the opportunity to work and gain expertise with a hierarchical database system called FOCUS by Information Builders. FOCUS was around before Oracle and Microsoft Access became the big fish. I worked a bit with Access, but had samplings of Oracle and even a bit of the SAP Data Warehouse.

My personal website requirements led me to WordPress, the PHP language, its MySQL database and the six months of my life I spent customizing my blog and forum. If you lived within a few blocks of me during that time, you may have heard me often spouting expletive deleteds due to my continued frustrations with PHP (which I agree is the second worst programming language in the world – the worst one is obvious if you remember Job Control Language (JCL). Do watch this great talk on The Worst Programming Language Ever by Mark Rendle)

But now I’ve got two technologies to take on that are new for me:

1: The SQLite database was first designed in 2000. Since then, it has become the most widely deployed and used database engine in the world. Billions of copies exist because it is in just about everything you use. One major genealogy software package uses it, that being RootsMagic. And it is my choice for Behold’s database, primarily due to its speed, size and universality.

SQLite

2: Delphi’s new FireDAC framework. This allows high-level and universal access to SQLite or any database. It is like the middleware that gets Delphi to play nice with SQLite.

image

I’m learning these as we speak. A couple of months intensive work of implementation and I should gain a fairly extensive knowledge in both.

After that, I’ll have more learning coming, including JSON, API’s, Windows 10,…

So don’t ever think any programmer knows everything about programming. Knowledge grows old fast. And new technologies rise faster than Spring flowers.

This continual learning is what makes programming so much of a challenge … and so much fun.

I Know Where I’m Going, But … - Fri, 17 Jul 2015

… But I don’t know what it’s going to be like when I get there.

My adventure towards editing in Version 2 of Behold is a bit like the New Horizon’s mission to Pluto. They set off 9 years ago and knew where they were going. They knew how they’d get there. They even knew what they were going to do when they got there. But they didn’t know what they’d find when they got there.

image

So Behold is well on its way to getting its own database so that it will have a place to save the edits you add to it. I know that this editing is going to be something special. I just don’t know exactly what it will look like yet. My picture of it is about the same as the image of Pluto above, which was the best picture we had of it until last week.

As I get closer to the goal, ideas will expand and the picture will get clearer.

image

Even my last blog post, just 62 hours ago, talked about setting up a main database table that would generalize all the tables. Well, two days of working on the tables led to realizations that a generalized table will not work out, and the standard method of normalizing tables for INDI, FAM, SOUR, OBJ, etc., will do after all.

Also the realization that I must change just one thing at a time, ensure it can work, and then get it working before going on to the next is essential. Behold must always be a working system. At this point, wholesale changes will not work. The mission is set. Pluto is in our sites. Let’s get there.

NASA's Three-Billion-Mile Journey to Pluto Reaches Historic Encounter

Pluto nearly fills the frame in this image from the Long Range Reconnaissance Imager (LORRI) aboard NASA’s New Horizons spacecraft, taken on July 13, 2015 when the spacecraft was 476,000 miles (768,000 kilometers) from the surface. (Photo: NASA/APL/SwRI)

Database Design for Genealogy Data - Tue, 14 Jul 2015

There are several considerations as I structure the disk-based database for Behold. I want it to be capable of loading genealogy data from any source, I want it to be flexible, but most importantly I need it to handle the in-memory structures already in Behold. Let’s go through these considerations:

Data from any Source

Behold already reads GEDCOM. GEDCOM is the long-time standard used for transmission of genealogy data. Almost all genealogy programs can read and write from and to GEDCOM files to at least some degree. Most programs don’t make use of GEDCOM’s capabilities and some many mistakes writing to it.

GEDCOM’s file structure includes records for a Person, Family, Note, Source, Repository, Multimedia, Submitter and Submission. Notably absent are records for a Place and an Event, which I’ll need for Behold. GEDCOM does allow custom tags, and many vendors have included custom level zero records in Behold. That means just about anything goes – except that no other programs will read one program’s custom tags except in the rare cases that a program will go through the work to support another program’s custom tags.

Family Historian is one of the few genealogy programs that uses GEDCOM as its file format. Internally, it likely loads everything into a more efficient in-memory data structure like Behold does, but it only writes its data to a file as GEDCOM. And if I’m not mistaken, Family Historian still uses GEDCOM 5.5, which was superseded by GEDCOM 5.5.1 which PAF used and is the de facto standard.

There was an Event-Oriented GEDCOM developed by COMMSOFT and used in Ultimate Family Tree until Ancestry purchased it and discontinued it. They put the Event record front and centre in that structure.

There are some extensions to GEDCOM, most notably GEDCOM 5.5EL developed and supported by a group of genealogy program authors that has added a Place record and some other goodies.

The FHISO organization is working to produce an updated standard. That is probably a long time off, but when they do, that will be a new standard I’ll want to read.

Other programs: I’m not saying I’m going to get Behold to read from every other program out there, but it is very good to look at any published data file documentation that other programs have. I at least have to make sure that Behold will have the structures in place that it can store and display any relevant genealogy data that any program may have. Looking at other program’s file structures may give me other good ideas on ways to design Behold’s file.

Some of the programs with the most interesting data files I’ve looked at include:

  • RootsMagic – the database is SQLite, which is what I’ll be using. There are many SQLite viewers so their database can be easily inspected. In fact, there’s a SQLite Tools for RootsMagic group at wikispaces that publishes the RootsMagic data structure, provides tools, and helps people write their own custom queries from the RootsMagic data file.

    RootsMagic’s data structure includes standard records for: Person, Family, Place, Event, Source, Citation and Multimedia. There are some tables for subrecords (e.g. Name, Address). The most interesting records in RootsMagic which Behold should be capable of supporting include Groups, Research/To-Do, Source Templates, Exclusions (problem list items that are not a problem) and Witnesses.

  • Gramps – provides a full API (Application Programming Interface) to the Gramps database. The main records are Person, Family, Place, Event, Source, Citation, Repository, Media Object and Note.

    I think the Gramps people did a superb job of setting up a well designed layout with just the right fields in each record. I particularly like the idea of including on each record, fields for:  private, change, and a list of links to all the connections (rather than making them separate tables as most programs do). If I was starting anew and wanted to just “borrow” a database structure, this might be the one I would pick.

  • The Master Genealogist (TMG) – Bob Velke discontinued TMG last year. TMG was considered to be one of the more advanced genealogy programs when it was first created, and had to have data structures to support them. In August last year, the file structures of TMG 9 were published to allow other vendors to build import utilities that so that TMG’s users could migrate to a supported program. Lee Hoffman put up a nice summary of the TMG File Structures up to TMG 7. 

    Some of TMG’s more unique records include: Custom Flags, DNA information, Focus Groups, Participants/Witnesses and Timeline Locks.

Then there are the online Family Tree Services who you subscribe to and maintain your family tree on their websites. Most of these sites provide APIs to allow programmers to read and in some cases write from and to their service.The APIs will give an indication as to the structure of their data. Some of the most notable include:

  • MyHeritage – They have complete documentation online for their API, which they call the Family Graph API (surprisingly still with Beta designation on it). I heard Uri Gonen talk about it at RootsTech 2014. I tweeted then that I was very impressed and that MyHeritage is doing a lot of things right. Of all the APIs I’ve seen, theirs is the most straightforward and looks like it might be the simplest to implement.

    Some of the unique objects in their API include information related to their SmartMatch system, including a Matching Request, MatchesCount and ConfirmationStatus.

  • FamilySearch – They also have their documentation available online for developers. They also have their GEDCOM X Data Objects defined which describe the actual data structure used in their Family Tree. 

    Of most interest here is their inclusion of Discussions and their concept of Memories.

There are other online services with APIs, including Geni, Genealogie Online, WikiTree, GenealogyCloud and the DNA testing company 23andMe to name a few.

I should add that I really like Tim Forsythe’s online Gigatrees service. He doesn’t give a complete description but does at least map out his data structure he calls GREnDL. He attempts to do the correct thing and be source-based, so at the center of his model, he has the “Claim” record.

There are others who have developed data structures with different ideas that I keep aware of, like Tony Proctor (STEMMA), and Tom Wetmore (DeadEnds).


Flexibility

Behold is a flexible GEDCOM reader. It knows the rules of GEDCOM but reads in and can display anything that has the GEDCOM level/tag/value structure, even if the input doesn’t follow GEDCOM’s rules.

So yes, there is valid data and there is invalid data. But whatever is input must be preserved in some way. It cannot be thrown away.

The programs that convert data they don’t recognize into notes are sometimes criticized because they lose understanding of the data. But that is much better than not inputting the data at all. In the former case, the data can at least be exported again, but in the latter case the data is lost forever.

What must be done is to try to recognize all GEDCOM constructs and record the data as intended. If the construct as input is illegal, then maybe it can be translated to a legal construct. If not, maybe a custom tag can be used. If not, then turn it into a note.

Whatever is done, the data structures must be able to handle this flexibility. A very rigid structure requiring only certain types of records and fields and not allowing unknown records or unknown fields will have no place to put this data.

GEDCOM’s level/tag/value structure is actually very flexible and can handle just about anything. I plan to make use of it to contain the detail information about each record in the database.

Behold’s In-Memory Structures

Behold is already quite well set up to be converted to a file-based database. Its current structures include all the GEDCOM records and that already includes a place record and allows for custom records.

When I added Life Events into version 1.1, events suddenly took on new meaning. They were now not only linked to the individual or family, their place and their sources, but now they were also linked to all the people closely related to the individual. So I’ve come to realize that because the events are referenced so much, it will be better to make Event a top level record.

To generalize, I’m thinking I can collapse most record types into a single table because they have so much in common. My idea may change, but what follows is what I’m going to try to implement.

The main table will likely be something like this:

  1. ID – sequential index number
  2. Type – type of record, from:  INDI, FAM, PLAC, EVEN [TYPE], NOTE, SOUR, SREC, OBJ, RELA (relationship), GRP, TODO, DNA, …
  3. Name (of person, place, source, etc.)
  4. Filenum (the data file it is from)
  5. FileID (the ID of this record in its original file)
  6. Date
  7. Place ID
  8. Detail (in GEDCOM format)
  9. User Reference Number (for future use)
  10. Private (for future use)
  11. Changed date

There will also need to be some sort of a Partner/Parent/Children table linking people with their partners and people with their children and the partner start relationship event and the child birth/adoption/fostering events. I still have some thinking to do for this structure. But it will be based on the internal structure now in Behold and may look something like this:

  1. LinkType – type of link (to partner and/or family, to child and/or parents)
  2. Person ID
  3. Family ID
  4. Person Link Detail (in GEDCOM format)
  5. Family Link Detail (in GEDCOM format)
  6. Start Event ID
  7. Next Person ID (next partner or next child)
  8. Next Family ID (next family or next parents)

The rest of the tables will simply be combination of indexes to make everything work fast together, e.g.

  • Place index is:  ID (place), ID (non-place)
  • Source index is:  ID (source/source record), ID (anything) 
  • and a bunch of others

When I implement To-Do lists, I can connect it like I plan to the repository:

  • To-Do index is: ID (repository), ID (to-do)

Life events should be much easier to compute. I’ll have a table of relationships:

  • Person ID, Related Person ID, Relationship

and then the index of life events:

  • Person ID, Related Person ID, Event ID

You know. This could actually work out quite nicely. But we’ll see what happens. It is a process that once underway, evolves into what’s necessary.