Louis Kessler’s Behold Blog

Newly Rediscovered: GEDCOM 4.0 (and a bonus!) - Sun, 16 Nov 2014

Trying to write a flexible GEDCOM reader to read in every flavour of GEDCOM back to the early days is rather difficult when the standards used prior to version 5.3 from 1993 just aren’t available anywhere.

I have scoured the Internet and the Internet archives for that matter, to try to dig up some of the early documents. In particular, I couldn’t find any of these versions: 1.0 (1984), 2.0, 2.3, 2.4, 3.0, 4.0, 4.1, 4.2, 5.0, 5.1 or 5.2 (1992), some of which were said to be “Drafts” and some of which were said to be “Standards”.

A few months ago, I started emailing some of the developers I know who have been using GEDCOM since the early days. I even contacted Bill Harten, who led the team developing GEDCOM. He and some of the other developers told me they may still have old printed copies somewhere, and if those weren’t thrown out or eaten by rats, they might possibly be in some old box in a garage or attic that would take months to sort through. <Sigh>. It seems that GEDCOM 5.3 was the first version that made it to electronic form. All the other versions were hardcopy … until now.

I had been corresponding with Diedrich Hesmer, the developer of Our Family Book and GEDCOM Service Programs. Diedrich contacted the members of his GEDCOM-L list, where 24 German-speaking genealogy software programmers communicate with each other. It turned out that one of his colleagues, Gisbert Berwe, the author of the program Gen_Plus, had a printed copy of GEDCOM 4.0. He scanned it and has now posted it on his website.

You can find Gisbert’s PDF of GEDCOM 4.0 here:

The first 12 pages of this document are not actually the GEDCOM standard, but are the Data Structure Description of the Personal Ancestral File program version 2.1, dated 23 June 1988. I think it is likely that these pages were at the beginning of the document, since PAF and GEDCOM were being developed at the same time by the Family History Department of the LDS. The 12th page appears to be a page Gisbert may have included by mistake, being a German family relationship chart, instead of page 12 out of 12 of the PAF structure guide. But that doesn’t matter because the GEDCOM specs follow.

The GEDCOM Standard 4.0

The GEDCOM 4.0 specifications follow in the next 96 pages. The sections include:

Chapter 1: Specification for GEDCOM Level Numbers
Chapter 2: Specification for GEDCOM Tags
Chapter 3: Specification for GEDCOM Transmission Headers and Trailers
Chapter 4: Specification for GEDCOM Cross-Reference Identifiers
Chapter 5: Specification for GEDCOM Values
Chapter 6: Specification for GEDCOM Character Sets
Chapter 7: Specification for GEDCOM Transmission Media
Appendix: GEDCOM Tags

The Introduction is 2 pages. The first page of the introduction is missing from the scan and the second page follows page 1-1. I’ll have to see if Gisbert could scan page In1.  The seven chapters are 35 pages, and the Appendix which contains an alphabetical list of the Tags and their definitions is 47 pages. Gisbert accidentally repeated the cover page and included it where page A-42 should have been.

In my initial scanning of the document, I had to admire the GEDCOM team’s admission that the standard wasn’t perfect. On page In2, they state:

Future Editions of This Document

GEDCOM is still new, and has not yet been exposed to demanding applications over an extended period. Changes will be made as necessary. Chapter five, “Specification for GEDCOM Values,” will be updated to include format definitions for digitized photo, audio, and video information when the need arises and the required specifications have been completed. 


Bonus Document! PAF GEDCOM Specifications

I didn’t expect what followed in the next 34 pages.

PAF GEDCOM Specifications 1990

This document is not GEDCOM, but it details the implementation of GEDCOM in PAF Releases 2.0 through 2.2.

It states in its Introduction, Page 3 of 34:

“This document is necessary because many essential details about data structure and the use of tags in GEDCOM are specific to the implementation. In addition, the PAF 2.1 and 2.2 implementations differ from the PAF 2.0 implementation. PAF 2.0 was developed while the GEDCOM standard was still being refined. PAF Release 2.1 and 2.2 conform to the GEDCOM standards formally approved by the Family History Department in October 1987 (GEDCOM Release 3.0) and August 1989 (GEDCOM Release 4.0).”

In other words,
the PAF 2.2 implementation tells us more about GEDCOM 4.0,
the PAF 2.1 implementation tells us about GEDCOM 3.0, and
the PAF 2.0 implementation tells us about GEDCOM 2.0.

Wow! What a find!

Just like all archaeological digs, it will take time to study and analyze the details of these GEDCOM 4.0 and PAF GEDCOM Specifications before the lives of the early GEDCOMonians can be fully understood.

Source-Based Thinking - Tue, 11 Nov 2014

It’s time genealogists stopped their conclusion-based thinking and started going source-based.

Source-Based Document Organization

Source-Based Data Entry

Standardizing Sources

and now

Repository-Based To Do Lists

(Do you think there’s a not-so-subtle theme here?)

I advocate that everything is better if you start with the source. 

Genealogists have for too long been recording their name/event/fact/relationship conclusions first manually onto family group sheets, and now into the similarly organized data entry forms of their genealogy software. If they think of it, and if their software makes it convenient enough, they then just might decide to add a source to it … if they feel like it.

I find it unbelievable that people do all this work, and the result is they have no idea what they’ve entered and what they haven’t. Their source materials are a shambles. They can’t find their originals since their filing system for their physical documents is an unorganized mess, as are their computer files. Or even worse – organized by family.

By organizing your documents, computer files, data entry method, to do lists, and everything else by source, suddenly the world opens up. You know where you are and where you are going.

One-name and one-place studiers have known this for years. They thoroughly analyze all the information they can from every source. They know what they’ve extracted and what they haven’t. Every item of information they have comes first from the source and is entered by source and every item is documented with its source. Their to do lists are the sources they are going to look at. The only thing they lack is good source-based software, because almost all genealogy software is conclusion-based, which provides minimal help for them.

How Do You Find Your Needles in Haystacks?

Are you looking through a hundred haystacks for one needle? Then are you looking through the hundred haystacks again for the next needle? That’s a lot of work for every needle. For every little fact you need.

Wouldn’t it be better to look through one haystack at a time and find all the needles you can in that haystack? Once that’s done, go to the next haystack. Get everything you can out of every document while you’re accessing that document.

Source-based genealogy. Source-based thinking. It will change everything.

Behold’s getting there.

What Do You Want _TODO? - Sun, 9 Nov 2014

With the talk of a new GEDCOM standard, and my talk about the old GEDCOM standard, one item not yet considered has been To Do lists.

Many genealogists seem to want some sort of method of tracking their goals and the information they plan to find. They feel that keeping track of what they want to accomplish will help them do their research. They want their genealogy software to record this for them. They expect that their program will provide the means they need to keep them focused and on track.

Well, that’s what’s supposed to happen in theory, but it doesn’t always seem to work that way in practice. I’ve researched methods of planning for project management and managerial duties, and I’ve looked at various ways to simply keep myself or any person organized and not forget what needs to be done. Ideally there should be one scheme that works right from the big projects down to the simple tasks. Take a look at my past post about Getting Things Done and my next post about Fixing Getting Things Done. I concluded by saying that I thought I had the model to implement a simple but useful To Do list into genealogy software.

That was over 5 years ago. Since then I learned a few things. I found that for me, nothing more complicated than keeping a simple list of things to do worked. Every method other than that worked for a week or a month and then got abandoned for the plain old reliable list. All I really needed to do was make accessing and updating that list simple, and my smart phone turned out to just be perfect for that.

That’s all you need if you only have a few things to do, a simple list. But once your list starts to grow and you have more than 10 or 20 items, it starts to become unwieldy. So you have to divide and conquer and place your items into categories.

What should the categories be? That’s actually obvious when you think about it. You need to subdivide by WHERE you will be doing the task. That way, when you are somewhere, you will have the list of what needs to be done there. People normally segregate this way, and place all the items they’ll buy at their supermarket on a grocery list. All the material needed to build the shed in a corner of the backyard. All the clothes to be washed in a closet. All the items to do on your computer in a huge pile on your desk. And all the items you want to research about your family at a particular website together in one list waiting for when you have some research time for that website.

You do do that last item, don’t you? I’m amazed at how many people don’t. Instead, they organize by person.

Let’s say you want to find your g-g-grandfather’s birth certificate, and his brother’s wife’s name, and their son’s wedding information, and your great-uncle’s immigration record. Let’s write down every other thing we want to find about every person.

Well, that almost seems ridiculous to me. What you’re creating is a big list of all the things that you don’t know and need evidence for. It doesn’t matter how long you’ve been doing genealogy, you’re always going to have people you need information about. Even if you know 10 generations back, you don’t know the 11th. And in between there is always a lot of information you are unsure of.

So let’s just list every unknown fact for 4,000 people in your tree and put those 20,000 items in a To Do list. That should really help, shouldn’t it?

Bleachh!! This is what most genealogy software that allows adding To Do notes by person is telling us to do. It’s totally the wrong way to do this.

What must be done is to organize your To Dos by where you want to do them. You may know of some records that might be at your local archive. There may be some vital statistics you have to write away for at the state office. You may have some information you need to get from Aunt Helen. There may be some online searches you want to do at your computer at a particular website when you have time to do it thoroughly. Or you may want to make a trip to your grandparent’s village and take in everything possible.

It is now clear. Divide and conquer by where the task needs to be done. Attach your To Dos to the WHERE.

In GEDCOM terms, the “where” are the locations that hold the information sources you are looking for. These are known as the repositories, which are recorded as the REPO record in GEDCOM. Every source has a REPO that you got it from. Every REPO is linked to by all the sources you got from it.
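To make the source-to-repository link concrete, here is a minimal Python sketch that builds the reverse index: for each repository, the sources that point to it. It assumes the GEDCOM 5.5-style layout in which a SOUR record carries a level 1 REPO pointer. The function name and the sample records are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample: two sources held by the same repository.
SAMPLE = """\
0 @S1@ SOUR
1 TITL Paris newspaper obituaries
1 REPO @R12@
0 @S2@ SOUR
1 TITL Parish register
1 REPO @R12@
0 @R12@ REPO
1 NAME Local archive
"""

def sources_by_repository(gedcom_text):
    """Map each repository xref to the xrefs of the sources that point to it."""
    index = defaultdict(list)
    current_source = None  # xref of the SOUR record being read, if any
    for raw in gedcom_text.splitlines():
        parts = raw.split()
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            # Only level 0 lines of the form "0 @S1@ SOUR" open a source record.
            is_source = len(parts) > 2 and parts[2] == "SOUR"
            current_source = parts[1] if is_source else None
        elif parts[0] == "1" and parts[1] == "REPO" and current_source and len(parts) > 2:
            index[parts[2]].append(current_source)
    return dict(index)

print(sources_by_repository(SAMPLE))
```

With an index like this, everything known to come from one repository can be listed together, which is the same grouping argued for here for To Dos.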

I say the proper means is to attach your To Dos to the repository you will do them at. When you go to that repository, be it the local archive, Aunt May, or the online website, you’ll have the list in front of you of what you want to do there. This, in a nutshell, is what you need to do to be organized and efficient.

Okay, now I’ll get off my soapbox and just look at what some programs do. I’m not actually going to look at the programs themselves, but I’m going to look at how they export their To Do data into GEDCOM. This gives a good idea of the thinking behind these programs.

First of all, GEDCOM never included any capability to store your objectives, goals, tasks or To Dos. Many developers added this capability to their genealogy software, and then found no official way to export it into GEDCOM.

Some of them created an _TODO tag. This is a user-defined tag in GEDCOM, with a leading underscore on the tag name. The term “user-defined” is a bit of an oxymoron, because it is the developer, not the user, who is defining this tag. None-the-less, let’s take a look at what they’ve done.

Among my 650 files, I have only 13 with _TODO tags in them. Most of them are from Legacy versions 5.0 and up.
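A tally like this can be produced with a short script. The following is a minimal Python sketch, not the tool actually used for the counts above. The function names and the two tiny sample files are made up for illustration, and the only assumption is that a user-defined tag is any tag starting with an underscore.

```python
import re
from collections import Counter

# A GEDCOM line is: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

def user_defined_tags(gedcom_text):
    """Count the tags that start with an underscore in one GEDCOM text."""
    counts = Counter()
    for line in gedcom_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(3).startswith("_"):
            counts[match.group(3)] += 1
    return counts

def files_using_tag(files, tag):
    """Return the sorted names of the files (name -> text) that contain `tag`."""
    return sorted(name for name, text in files.items()
                  if user_defined_tags(text)[tag] > 0)

# Hypothetical stand-ins for a folder of GEDCOM files.
sample_files = {
    "legacy.ged": "0 @I1@ INDI\n1 NAME John /Smith/\n1 _TODO\n2 DESC Find obituary\n",
    "plain.ged": "0 @I1@ INDI\n1 NAME Mary /Jones/\n",
}

print(files_using_tag(sample_files, "_TODO"))
```

In practice the dictionary would be filled by reading each file from disk, taking care with the older character sets that GEDCOM files often use.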

Legacy typically exports its To Dos to GEDCOM in a form that, at its most complex, looks like the following. The _TODO tag is subordinate to an INDI record, so it is associated with a To Do for a specific person. Not all the level 2 tags shown below are included in all cases, so it looks like only fields the user fills in are exported:

0 @I1@ INDI
1 _TODO
2 DESC Request search for obituary in Paris newspaper
2 _CAT Obituary
2 _LOCL Paris, France
2 DATE Jul 1998
2 _CDATE 6 Aug 1998
2 STAT Completed
2 TYPE 1
2 PRTY 8
2 REPO @R12@
2 NOTE Request search for John’s obituary
2 DATA 6 Aug 1998 - received letter stating that the n
3 CONC ewspapers began publishing after John’s dea
3 CONC th so they were unable to search for an obituary

It’s pretty easy to figure out what each field means here. This really is not that bad an implementation. There’s a date, a completion date, a type that means something program specific, notes, and what I think is most important, a link to the Repository!
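For anyone who wants to read these fields programmatically, here is a minimal Python sketch. It assumes the level 2 fields hang under a level 1 _TODO line, and that each level 3 CONC line continues the previous field's value with no added space. The function name and the shortened sample are for illustration only and are not taken from any program's actual reader.

```python
# Shortened version of the Legacy-style export, with a straight apostrophe.
SAMPLE = """\
0 @I1@ INDI
1 _TODO
2 DESC Request search for obituary in Paris newspaper
2 STAT Completed
2 REPO @R12@
2 DATA 6 Aug 1998 - received letter stating that the n
3 CONC ewspapers began publishing after John's dea
3 CONC th so they were unable to search for an obituary
"""

def parse_todos(gedcom_text):
    """Collect each _TODO block as a dict of its level 2 fields."""
    todos = []
    owner = None     # xref of the current level 0 record
    current = None   # the _TODO dict being filled, if any
    last_tag = None  # most recent level 2 tag, so CONC knows what to extend
    for raw in gedcom_text.splitlines():
        parts = raw.lstrip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            owner = tag if tag.startswith("@") else None
            current = None
        elif level == 1:
            current = {"owner": owner} if tag == "_TODO" else None
            last_tag = None
            if current is not None:
                todos.append(current)
        elif current is not None:
            if level == 2:
                current[tag] = value
                last_tag = tag
            elif level == 3 and tag == "CONC" and last_tag:
                current[last_tag] += value  # CONC joins without a space
    return todos

for todo in parse_todos(SAMPLE):
    print(todo["owner"], todo["REPO"], todo["DATA"])
```

Because the result is a plain dictionary per To Do, it can be regrouped by any field afterwards, including the REPO pointer.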

What is really sad is that the 13 GEDCOM files I have that use the _TODO tag use it only two or three times in the entire file. If this were something that really was set up in a useful way, you’d think people would use the feature much more. But they don’t seem to.

Displaying this information with the Repository would be useful. But Legacy likely only displays it by person, which IMO is pretty useless. This might explain why so few people seem to use the feature.

I have also seen similarly non-trivial implementations of the _TODO tag in GEDCOMs produced by Ancestral Quest, RootsMagic and Family Origins (the predecessor to RootsMagic). It’s amazing how similar the four implementations look, and that’s not a bad thing, because it makes it possible to transfer the To Do data correctly between these programs.

I think it would be useful for a future GEDCOM replacement to include this sort of information for the researcher. But one change must be made. The _TODO tag should be attached to the repository record and within it link to the person or people (if any) the To Do may be about. It should be of the form:

0 @R12@ REPO
1 _TODO
2 DESC Request search for obituary in Paris newspaper
2 _CAT Obituary
2 _LOCL Paris, France
2 DATE Jul 1998
2 _CDATE 6 Aug 1998
2 STAT Completed
2 TYPE 1
2 PRTY 8
2 INDI @I1@
2 NOTE Request search for John’s obituary
2 DATA 6 Aug 1998 - received letter stating that the n
3 CONC ewspapers began publishing after John’s dea
3 CONC th so they were unable to search for an obituary

Doing so would encourage developers to attach the To Do items to the Repository and display them with the Repository information. Then the information would be useful. If this information becomes useful, then people might actually start using it.
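A program does not have to wait for a new standard to show To Dos this way. As a rough Python sketch, person-attached To Dos can be regrouped by their REPO pointer and written out in the repository-attached form proposed above. The dictionaries, xrefs and function names here are hypothetical, and the output is only a fragment (DESC and INDI lines), not a complete GEDCOM record.

```python
from collections import defaultdict

# Hypothetical To Dos as read from person records: owner is the INDI xref.
todos = [
    {"owner": "@I1@", "REPO": "@R12@", "DESC": "Request search for obituary in Paris newspaper"},
    {"owner": "@I7@", "REPO": "@R12@", "DESC": "Look for marriage notice"},
    {"owner": "@I3@", "REPO": "@R2@", "DESC": "Ask about family Bible"},
]

def group_by_repository(items):
    """Group To Do dicts by REPO pointer; items without one go under None."""
    by_repo = defaultdict(list)
    for item in items:
        by_repo[item.get("REPO")].append(item)
    return dict(by_repo)

def to_repo_gedcom(repo_xref, items):
    """Write one repository's To Dos in the repository-attached form."""
    lines = [f"0 {repo_xref} REPO"]
    for item in items:
        lines.append("1 _TODO")
        lines.append(f"2 DESC {item['DESC']}")
        lines.append(f"2 INDI {item['owner']}")  # link back to the person
    return "\n".join(lines)

grouped = group_by_repository(todos)
print(to_repo_gedcom("@R12@", grouped["@R12@"]))
```

The printed fragment lists both To Dos for repository @R12@ together, which is the list you would want in hand when visiting that repository.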