Louis Kessler's Behold Blog » Blog Entry

A Plethora of Extra GEDCOM Tags - Mon, 21 Nov 2011

As it turns out, I was able to handle the RootsMagic 4.0 illegal GEDCOM tags fairly easily.

There were only FIELD, TID and VALUE. I was able to simply add them to Behold’s internal default tag file. The improperly used NAME field would have been a problem, but Behold displays it correctly and doesn’t try to generate a person’s name from it because it is not a level 1 tag. To be smart, I should custom handle it, and then it can be controlled separately (i.e. to display/hide it, or to change the text it will display), but I’m not going to bother unless it becomes an issue.

Behold displays a message for these tags. They are illegal and make the GEDCOM technically invalid. Behold can read them, but other programs may not.

Those are not the only illegal tags I’ve had to deal with. There are also AKA, BIRN, CENN, CIRC, CURN and OTHN, which I’ve found in GEDCOMs generated by Family Origins, FTM, Legacy, PAF, Brother’s Keeper and Generations.

What should have been done, if the program authors wanted to include these extra tags, would have been to make them custom tags. Custom tags are allowed in the GEDCOM standard and are the way GEDCOM allows programs to add extra tags. All that would have been necessary would have been to add an underscore before the tag, e.g. simply make FIELD into _FIELD and AKA into _AKA.
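As a rough illustration of the underscore rule (a Python sketch, not Behold's code), a reader could sort tags into standard, custom and illegal. The names `STANDARD_TAGS`, `classify_tag` and `to_custom` are made up for this example, and the tag set is only a small sample of the standard list.

```python
# Small illustrative sample only; the real GEDCOM tag list is much longer.
STANDARD_TAGS = {"INDI", "FAM", "NAME", "BIRT", "DEAT", "NOTE", "SOUR", "CONC", "CONT"}


def classify_tag(tag: str) -> str:
    """Return 'custom', 'standard' or 'illegal' for a GEDCOM tag."""
    if tag.startswith("_") and len(tag) > 1:
        return "custom"      # leading underscore marks a custom tag
    if tag in STANDARD_TAGS:
        return "standard"
    return "illegal"         # neither in the standard list nor underscored


def to_custom(tag: str) -> str:
    """Turn an illegal tag into a legal custom tag by adding an underscore."""
    return "_" + tag if classify_tag(tag) == "illegal" else tag


if __name__ == "__main__":
    for tag in ("NAME", "FIELD", "_FIELD", "AKA"):
        print(tag, classify_tag(tag), to_custom(tag))
```

With this sample list, FIELD and AKA come out as illegal and are rewritten to _FIELD and _AKA, while NAME and _FIELD are left alone.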

Now this seems like a trivial change, and it is. But the former is illegal, and the latter is not. However, even the latter, the custom tags, should still be discouraged. The reason is that a different program reading this GEDCOM file will not be able to understand the meaning of either illegal tags or custom tags unless the programmer has gone through the work to figure out what the originating program has used the tag for. This is neither an easy task nor a fun one. And this practice, in essence, is one of the three main reasons why GEDCOM doesn’t seem able to transfer all your data properly from one program to another. The other two reasons are that (1) the first program’s GEDCOM export may be improperly implemented, and (2) the second program’s GEDCOM import may be improperly implemented. Some people think that GEDCOM is at fault and cannot handle data transfer. In my opinion, that’s not the case. See my past blog article: Build a BetterGEDCOM or learn GEDCOMBetter?


I have figured out what all of them mean and have given them a reasonable default tag text value so they’ll display well in Behold. When Behold encounters a custom tag not in the list, it will add it, and by default display it using its own tag name as the display text. You can then customize it to say what you want. This way, new tags will always work. I’ll add them to the internal default list when I find out about them, but it’s not critical that I do.
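The fallback described above can be pictured with a small Python sketch. This is not Behold's implementation: the class name `TagTable`, its methods and the display texts are all hypothetical, chosen only to show the idea of a default list that grows when an unknown tag turns up.

```python
class TagTable:
    """Maps GEDCOM tags to the text shown to the user."""

    def __init__(self, defaults):
        # Start from a built-in default list (hypothetical contents).
        self.display = dict(defaults)

    def text_for(self, tag):
        # An unknown tag is added on first sight, using its own name as
        # the display text, so new tags always show up as something.
        if tag not in self.display:
            self.display[tag] = tag
        return self.display[tag]

    def customize(self, tag, text):
        # The user can later change what any tag displays as.
        self.display[tag] = text


if __name__ == "__main__":
    table = TagTable({"AKA": "Also known as"})
    print(table.text_for("AKA"))
    print(table.text_for("_NEWTAG"))
    table.customize("_NEWTAG", "My new tag")
    print(table.text_for("_NEWTAG"))
```

The point of the design is that nothing breaks when a tag is missing from the default list; the list only affects how nicely the tag is labelled.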

In addition, there are some entire new custom records added. I believe these are technically illegal even with a custom tag. Legacy adds its own Place Definitions (_PLAC_DEFN), and Event Definitions are added by Legacy (_EVENT_DEFN) and RootsMagic (_EVDEF). They were a pain to implement - believe me!

GEDCOM’s old SCHEMA method to add definitions of new tags was no fun either, but that was removed prior to the most recent GEDCOM releases (thank goodness!). You can find the SCHEMA still in some older GEDCOMs, especially in FTM GEDCOMs, but they can be safely ignored.

I won’t again go into the mess about FTW TEXT that Family Tree Maker imposed on the world. Just try reading one of those files without Behold.

All these problems pale in comparison to the incorrect programming of the concatenate (CONC) tag by many programs. GEDCOM says you must always end the line in the middle of a word, with the rest of the word beginning on the next line. It was defined this way so that the two lines could be plastered together with no spaces between them, preventing any mistake of possibly concatenating extra white space at the end of the first line. But too many programs split the line at the end of a word.

This can make us programmers tear our hair out. There is no way to fix this. If we assume they do it correctly and they don’t, we lose spaces between words. If we assume they don’t do it correctly but they do, then we add spaces in the middle of words. Nothing in the GEDCOM tells you which way it is. You can try to use artificial intelligence and guess, but there’s no guarantee you’ll guess correctly.
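The dilemma is easy to see in a few lines of Python. This is only a sketch: `join_conc` and its `splits_at_word_end` flag are names invented for the example, and it assumes a non-conforming writer drops the space at the break rather than keeping it on either line.

```python
def join_conc(first: str, continuations: list[str],
              splits_at_word_end: bool = False) -> str:
    """Join a line value with its CONC continuation lines.

    splits_at_word_end=False: assume the writer split mid-word,
    so the pieces are glued together with nothing in between.
    splits_at_word_end=True: assume the writer split between words,
    so a space is put back at every join.
    """
    sep = " " if splits_at_word_end else ""
    return first + "".join(sep + part for part in continuations)


if __name__ == "__main__":
    mid_word = ("This is a long no", ["te about CONC"])    # split mid-word
    word_end = ("This is a long", ["note about CONC"])     # split at a word end

    for first, rest in (mid_word, word_end):
        for flag in (False, True):
            print(repr(join_conc(first, rest, splits_at_word_end=flag)))
```

Each input only comes out right under one of the two assumptions. The wrong assumption gives either "longnote" or "no te", and the text itself does not say which assumption applies.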

In Behold, I keep a table of programs and versions of programs that do not do it correctly. It includes programs that identify themselves in the GEDCOM file as: AncestQuest, CFTree, FamilyOrigins, FamTiesDlx, FamTreesQE, FTM and FTW with the version name starting with: “Family Tree Maker”.

Even so, this list is likely not complete and may be incorrect for certain versions. So I went to the trouble of adding to Behold’s Organize GEDCOMs page the ability to change the CONC usage of each input GEDCOM. This was yet another pain - only necessary because other programs do not follow the rules.
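A lookup of this kind, with a per-file override, might look like the following Python sketch. It is not Behold's code. The function name and parameters are hypothetical, and it takes one reading of the list above: the first six identifiers are matched outright, and FTW only when its version name starts with "Family Tree Maker". How the identifier and version name are read from the GEDCOM header is left out.

```python
from typing import Optional

# Source identifiers assumed to split CONC lines at word ends.
BAD_CONC_SOURCES = {
    "AncestQuest", "CFTree", "FamilyOrigins",
    "FamTiesDlx", "FamTreesQE", "FTM",
}


def splits_at_word_end(source: str, version_name: str = "",
                       override: Optional[bool] = None) -> bool:
    """Guess whether a GEDCOM's writer split CONC lines between words."""
    if override is not None:
        return override          # the user's per-file setting always wins
    if source in BAD_CONC_SOURCES:
        return True
    # Assumed reading: FTW counts only with this version-name prefix.
    return source == "FTW" and version_name.startswith("Family Tree Maker")


if __name__ == "__main__":
    print(splits_at_word_end("FTM"))
    print(splits_at_word_end("FTW", "Family Tree Maker for Windows"))
    print(splits_at_word_end("FTM", override=False))
```

The override is checked first because the table is only a best guess, so a file that is known to behave differently can be corrected by hand.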

Nonetheless, this all works very smoothly in Behold. Behold is probably one of the most generous and flexible GEDCOM readers there is. I sort of refer to this handling of GEDCOM as “Extended GEDCOM”. If Behold can’t read your GEDCOM, it’s likely no program can.

*** Two days to Version 1.0 and counting *** :-)

3 Comments

1. genej (genej)
Joined: Wed, 5 Jan 2011
13 blog comments, 0 forum posts
Posted: Tue, 22 Nov 2011

“If Behold can’t read your GEDCOM, it’s likely no program can.” –

That is so great, Louis.–GJ

2. mtibesar (mtibesar)
Joined: Fri, 25 Nov 2011
5 blog comments, 0 forum posts
Posted: Fri, 25 Nov 2011

I would be interested to see your “Top Ten List of best GedCom Programs” that rates genealogy programs from best to worst on how they follow the standard. Also, I will be very interested in your future program that makes “invalid” gedcoms valid. I don’t trust any of the genealogy programs nowadays since they all have proprietary formats and file extensions. The one thing I want to depend on is exporting and saving my database into a gedcom. I have copies of my gedcoms 20 years back. Thank you for sharing your views on the Gedcom standard - it IS important and this standard is often slighted by the independent software vendors.

3. Louis Kessler (lkessler)
Joined: Sun, 9 Mar 2003
231 blog comments, 226 forum posts
Posted: Fri, 25 Nov 2011

mtibesar: That’s an interesting question, and I don’t have a good answer for you … because I tend to only notice the bad ones during my testing, and not the good ones. I do know I have notes and comments in my program source for some problems that I’ve noted, and they include: Ancestral Quest, Brother’s Keeper, Family Origins, FTM, Generations, Legacy and PAF. Then there’s the mistake programs make with the CONC tag which is horrid! See “CONC Me On The Head”: http://www.beholdgenealogy.com/blog/?p=739

I do plan in my Version 1.1 to store my database and export to (and ONLY export to) valid GEDCOM 5.5.1, which is the de facto standard. So Behold should be able to be the tool to bring all your GEDCOMs up to date in a manner that you can be confident can be read by any program that can properly read GEDCOM 5.5.1. But don’t ask me how many of them do that properly, because I don’t know that.
