Louis Kessler’s Behold Blog

Whither GEDCOM X? - Thu, 7 Jun 2012

Over the last few days I’ve tweeted about the announcements in GEDCOM X’s blog posts. I’ve followed its progress since Tamura Jones broke the news about it last December. I met Ryan Heaton, the FamilySearch employee doing the most work on it, at RootsTech in February, and listened to his talks on the project. I’ve participated on the GEDCOM X community issue tracker, putting in my two cents’ worth when the need was there. I provided feedback to them in a March blog post.

They’ve now made their data format public. I’ve really only had a cursory look at it, and I don’t currently have time to go into detail about exactly what their structures are. My main concern is that GEDCOM X will represent some of its data structures in very different ways than most genealogy programs do, e.g. no family record, just relationships. Almost every program uses the GEDCOM lineage-linked structure with families and individuals and the FAMC, FAMS, HUSB, WIFE and CHIL connectors. It would seem logical for programmers to change their internal structures to something else if that something else became the new standard. But that’s asking a lot. We’re talking about ripping out the guts of a program. If the structures are easily mappable, then they could be handled. But if they’re easily mappable, then why is a new standard needed at all?
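To make the structural difference concrete, here’s a minimal Python sketch, assuming a simplified FAM record with only HUSB, WIFE and CHIL pointers. The `Family` class, the function names and the "Couple"/"ParentChild" labels are hypothetical illustrations, not GEDCOM X’s actual schema. It flattens a family into pairwise relationships, then tries to regroup the pairs into families.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, simplified model of a GEDCOM lineage-linked FAM record.
@dataclass
class Family:
    xref: str                                       # e.g. "@F1@"
    husb: Optional[str] = None                      # HUSB pointer
    wife: Optional[str] = None                      # WIFE pointer
    chil: list[str] = field(default_factory=list)   # CHIL pointers

# (kind, person1, person2). The kind labels are illustrative assumptions,
# not the real GEDCOM X relationship types.
Relationship = tuple[str, str, str]

def family_to_relationships(fam: Family) -> list[Relationship]:
    """Flatten one FAM record into pairwise relationships."""
    rels: list[Relationship] = []
    if fam.husb and fam.wife:
        rels.append(("Couple", fam.husb, fam.wife))
    for child in fam.chil:
        for parent in (fam.husb, fam.wife):
            if parent:
                rels.append(("ParentChild", parent, child))
    return rels

def relationships_to_families(rels: list[Relationship]) -> dict[frozenset, list[str]]:
    """Regroup pairwise relationships into {parent set: children}.

    Which parent was HUSB and which was WIFE cannot be recovered from the
    pairs alone, so families come back keyed by an unordered parent set.
    """
    parents_of: dict[str, set[str]] = {}
    families: dict[frozenset, list[str]] = {}
    for kind, a, b in rels:
        if kind == "Couple":
            families.setdefault(frozenset((a, b)), [])
        elif kind == "ParentChild":
            parents_of.setdefault(b, set()).add(a)
    for child, parents in parents_of.items():
        families.setdefault(frozenset(parents), []).append(child)
    return families

if __name__ == "__main__":
    fam = Family("@F1@", husb="@I1@", wife="@I2@", chil=["@I3@", "@I4@"])
    rels = family_to_relationships(fam)
    print(rels)
    print(relationships_to_families(rels))
```

Going from families to relationships is mechanical. Going back needs inference: the HUSB/WIFE roles are gone, and a child linked to only one parent lands in a separate one-parent group. That is the kind of gap that makes "easily mappable" harder than it sounds.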

I did try their GEDCOM 5.5 to GEDCOM X translator on one file, and then unzipped the output to see that it made a thousand XML files. There is no inherent difference between XML and the GEDCOM grammar. They are probably mappable back and forth.
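As a rough illustration of how mechanical the syntax mapping is, here’s a Python sketch that turns each GEDCOM line (level, optional @xref@, tag, optional value) into one XML element, nested by level number. The element-per-tag layout, the `GEDCOM` wrapper element and the `id` attribute are assumptions for this example, not the GEDCOM X format, and the sketch ignores CONC/CONT joining and character-set handling.

```python
import re
import xml.etree.ElementTree as ET

# Simplified GEDCOM line grammar: level [@xref@] TAG [value]
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")

def gedcom_to_xml(text: str) -> ET.Element:
    """Map each GEDCOM line to one XML element, nesting by level number."""
    root = ET.Element("GEDCOM")   # hypothetical wrapper element
    stack = [root]                # stack[n] is the parent for level-n lines
    for line in text.splitlines():
        if not line.strip():
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"not a GEDCOM line: {line!r}")
        level, xref, tag, value = int(m[1]), m[2], m[3], m[4]
        if level >= len(stack):
            raise ValueError(f"level jumps too far: {line!r}")
        del stack[level + 1:]     # close any deeper structures
        elem = ET.SubElement(stack[level], tag)
        if xref:
            elem.set("id", xref)  # record identifier becomes an attribute
        if value:
            elem.text = value     # pointer values like @F1@ stay as text
        stack.append(elem)
    return root

if __name__ == "__main__":
    sample = "0 @I1@ INDI\n1 NAME John /Smith/\n1 BIRT\n2 DATE 1 JAN 1900\n0 TRLR"
    print(ET.tostring(gedcom_to_xml(sample), encoding="unicode"))
```

Nothing in the loop needs genealogical knowledge; it only follows the level numbers. That is the easy part.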

There are quite a number of XML components for Delphi to choose from, e.g. https://www.google.com/search?q=delphi+xml but I personally don’t look forward to having to implement such a thing. With GEDCOM, I’m in control of the way Behold parses and I can optimize it appropriately. With 3rd party packages, it’s someone else’s code.

Converting the input between GEDCOM, XML, JSON or whatever is the easy (almost trivial and mechanical) part. The hard part will be incorporating the concepts that GEDCOM X has that GEDCOM and Behold don’t have into Behold. That is the part that scares me. If the differences are considerable, then it will severely hamper GEDCOM X’s full adoption into existing programs. That will give the horrible result of partial incorporation, meaning that GEDCOM X will not transfer all data between all programs, and thus be no better than GEDCOM is today.

For example, you can’t expect programs that don’t have an extensive citation capability to add one just because they find it in GEDCOM X. Or what if they have one, but it is structured very differently? And if GEDCOM X decides to throw in every possibility under the sun, or complicate things with complex multi-level, multi-linked structures, you can be sure some programs just won’t get them right, or will refuse to implement them to the level specified. The more complex the spec, the harder it will be to interpret correctly, and the more resistant developers will be to implementing it.

GEDCOM X will have to be comprehensive, yet as simple as possible. Those two goals work against each other, so it will be difficult to get the balance just right. It will have to use a 99% rule, and not try to include the remaining 1%, which would increase the complexity 1000-fold.

GEDCOM X will need allies. They’ll need to work with the FHISO folk. They’ll have to be wary of Ancestry.com and MyHeritage who have their own data transfer ideas. They’d likely also want to get AncestorSync involved early on.

I’m hoping for it. I really am. But they’ll have a hard sell.

 

—–

June 9: Ryan Heaton corrected me. GEDCOM X uses XML, not JSON. I’ve updated my post to reflect this.

8 Comments

1. Brett (brett)
Australia flag
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Thu, 7 Jun 2012

All this is somewhat too technical for a mere genealogist such as myself.

I appreciate the advantage of a standard ‘protocol’ for transferring information between programs. But I am at a loss as to why GEDCOM X may end up being the only format in the running for this standard.

Originally, I thought GEDCOM X was an internal requirement for FamilySearch, and thus that a standard was still required for the transfer of data. From this, I envisaged that FamilySearch would convert data in the standard format (from other programs) to their own format, to enable its use on their site. I also thought FamilySearch would eventually offer an option to convert their data to the standard.

What I wish to do, when all is said and done, is to be able to transfer my information between programs for my own use (without loss or corruption), and also to send/give it to other researchers and family members, to allow them to use the program or site of their choice.

2. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
143 blog comments, 200 forum posts
Posted: Fri, 8 Jun 2012

Sorry for going too far on the technical side.

Yes, everyone’s goal is to transfer their data seamlessly between programs, which, new standard or not, is never going to happen. It’s impossible to get agreement on a standard that is complete enough to include everything needed, in an unambiguous way, that every programmer will include and implement perfectly. I wouldn’t be surprised if a new standard never reaches the level of acceptance and the amount of correct data transfer that GEDCOM has.

GEDCOM X is leading the game because it is the first of the new breed of replace-GEDCOM groups to publish its specs. I am saying there will be alternatives coming, but for there to be a standard, there has to be just one that wins. For it to best help the genealogy community, the various factions must ultimately work together and compromise on a new all-encompassing standard.

Ergo GEDCOM will be around for a while.

3. Brett (brett)
Australia flag
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Fri, 8 Jun 2012

It’s impossible to get a standard agreed on that is complete enough to include everything needed in an unambiguous way that every programmer will include and implement perfectly.

While I reluctantly agree with the above, I would hope that any standard developed and accepted by a reasonable number of developers/websites would be supported well enough that any ‘extras’ could be accommodated through updates.

4. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
143 blog comments, 200 forum posts
Posted: Fri, 8 Jun 2012

Brett said: “… ‘extras’ could be accommodated through updates.”

Absolutely! GEDCOM was excellent at the time. If they had not stopped updating it, we’d have a really fine standard today. I’m sure it would have everything everyone is looking for and data would be transferring almost flawlessly between programs (of developers willing enough to properly implement it).

Instead, the adverse reaction is that “GEDCOM’s terrible” and “it has to be rewritten from scratch”. That’s the part I disagree with. In my opinion, GEDCOM is very good and has proved its mettle over time.

I think GEDCOM needs updates, 20 years of updates for that matter. But the basic concepts and overall structure are well thought out. To rewrite everything from scratch and recreate every idea is a gigantic effort that will never get everyone’s agreement.

The sad part of it is that, in my opinion, all the things people think are wrong or missing from GEDCOM today could probably be fixed with one or two significant but not earth-shattering updates. That could be done and agreed to in a year or so, and it would be accepted quite readily by genealogy software developers who want a standard way to share their citations, place records, multimedia and other needed entities.

For those purists who want XML or JSON syntax, they are mappable 1-1 from the GEDCOM syntax and translators would be quick and easy. That’s trivial.
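A quick way to test the 1-1 claim is a round trip. This Python sketch, assuming normalized input (single spaces between fields, no leading whitespace, CONC/CONT lines treated as ordinary lines), parses GEDCOM into nested dictionaries, serializes them with the standard `json` module, reads them back and regenerates the GEDCOM text. The dictionary layout (`tag`, `xref`, `value`, `children`) is a hypothetical choice, not any published JSON mapping.

```python
import json
from typing import Optional

def parse(text: str) -> list[dict]:
    """Parse normalized GEDCOM lines into nested dicts."""
    top: list[dict] = []
    stack = [top]  # stack[n] is the child list that receives level-n lines
    for line in text.splitlines():
        parts = line.split(" ", 2)
        level = int(parts[0])
        xref: Optional[str] = None
        if len(parts) == 3 and parts[1].startswith("@"):
            xref = parts[1]                        # e.g. "0 @I1@ INDI"
            tag, sep, rest = parts[2].partition(" ")
            value = rest if sep else None
        else:
            tag = parts[1]
            value = parts[2] if len(parts) == 3 else None
        if level >= len(stack):
            raise ValueError(f"level jumps too far: {line!r}")
        del stack[level + 1:]                      # drop deeper child lists
        node = {"tag": tag, "xref": xref, "value": value, "children": []}
        stack[level].append(node)
        stack.append(node["children"])
    return top

def to_gedcom(nodes: list[dict], level: int = 0) -> list[str]:
    """Regenerate GEDCOM lines from the nested dicts."""
    lines = []
    for n in nodes:
        fields = [str(level)]
        if n["xref"] is not None:
            fields.append(n["xref"])
        fields.append(n["tag"])
        if n["value"] is not None:
            fields.append(n["value"])
        lines.append(" ".join(fields))
        lines.extend(to_gedcom(n["children"], level + 1))
    return lines

def round_trip(text: str) -> str:
    """GEDCOM -> JSON text -> GEDCOM."""
    as_json = json.dumps(parse(text))
    return "\n".join(to_gedcom(json.loads(as_json)))

if __name__ == "__main__":
    sample = "0 @I1@ INDI\n1 NAME John /Smith/\n1 FAMS @F1@\n0 TRLR"
    print(json.dumps(parse(sample), indent=2))
    print(round_trip(sample) == sample)
```

When `round_trip(text) == text`, nothing was lost in the JSON detour. Real files with irregular whitespace or CONC/CONT continuations would need normalizing first.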

I have not been convinced by anyone yet of any good reason to start over, which is what everyone’s trying to do.

Louis

5. Brett (brett)
Australia flag
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Fri, 8 Jun 2012

Brett said: “GEDCOM X may end up being the only format in the running for this standard”.
Louis said: “GEDCOM … could probably be fixed with one or two significant but not earth-shattering updates”.

So is/should an updated GEDCOM be a viable consideration for the standard? Does anyone else support your thoughts on this?

6. Brett (brett)
Australia flag
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Fri, 8 Jun 2012

I note the concerns Tamura has raised about the size of GEDCOM X files. Do you see an updated GEDCOM being significantly different in file size from GEDCOM 5 or 5.5?

7. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
143 blog comments, 200 forum posts
Posted: Fri, 8 Jun 2012

Everyone makes lists of what’s wrong with GEDCOM. There’s never anything in those lists that couldn’t be handled relatively easily with an updated GEDCOM. Ryan Heaton claims that GEDCOM X has resolved a lot of the deficiencies.

But have you seen any lists of what’s right with GEDCOM? I’ve delved deeply into GEDCOM as I’ve programmed the guts out of it with Behold. I could make a list of rights that is much more extensive than the list of wrongs. How many of the “rights” has GEDCOM X forgotten about and not implemented? There are decades of these rights embedded in the old standard, including fixes and tweaks that were made as updates to the standard. That’s something not easily duplicated in a new standard developed from scratch.

An updated GEDCOM to me is not just viable, but the best solution to a new standard. I am probably the most vocal supporter of this. I don’t know offhand of anyone else who supports this idea. I have expressed these thoughts many times on BetterGEDCOM, but never got any encouragement to carry the thought forward.

Part of the reason, I believe, is that no one has taken the time to really study GEDCOM and the thought that went into it. Personally, I really appreciate what they’ve done. GEDCOM’s success has been phenomenal, as 99% of genealogy programs provide some measure of support for the GEDCOM standard. New programs today know they need to incorporate this 20-year-old standard if they want genealogists to consider them. No other field that I know of has a standard that’s been around as long and become this entrenched.

8. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
143 blog comments, 200 forum posts
Posted: Fri, 8 Jun 2012

Tamura is right. Bloat adds overhead in execution time and transfer time. If a standard is used for everything and that everything becomes 4 times bigger than it needs to be, then it’s a waste. When online databases of millions of names are created, it will slow things down by 4 to 16 times. That will slow down millions of people doing the same thing over and over, time after time. It’s better not to add bloat to a new standard if it can at all be helped.

As far as data formats go, GEDCOM is not too wasteful on space. It is better than XML, but not as good as a database. It’s the addition of all the “standards” stuff into the XML and the repetition of declarations for each entity that causes the real bloat.
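To see where the extra bytes come from, this Python sketch encodes the same small record three ways: as GEDCOM lines, as compact element-per-tag XML, and as that XML with a namespace declaration on the record. The XML layout, the sample record and the `example.org` namespace URI are placeholders, not GEDCOM X’s real schema, so the sizes it prints only illustrate the mechanism; they are not measurements of GEDCOM X files.

```python
from typing import Optional
from xml.sax.saxutils import escape

# Hypothetical placeholder namespace, standing in for per-entity declarations.
NS = "http://example.org/hypothetical-genealogy"

# One small record as (level, tag, value) triples, GEDCOM-style.
RECORD = [
    (0, "INDI", None),
    (1, "NAME", "John /Smith/"),
    (1, "BIRT", None),
    (2, "DATE", "1 JAN 1900"),
    (2, "PLAC", "Winnipeg, Manitoba, Canada"),
]

def as_gedcom(record) -> str:
    """Each tag appears once, preceded by its level number."""
    return "".join(
        f"{lvl} {tag}" + (f" {val}" if val is not None else "") + "\n"
        for lvl, tag, val in record
    )

def as_xml(record, ns: Optional[str] = None) -> str:
    """Compact XML: each tag appears twice (open and close), no whitespace."""
    out, open_tags = [], []
    for lvl, tag, val in record:
        while len(open_tags) > lvl:            # close siblings and deeper tags
            out.append(f"</{open_tags.pop()}>")
        attr = f' xmlns="{ns}"' if ns and lvl == 0 else ""
        out.append(f"<{tag}{attr}>" + (escape(val) if val is not None else ""))
        open_tags.append(tag)
    while open_tags:
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)

if __name__ == "__main__":
    for label, text in [("GEDCOM", as_gedcom(RECORD)),
                        ("XML", as_xml(RECORD)),
                        ("XML+ns", as_xml(RECORD, ns=NS))]:
        print(f"{label:8} {len(text.encode('utf-8'))} bytes")
```

XML names every tag twice where GEDCOM names it once, and any declaration repeated on each entity adds a fixed cost to every record, which multiplies across a file with thousands of records.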

Most of these updates to GEDCOM would not add significant size to the file. Some, such as a Place record, might add a bit of size, but nothing close to the 2 or 3 or 4 or 40 times the file size that the standardizations add.

 

The Following 3 Sites Have Linked Here

  1. Catching Up -- GEDCOMX, Better GEDCOM, and FHISO - Stardust 'n' Roots : Fri, 8 Jun 2012
    ... Louis Kessler, developer of Behold, has given a cursory review of the GEDCOM X publicly released data format. His primary concern is ...

  2. Whither FHISO and GEDCOM X? Observations and Commentary : Wed, 18 Jul 2012
    ... Since RootsTech 2012, there were some announcements and blog posts about GEDCOM X and FHISO ...

  3. Tamura Jones - Google+ - FHISO and GEDCOM X : Thu, 19 Jul 2012
    ... Louis Kessler did a blog post that garnered some comments: ...
