Louis Kessler’s Behold Blog

Build a BetterGEDCOM or learn GEDCOMBetter? - Wed, 5 Jan 2011

A month ago, I blogged about the BetterGEDCOM endeavor. In the ensuing month, I’ve gotten involved and added my two cents worth. I find I’m mostly a lone wolf in the woods.

Up until a few days ago, I didn’t know why my ideas were so different from everyone else’s. Almost all of them are of the opinion that the GEDCOM standard has major shortcomings and needs to be replaced. I differed in my view and, as I said a month ago, think it needs to be brought up to date (via XML and Unicode), and maybe could use a few improvements. I’m talking about tweaking, not a major overhaul. An evolution and not a rewrite.

But I think I found out what’s going on and why the opinion is that GEDCOM is bad. Over at the BetterGEDCOM blog, a blog site run by some of the people who started the BetterGEDCOM initiative, they’ve been running GEDCOM tests. They would take one program, say Family Tree, enter some data into it, export it to a GEDCOM file, import that GEDCOM into a 2nd program, say RootsMagic, and then see that the 2nd program doesn’t show it in the same way. They’ve done many tests; for example, their latest test was done with address information. Their conclusion was: “Looking at the GEDCOM 5.5 it appears that the format is correct. However, how and where this entry should be, may be the problem in the GEDCOM Standard.”

Now I give them a lot of credit for doing these tests to see how programs differ, but by highlighting how poorly the programs have implemented GEDCOM, they are spreading the illusion that GEDCOM is to blame.

My experience with GEDCOM is very different. In the development of Behold, I’ve tried to make Behold as flexible a reader as possible. I want Behold to display the data from any GEDCOM that was created from any version of any genealogy program, as accurately as possible. Doing so, I had to study various parts of GEDCOM in detail, and try to interpret how each program was exporting their data in terms of the GEDCOM standard.

Almost all programs do something different somewhere. They may add extra tags in a standard way, or in their own non-standard way. They may use some GEDCOM constructs a bit incorrectly, or they may abuse them totally. Generally the programs come close to the GEDCOM standard, but it is rare to see a perfect GEDCOM meeting all rules of the standard.

One good example is the CONC (Concatenate) tag. The GEDCOM standard says the word on the line preceding the CONC tag must be split so that some letters of the word are on the preceding line and some are at the start of the CONC line. Then the program reading the GEDCOM should paste the two lines together with no space between them. But some programs export this wrong: the line before their CONC ends with a complete word and the CONC line starts with the next word. What this means is that the program reading the GEDCOM needs to put a space between the two words when it puts the lines together. Now look what this does to me, the beleaguered programmer. First I have to program this for both cases, the correct and the incorrect program. If I assume the program is correct but it isn’t, words will be plastered together. If I assume the program is incorrect but it is correct, there will be spaces where there shouldn’t be. This poor programmer now has to maintain a table of incorrect programs (or put in some fancy algorithm to try to detect how each one handles CONC, and even this is subject to error). The bottom line is that every one of these misinterpretations of GEDCOM is a lot of extra work to handle, and it needn’t be so if every program just followed the standard.
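To make the two cases concrete, here is a minimal sketch in Python of how a reader might join continuation lines. It is only an illustration, not Behold’s actual code. The function name `join_continuations` and the flag `exporter_drops_spaces` are made up for this example, and the sketch assumes simple lines of the form “level tag value”.

```python
def join_continuations(lines, exporter_drops_spaces=False):
    """Merge CONC/CONT lines into the value of the line they continue.

    lines: GEDCOM lines in "level tag value" form, e.g. "2 CONC more text".
    exporter_drops_spaces: hypothetical flag. Set it to True for a file
    written by a program that breaks lines between words and drops the
    space, so that the reader has to put the space back.
    """
    merged = []  # each entry is [level, tag, value]
    for raw in lines:
        # Strip only the line ending; trailing spaces in a value matter.
        parts = raw.rstrip("\r\n").split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if tag == "CONC" and merged:
            # Standard behaviour: paste the text on with no separator.
            separator = " " if exporter_drops_spaces else ""
            merged[-1][2] += separator + value
        elif tag == "CONT" and merged:
            # CONT continues the value on a new line.
            merged[-1][2] += "\n" + value
        else:
            merged.append([level, tag, value])
    return merged


# A file that splits mid-word, as the standard describes.
correct = ["1 NOTE This is a long no", "2 CONC te that continues"]
# A file that splits between words and drops the space.
incorrect = ["1 NOTE This is a long", "2 CONC note that continues"]

print(join_continuations(correct)[0][2])
print(join_continuations(incorrect)[0][2])
print(join_continuations(incorrect, exporter_drops_spaces=True)[0][2])
```

The second call shows the “plastered together” result when an incorrect file is read as if it were correct. The flag has to come from somewhere, such as a table of programs known to export CONC this way, and that table is exactly the extra work described above.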

The implication is that because programs are not following the standard, there must be something wrong with the standard that makes it difficult for the programs to follow it. Therefore this must be corrected. Well, to me, that’s hogwash. In this case there is nothing wrong with the standard. The problem is with the programs and the programmers who aren’t diligent enough to follow the standard correctly in the first place and who, when it is pointed out that they are doing it wrong, aren’t accountable enough to change their program so it does it right.

Okay. So that’s case in point #1. Maybe they need to learn GEDCOMBetter, rather than get a BetterGEDCOM. With their current work ethic, they’ll just implement that BetterGEDCOM incorrectly as well, and the result will be that BetterGEDCOM is no better than the GEDCOM it was intended to be better than. (Whew, that’s a mouthful!)

But the 2nd point is maybe more important than the first. That is that there is a misconception that GEDCOM has major shortcomings. Now I’m not saying it’s perfect, and as I keep saying, I’d like to see a few changes to it myself, but check what the BetterGEDCOM initiative has decided is wrong with GEDCOM.

They have a “GEDCOM Messes This Up” section and mention sources, citations and certainty assessment. In my observations, GEDCOM has a very advanced facility for sources and citations and a wonderfully simple and usable certainty assessment. GEDCOM doesn’t mess it up. The programs that implement it incorrectly mess it up.

Then there’s a “GEDCOM Won’t Transfer This” section. Sources are again mentioned, but Behold has no trouble reading and displaying them from a multitude of programs. They transfer fine for me. I’ve seen very exhaustive and detailed sourcing and ensured that Behold can display it all correctly. It’s not my fault or GEDCOM’s fault that some programs don’t export their sources. With regards to places and documents/images, yes GEDCOM could use some tweaks to support those better. But these are tweaks in a natural evolution of GEDCOM, and not something deserving a total rewrite.

“I Want My Genealogy Software And BetterGEDCOM To Do This: Handle evidence and not just conclusions. Do conclusion chaining. Round-trip the data, (and a number of other things).” These are all things up to the software programmer to implement. Current GEDCOM can handle it just fine. This doesn’t need a BetterGEDCOM for it to be done.

But what I’ve found is that GEDCOM itself is not understood well, and unfortunately not well by those attempting to change it. I’m not sure how you build a “better” house, if you don’t know how good the house was that you had before.

I have come to respect GEDCOM and the people at LDS who developed it. It was quite a major work effort spanning many years and many versions. They had to do a lot of deep thinking about what was wanted, and then implement it in some logical way. When you look at some of the details of GEDCOM, you see some very advanced capabilities, but many of these have seldom been used by genealogy programs. This is partly because GEDCOM evolved and matured faster than programmers could handle it - and then it stopped at version 5.5, the unofficial 5.5.1 and the Draft XML 6.0.

Was it a good standard? Absolutely. It allows data transfer between programs. And you know what happened in the ensuing years. Just about every single program adopted GEDCOM as their import and export mechanism. I’m trying to think of other standards that 99.9% of an industry have adopted. There aren’t very many. To me that makes it more than a good standard. That makes it a great standard. The measurable goal of what makes a standard great is how many people use it.

Some people even started using it as a data store. I remember decades ago when Cliff Manis used Tom Wetmore’s Lifelines program and created the GenServ System that still exists and collects names. Cliff, if you’re still out there, do you remember when I visited you at your home when you had first started it up in the early 90’s? That was me. You were a very gracious host. So here we had a system of data storage, built purely out of GEDCOM, that preceded the Geni, MyHeritage and OneGreatFamily sites of the world. We had and have a GEDCOM standard that can be used for data transfer and for data archive and retrieval.

Again, is it perfect? No it’s not perfect. It has aged after 15 years and needs a few tweaks.

What it doesn’t need is a rewrite.

Maybe what’s really needed is an education program. So that developers will be able to study and learn what treasures are really hidden in the old GEDCOM standard. So that they’ll be able to learn how to implement the features correctly. And so they won’t go off trying to rebuild from scratch the rooms in the house that are perfectly fine.

12 Comments

1. genej (genej)
United States flag
Joined: Wed, 5 Jan 2011
13 blog comments, 0 forum posts
Posted: Wed, 5 Jan 2011  Permalink

Hi Louis:

Another great blog posting.
I think the part of the Build a BetterGEDCOM blog effort to which you are referring is aimed at providing real time, user-to-user examples of genealogical data being transferred between two programs using GEDCOM.

I won’t speak for Russ, but I know I sure haven’t set out to play GEDCOM expert on the blog.

Separately, it’s been years since I felt I could rely on a GEDCOM transfer. When it fails, I feel a little like a Microsoft Windows user in 1993 — it’s always the other guy’s fault. –GJ

2. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Wed, 5 Jan 2011  Permalink

Yes Gene, and even in your comment, you’re still blaming the delivery boy (“rely on a GEDCOM transfer”), when the real culprits are the exporting program and the importing program.

I don’t blame you for this. It is the natural thing to do.

What you and all users should have done for years is complain. Complain LOUDLY to the vendors of the software, and insist that they fix their programs.

3. rjseaver (rjseaver)
United States flag
Joined: Wed, 5 Jan 2011
5 blog comments, 0 forum posts
Posted: Wed, 5 Jan 2011  Permalink

Excellent thoughts…I’ve had some of the same but haven’t been able to elucidate on them as well as you have.

From what I’ve read, I think the three biggest problems with the current GEDCOM standard are:

1) The lack of a way to attach media
2) The nuclear family structure.
3) The ANSI vs Unicode problem with languages and alphabets

I’m not enough of an expert to figure all of that out (hmmm, I’m not any kind of expert, just a user…).

Perhaps if the group can get the software companies to buy-in to the standards set by GEDCOM then we could have an improved standard.

But it seems to me that the companies want to have their own version to be able to do some things differently from the other companies. What other reason would there be to have different GEDCOM tags or formats? Seems that Ancestry.com has an interest in keeping FTM different from the others. If the FamilySearch Family Tree becomes the de facto leader in free online family trees, then perhaps the software companies will agree to a standard, but FS don’t seem to care any more - they’re depending on the affiliates to do the hard work of adapting to the FSFT model and formats.

4. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Wed, 5 Jan 2011  Permalink

Thanks Randy.

I’ve now been doing some research into the early beginnings and thoughts and concepts that Bill Harten and his team had about GEDCOM. My respect for their work grows with every tidbit of history that I uncover.

Redoing GEDCOM is not a simple task that two dozen people can accomplish in their spare time on a Wiki and with weekly one-hour chats. It took a dedicated team of expert genealogists and programmers from the Family History Department of the LDS church years of working together to formulate a comprehensive standard, and then refine and expand it, and do so several times. The resulting product is not a thrown-together mess. It is a cohesive, meaningful and, at the time, comprehensive standard, of which every tiny part was debated and decided on with good reason.

If I had my druthers, I’d get that team, or a team of similar structure, together again with paid salaries and dedicated responsibility to the standard, who understand the concepts that have been included in it so far, to take over the evolution of the standard from here. Hello LDS? Can I pray for a miracle?

5. hrworth (hrworth)
United States flag
Joined: Wed, 5 Jan 2011
1 blog comment, 0 forum posts
Posted: Wed, 5 Jan 2011  Permalink

Louis,

From the testing that I have done between two programs, I think the issue is really in the details that were provided in the GEDCOM 5.5 documentation. Technology has changed, and the developers, over that time, have not gone back to look at that document and ask questions or tighten up the requirements.

Sure, there are many other features, like Randy mentions, that we need to get into the requirements of the next vehicle that is used to Share Information.

I have shown a couple of non-technical examples on the BetterGEDCOM Blog where I tried to point this out. I have taken a couple of specific examples and looked at the GEDCOM 5.5 document to see what the issue is. From this non-technical End User’s point of view, the problem might be in the 5.5 requirements. Sure, there is more stuff to do, but at this point, this started because two end users couldn’t share data.

I am not trying to say or show which program is “wrong”, cause I don’t really know.

For example:

I used a Burial Fact / Event, trying to get it between two programs. For the example, I used the Location and the Cemetery Name.

The Sending Program has a Place field and a Description field. The Location went into the Place field, the Cemetery Name into the Description field.

Sent that GEDCOM to a second program. Guess what, the Place name ended up where it belonged AND the Description (Cemetery Name) ended up in the Description field on the other program. However, the Other Program wanted the Cemetery Name in a Detail field, not Description field.

Looking at the GEDCOM file, this End User, couldn’t tell if the GEDCOM file had the information in the right order or not.

Just this afternoon, I took that same GEDCOM file and opened it in another program. The Location came in OK, but the Cemetery Name is nowhere to be found. Now, I just started to use this program this afternoon, so I will still look around before I post anything about these results. Also, this program is due for an upgrade shortly. So, I won’t do any more testing until I have the upgrade.

The next set of tasks, is with Sources and Citations.

Beyond this basic sharing of information issue, this project is trying to think ahead a little bit, to see what might be needed for the larger Genealogy community. Specifically, those leaders in this field who are struggling with how to do research or how to evaluate our research. Looking at some of the ‘new’ standards that are showing up.

The GPS by Mark Tucker is one example.

I heard Elizabeth Shown Mills talk about 2 years ago, and I think Randy was in that talk, where she pointed out that we need to Evaluate our Evidence, now that we have collected it. The BetterGEDCOM project doesn’t need to develop those tools, but should have the ability to share what conclusions we reached in our research and how we got there. I think there is a GEDCOM Tag for sharing a 1 to 3 star rating on a Source, but what I am talking about is beyond that.

I have listened to End Users for about 10 years on this sharing topic, and tried to help them know what they are doing or not doing, and what they can expect or not expect. So, I have heard most of the stories. Forgotten many though. But these tests that I am trying to post on the blog are examples of a common End User trying to share research using the vehicle we have today. Then, let the Technical folk create a useful solution that the Application Developers can ‘buy into’.

I think the Wiki is getting to some of the details, which is why we have the Wiki. I think that for the amount of time that this project has been ‘live’ on the Wiki, we have made lots of headway.

Thank you for your participation in the Conference Call earlier in the week and thank you for this blog.

Russ

6. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Wed, 5 Jan 2011  Permalink

Russ,

I think that the work you are doing is excellent. Without placing blame on which program might be wrong, the bottom line is that the two programs simply did not interpret and implement the standard the same way.

In my opinion, your tests are great because they are showing the areas where data transfer with GEDCOM is failing.

If the program authors really cared about their users, they’d realize that their users feel that transferring their data between programs and archiving it in a GEDCOM file that can be read again later is extremely important to them. How many people have lost their data when the vendor dissolved and they lost their copy of their program when their computer crashed? They only had that program’s proprietary database backed up, but nothing to read it. (Got any 8-tracks lying around?).

Those program authors would make sure their program at least will export to GEDCOM according to the standards. The users will feel assured that there should be some program out there that should then be able to read it in. But if the standards are not followed, then there is no guarantee.

On the Wiki, expect controversy. Especially from us programmers. We are a wildly eccentric bunch.

7. kulath (kulath)
United Kingdom flag
Joined: Fri, 17 Feb 2012
1 blog comment, 0 forum posts
Posted: Fri, 17 Feb 2012  Permalink

Don’t feel like a lone wolf! I have been looking for some time for someone else with this view, sorry it’s so long after your post!

I keep trying to find out “what is wrong with GEDCOM”, and (apart from some really minor tweaks) all I can find is example of where poor implementation of the standard means that different programs do different things.

For example, you mention the BetterGEDCOM test with address information (actually phone number). Well, although they say “Looking at the GEDCOM 5.5 it appears that the format is correct”, I don’t think it is. Phone number simply isn’t allowed where they found it, so their complaints have absolutely nothing to do with whether GEDCOM is any good.

I agree with you that there are very few standards that are as widely implemented as GEDCOM, and any new standard is just as likely to be abused.

8. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Fri, 17 Feb 2012  Permalink

Thanks, Kulath,

With the new BetterGEDCOM and now GEDCOM X work being done, I’m trying to stress that they should aim for simplicity rather than complicated models that attempt to include everything. They attempt to include everything because they think that this will allow the data to be transferred, but doing so will doom a new standard to failure, because it introduces more ways that the developers implementing the new standard can make mistakes. As simple as GEDCOM is, a new version somehow needs to be even simpler.

We’ll see what comes to be. I’m looking forward to whatever might come, and I’ll be ready to implement it … barring it being too complex.

9. Brett (brett)
Australia flag
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Tue, 21 Feb 2012  Permalink

Louis

I have recently sent the following to Tamura.

Do you have any thoughts?

****
Have we gone from ‘No one cares any more’ to ‘too many peas in the pod’?

Will there be acceptable discussion and compromise between parties?

Will it end up with 2+ new GEDCOM versions, which software companies either ignore or choose one over another?

Is this potentially another Beta vs VHS scenario?

All I want is to be able to share all my recorded data with others, including those who may not use the same storage program, and obtain same from others.
****

Brett

10. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Wed, 22 Feb 2012  Permalink

Brett,

I think who wins as the future GEDCOM is very much up in the air. There have been dozens of proposals to date that never caught on. And there are several new initiatives now, and each may get to some point of usability, or may not.

People don’t realize how much time and work it takes to develop a new standard. I respect the work effort and time that it took the LDS to develop GEDCOM. They had a full time staff working years on it. We’re lucky we’ve at least got GEDCOM, or we’d have nothing.

I am encouraged by the GEDCOMX effort. It does appear that FamilySearch is committed and willing to devote staff and money and time to this effort. I feel there needs to be a leader, and one with resources and clout.

Meanwhile, I’m sure Ancestry is working on their own model. It is a competition and a race. Other online database firms are producing APIs of their own which are akin to data models which map to standards. AncestrySync has their own paradigm which they’d better finish first to usurp the others. Maybe the next best thing has not been invented yet.

Who ever would have expected that in the race between Beta and VHS, DVDs would have won … or maybe Blu-ray?

Louis

11. meliasz (meliasz)
United States flag
Joined: Fri, 24 Feb 2012
3 blog comments, 0 forum posts
Posted: Fri, 24 Feb 2012  Permalink

Thanks for leaving your comments on my blog post (http://wp.me/q0aQ). I now follow you (and I also follow Tamura Jones; I love his blog).

I will follow-up with your cogent comment and try to passionately persuade you on my points! The answer is there between us.

May I add that your view and mine coincide in that I want GEDCOM to continue and evolve (evolution, not revolution). It is only a de facto Standard now (after a decade and a half of inattentiveness) and of course it lacks openness as a true standard. May I offer SQL as a standard that is comparable, i.e. a Standard and yet vendors provide their own enhancements and everyone is happy and it keeps improving.

–Stanczyk (Polish-American genealogist)

12. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Fri, 24 Feb 2012  Permalink

Stanczyk:

There are as many “answers” as there are people. We each have our opinions. That’s why a standard is so tough - because everyone has to compromise.

The difference with SQL is that enhancements can often be added to specific implementations without worry. Usually the SQL code does not get transported from one site to another. When it is known that SQL transfer is required, basic SQL is used.

But the purpose of GEDCOM is data transfer. Every enhancement one vendor adds means that those specific features must be handled by the other programs, or the data transfer is lost. That is why it is so much more important that genealogy programs adhere to the GEDCOM standard as much as possible.

And personally, I am amazed that 99% of genealogy programs today (hundreds of them) at least make an attempt to read and write GEDCOM … even now, 15 years later. It will be very hard to get the community to change en masse to a new standard. The future will be interesting.

Louis

 

