Louis Kessler’s Behold Blog

Is GEDCOM Good For Sources? - Thu, 7 May 2015

My interest was piqued earlier today by a discussion between Tony Proctor and Nick Hall on the mailing list for the FHISO Sources-Citations Exploratory Group. In particular, Nick made the following statement:

The problem with GEDCOM is that it heavily restricts the types of source that can be easily cited.  Citations in Gramps are based on GEDCOM, and this is an area that needs improvement.  I suppose one good thing about GEDCOM is that it doesn’t specify formatting - it just allows the transfer of elements such as TITL, AUTH, PUBL and PAGE.

I had to respond, and I think I’d like to document my response and my opinion about GEDCOM and sourcing here on my blog. This is what I said:

I’d like to defend GEDCOM for a moment. Its source structure is much more flexible than you state.

In the SOUR record, it provides TITL, AUTH, PUBL, and also DATA (with an AGNC - responsible agency and its own NOTE structure), ABBR, TEXT, multiple REFNs (each with a TYPE to describe it), RIN, a change date, a NOTE structure, and a Multimedia Link (which has its own title, a file reference, a multimedia format type, and a source media type: audio, book, card, electronic, fiche, film, magazine, manuscript, map, newspaper, photo, tombstone, video).

The source links to a Repository (REPO) record that contains the name of the repository, its address, phone number, email, fax, web page url, a note structure for the name, a REFN (with a type), a RIN and a change date.

Along with the link to the Repository is a NOTE structure, a CALN call number and source media type (same choices as above).

The conclusion data links to the source via their misnamed SOURCE_CITATION which includes PAGE, EVEN (event cited from and ROLE of the person in the event), DATA (including date the entry was recorded and TEXT from the source), a Multimedia link (as above), a NOTE structure, and a QUAY (quality assessment).

The power of the PAGE tag is often overlooked. Its purpose is to describe the specific location within the information referenced. The data in this field takes the form of label and value pairs, with each pair separated by a comma. The labels are user defined, so anything goes. This gives the standard great flexibility. The example given in GEDCOM is: 

Film: 1234567, Frame: 344, Line: 28 
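As a minimal sketch of how a program might read such a PAGE value, the label and value pairs can be split on commas and colons. The function name `parse_page` is hypothetical, and the sketch assumes that labels and values do not themselves contain commas or colons, which GEDCOM does not guarantee.

```python
def parse_page(value: str) -> dict[str, str]:
    """Split a GEDCOM PAGE value into its label/value pairs."""
    pairs = {}
    for part in value.split(","):
        label, sep, val = part.partition(":")
        if not sep:
            continue  # skip fragments that are not in "Label: value" form
        pairs[label.strip()] = val.strip()
    return pairs


print(parse_page("Film: 1234567, Frame: 344, Line: 28"))
# {'Film': '1234567', 'Frame': '344', 'Line': '28'}
```

Because the labels are user defined, a reader of the file cannot know them in advance, so a dictionary keyed by whatever labels appear is a reasonable fit.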

Note that this is GEDCOM 5.5.1 and includes some improvements over GEDCOM 5.5’s sourcing.   GEDCOM 5.5.1 is the de facto standard (as Tamura Jones has explained) because PAF used it and many programs followed.

I’m not saying GEDCOM’s sourcing is perfect. It is not. It does mix a bit of conclusions with sources and there are some source types that can be handled better. But it is far better than most people realize. There really is very little that cannot be reasonably described with GEDCOM’s sourcing.

The problem, in my opinion, was that programmers did not look into GEDCOM’s sourcing deeply enough and did not attempt to use it in all its detail. Many instead invented their own non-standard schemes, which results in their GEDCOM exports not transmitting their source data to other systems.

With regard to GRAMPS, I can’t believe any of the programmers have attempted to use GEDCOM sourcing to the extent it could be, or GRAMPS sourcing would be much better than you describe.

I do feel that this committee should be able to come up with a system that is not much more complex than what GEDCOM provides, as I described above.

(In response to another post about nested sources, I added the following)

With regards to nested sources, my opinion is that a simple reference within a source to another source will handle this easily (similar to Tom’s proposal for Personas), e.g.:
   0 @S1@ SOUR
   1 TITL xxxxx
   1 SOUR @S2@
   2 PAGE …
   2 NOTE …
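To illustrate how little machinery a reader would need for this proposed source-to-source reference, here is a rough sketch that scans GEDCOM text and collects, for each SOUR record, the sources it points to at level 1. The sample record contents, the second source `@S2@`, and the function name `nested_source_links` are all hypothetical, and the sketch ignores line continuation and other details a real GEDCOM parser must handle.

```python
# Hypothetical sample: source S1 cites source S2 via a level-1 SOUR pointer.
SAMPLE = """\
0 @S1@ SOUR
1 TITL Hypothetical index
1 SOUR @S2@
2 PAGE Entry: 12
0 @S2@ SOUR
1 TITL Hypothetical original register
"""


def nested_source_links(gedcom_text):
    """Return {source_id: [ids of the sources it cites]}."""
    links = {}
    current = None  # id of the SOUR record we are inside, if any
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = int(parts[0])
        if level == 0:
            # A record header looks like "0 @S1@ SOUR"
            is_source = len(parts) == 3 and parts[2].strip() == "SOUR"
            current = parts[1] if is_source else None
            if current:
                links.setdefault(current, [])
        elif level == 1 and current and parts[1] == "SOUR" and len(parts) == 3:
            links[current].append(parts[2].strip())
    return links


print(nested_source_links(SAMPLE))
# {'@S1@': ['@S2@'], '@S2@': []}
```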

The Future of Genealogy – 6 Predictions - Tue, 7 Apr 2015

There’s been a lot happening the past few years. As I’ve developed Behold, I’ve tried to stay aware of the trends in the genealogy field and the expectations of genealogists of their genealogy software.

I was very inspired by James Tanner’s blog post yesterday titled: Expanding Our View of What is Possible in Genealogical Research. James correctly says that the old way of doing genealogy that we all did 30 years ago is gone. In other words, we no longer have to travel to the library, vital statistics office, or archives and laboriously track down all the bits of information we need to put our ancestors together piece by piece. No. Technology has fallen upon us. It allows us to sit comfortably in our house on our computers and search and find more records and connect with more people and more relatives than we ever could have imagined possible.

The world has changed. Here is my expectation of what is coming:

  1. More Interest in Genealogy

    Companies such as Ancestry, FamilySearch and MyHeritage have been claiming tens of millions of subscribers. I’ve heard that MyHeritage is adding thousands of new users each day. Over 20,000 people were at RootsTech in Salt Lake City this year. Who Do You Think You Are and Genealogy Roadshow are now regular programming on major networks.

    Why is this? Because technology has turned genealogy from a niche hobby for only the most studious meticulous researchers to one that can be done by anyone with an internet connection.

  2. Everything Digital

    One of the most tedious tasks 30 years ago was handling paper: writing up your family information, organizing it, and storing it.

    It’s becoming a digital world. Everything is getting scanned. It can be saved online, or shared in the cloud, and organized in folders and every word can be indexed so anything can be found.

    Genealogy software developers are learning as well that people want/need to record their assumptions and reasoning, so programs are starting to incorporate features that make this possible. The data is digitally transferred to your smartphone so you can take it with you. Your camera, scanner, social network, online browser, cloud data and genealogy tool are becoming one device that you carry around with you wherever you go.

  3. Online Data and Online Trees Ad Infinitum

    There are so many online repositories and so many online records, it is getting to the point that no one person has enough time in their lifetime to research all there is about their family.

    The online services now give you smart matches or similarly-named tools that match your data to potential family trees or records that may or may not be pertinent to you. You can easily get 10,000 of these “hints” thrust upon you. If you take only 10 minutes to thoroughly review, assess and if necessary incorporate the results of each smart link into your research, that will only take about 1,700 hours of your time. By then, you’ll likely have 20,000 new links to check.

    This is obviously unmanageable and cannot persist. It means that new tools will be coming to identify this information and make its dissemination easier. (I’m thinking deeply about this.)

  4. Down with Standards. Up with APIs

    I’ve been a supporter for years of both the BetterGEDCOM and FHISO initiatives for a new genealogy data communication standard. But I’m now feeling the effort will not get anywhere unless it completely changes its emphasis.

    We don’t want to transfer just data anymore. We want to connect the information available at the online repositories and online services to what we have and make corrections, add conclusions and connect the conclusions to their evidence. In other words, we want our data AND our reasoning AND the evidence behind our reasoning to transfer and connect seamlessly with the online resources.

    I really think AncestorSync had the right idea. Connect to everything. Use the Application Programming Interfaces (APIs) of each company to connect your data to theirs. Hide the details of the API from the user and make it seamless. It should just work like magic. Unfortunately, the implementation of that idea was much harder than even the very smart people at AncestorSync thought, and the effort was abandoned.

    But it’s starting again. RootsMagic is connecting to MyHeritage and FamilySearch. FamilySearch has partner sites who interact with its data. And other sites are building public APIs as well.

    Once there is a company big enough that connects to everywhere by linking to all these APIs, it will become hugely popular, and the genealogical world will take another giant leap.

  5. My Data / My Research

    The concept of one world tree is fine. The concept of individual linked trees is also fine. Each has advantages and disadvantages.

    But everyone wants/needs to separate out the data that they “know” is true from all the other stuff/junk/conjecture/miscellanea that Joe Blow has put up on some online site. We want to know exactly what we have personally examined and verified and concluded.

    So there is still an extreme need for personal genealogy data. The best place for that is still and will always be on your own personal computing device so you can ensure that no one else will update, tamper with, or destroy what you worked so hard to produce. So desktop software is not dead and will never be (at least until the desktop itself dies).

    All you’ll need is that magic API program from #4 and you’ll be set.

  6. Genetics and DNA

    The elephant is in the room. The technological advances that made DNA testing affordable to the masses in the past 10 years have taken the world by storm. Millions of people have been tested at several different testing companies and a whole new science of genealogy has been born.

    It is really unbelievable what you can do with DNA results when a company has a million other tested people you can compare with. Genealogists are in the “still trying to figure this out” phase, but it’s really simple when you think about it.

    You have two genealogies. There’s your traditional genealogy of whom you, your relatives and the records think your ancestors were. Then there’s your genetic genealogy that says who your genetic ancestors were. These two genealogies are not the same. They may not even be close. The rate of genetic NPE (Not the Parent Expected) has been estimated at between 1% and 3% or higher. By the 6th generation, half the ancestors in your genetic tree might not be who you thought they should be.

    Genealogy will, by necessity, evolve so that people realize they have multiple ancestries, and will want to trace both their traditional family and their genetic family. People have to smarten up first and realize that there’s a reason why your grandfather does not have a DNA match with you. So don’t promote DNA research through your family until you are absolutely sure no one will get hurt by it!

    But this DNA thing is phenomenal. Take it. Embrace it. Use it if you dare.

Why Completing the Programming of Something New Always Takes Longer than Planned - Sat, 7 Mar 2015

It’s that exceptional case that you (of course) never expected, but that then happens.

I’ve been working hard the last couple of months to finish Version 1.1 of Behold. I’m so excited about it and I’ve effectively finished putting in everything I want to put in. I’ve been running through my test cases to ensure it all works as it should.

My counter on my phone says that today is the 19th consecutive day that I’ve worked on Behold since getting back from holidays. For 3 weekends now, including one before I left on holidays, I was expecting to announce that this version was ready. That’s 3 weekends now that one or more of these unexpected exceptions has occurred and prevented the release. This latest one, which took me 5 days to resolve, is an excellent example. With it you’ll see some of the great information that Behold’s Everything Report will give you in Version 1.1.

While testing my own GEDCOM file, I came across this:


This is my father’s stepfather (whom I’m named after). His first wife died and, as was traditional in those days in the farming communities, widowers needed a wife, so he was quickly matched up with my father’s mother Goldie, whose husband had passed away 4 years earlier. They married less than 5 months after Louis’ wife passed away.

So what’s wrong here? The first thing I saw above that was wrong was that at Louis’ death, it is saying that he was widowed 11 years. That is not correct. He had married his 2nd wife and was married 11 years, not widowed, when he died.

The second thing I see wrong here is that Sarah is marked as Wife 2 and Goldie as Wife 1. They are shown in the correct order but are numbered wrong.

The third thing wrong was that Sarah’s death was not shown.

It took 3 days inspecting and debugging my code before I thought of looking in the GEDCOM file that contained the data. Sure enough, Louis’ two FAMS records that represent his marriages are listed in the incorrect order of Goldie first and Sarah second. I already had included a check in Behold to ensure that marriages in the GEDCOM file are in correct marriage date order. Most genealogy programs do output these records correctly, but Behold will issue a warning if they don’t. And then Behold will fix the order.

I had a puzzle.

Why didn’t the order get fixed? Well, the marriage date with Sarah was unknown. So Behold couldn’t fix the order.

Why then was this still a problem? It was because the spouse of the last marriage had died, so it displayed Louis as a widower.

Why was wife 2 listed first? Because there was no marriage date and I had a slight bug that didn’t order the marriages correctly in that case.

Why didn’t Sarah’s death show up? That’s because if Sarah was married second, then it was after Goldie’s marriage. This means that Sarah’s death was before Goldie’s marriage, which was before Sarah’s marriage, and I put smarts into Behold so it wouldn’t display your spouse’s events prior to your marriage with your spouse, since he/she wasn’t in your life yet.

And the fact that Louis was widowed for 3 months and that Goldie was widowed for 4 years when they married is important. This needed to be added.

Once I understood the problems (that was the hard part) the rest was easy to fix (if you think that 8 hours of programming work is easy). While sorting each person’s marriages, if a marriage date was not given, then I had to check the spouse’s death date and ensure that it wasn’t before the date of any of the previous marriages. If so, then this marriage must have happened before the previous marriage and the order will need to be switched.
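The ordering rule described above can be sketched roughly as follows. This is not Behold’s actual code: the `Marriage` class, the function names, and the exact dates in the example are hypothetical (the post only gives approximate timing), and the sketch handles only the case discussed here, where an undated marriage is moved ahead of an earlier-listed marriage because the spouse died before that marriage took place.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Marriage:
    spouse: str
    married: Optional[date]      # None when the marriage date is unknown
    spouse_died: Optional[date]  # None when the death date is unknown


def must_precede(m: Marriage, prev: Marriage) -> bool:
    """True if marriage m must have happened before marriage prev."""
    if prev.married is None:
        return False  # nothing to compare against
    if m.married is not None:
        return m.married < prev.married
    # Undated marriage: a spouse who died before prev's wedding
    # must have been married before it.
    return m.spouse_died is not None and m.spouse_died < prev.married


def order_marriages(marriages: list[Marriage]) -> list[Marriage]:
    """Insert each marriage in file order, moving it earlier when required."""
    ordered: list[Marriage] = []
    for m in marriages:
        pos = len(ordered)
        while pos > 0 and must_precede(m, ordered[pos - 1]):
            pos -= 1
        ordered.insert(pos, m)
    return ordered


# File order as in the GEDCOM: Goldie first, Sarah second (dates are made up).
file_order = [
    Marriage("Goldie", married=date(1929, 4, 20), spouse_died=None),
    Marriage("Sarah", married=None, spouse_died=date(1929, 1, 15)),
]
print([m.spouse for m in order_marriages(file_order)])
# ['Sarah', 'Goldie']
```

When neither a marriage date nor a spouse’s death date is available, the sketch leaves the file order alone, since there is nothing to justify a swap.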

I’m not sure anyone will have been able to follow all this, because my head’s spinning just from writing it.

With the work done, the final results were pleasing. I got this:


Now this is no longer just your standard birth/marriage/death information. It tells a real story and puts everything in context:  Louis was born in 1878 in Russia. He first married Sarah (date of marriage unknown or it would be shown). Sarah died when Louis was 50 to 51 and when Sarah was 51. Louis then married when he was still 50 to 51 after being widowed for 3 months. His 2nd wife was Goldie who was 36 and was herself widowed 4 years. Louis died at the age of 61 to 62 after being married to Goldie for 11 years.

I have designed this information to be extremely useful for anyone trying to do family research and understand the lives of their family members. Context is everything, and I’m working to produce an Everything Report that will supply you with all the context you will want and need.