Louis Kessler's Behold Blog » Blog Entry

Is GEDCOM Good For Sources? - Thu, 7 May 2015

My interest was piqued earlier today by a discussion between Tony Proctor and Nick Hall on the mailing list for the FHISO Sources-Citations Exploratory Group. In particular, Nick made the following statement:

The problem with GEDCOM is that it heavily restricts the types of source that can be easily cited.  Citations in Gramps are based on GEDCOM, and this is an area that needs improvement.  I suppose one good thing about GEDCOM is that it doesn’t specify formatting - it just allows the transfer of elements such as TITL, AUTH, PUBL and PAGE.

I had to respond, and I think I’d like to document my response and my opinion about GEDCOM and sourcing here on my blog. This is what I said:

I’d like to defend GEDCOM for a moment. Its source structure is much more flexible than you state.

In the SOUR record, it provides TITL, AUTH, PUBL, and also DATA (with an AGNC - responsible agency and its own NOTE structure), ABBR, TEXT, multiple REFNs (each with a TYPE to describe it), RIN, a change date, a NOTE structure, and a Multimedia Link (which has its own title, a file reference, a multimedia format type, and a source media type: audio, book, card, electronic, fiche, film, magazine, manuscript, map, newspaper, photo, tombstone, video).

The source links to a Repository (REPO) record that contains the name of the repository, its address, phone number, email, fax, web page url, a note structure for the name, a REFN (with a type), a RIN and a change date.

Along with the link to the Repository are a NOTE structure, a CALN call number, and a source media type (same choices as above).

The conclusion data links to the source via their misnamed SOURCE_CITATION which includes PAGE, EVEN (event cited from and ROLE of the person in the event), DATA (including date the entry was recorded and TEXT from the source), a Multimedia link (as above), a NOTE structure, and a QUAY (quality assessment).
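To make the citation structure described above more concrete, here is a minimal Python sketch that reads a small SOURCE_CITATION fragment into a flat dictionary of tag paths. The sample values (the source id, date, and text) are invented for illustration, the level numbers assume the citation hangs under a level-1 event, and `parse_line` and `parse_citation` are hypothetical helper names, not part of any GEDCOM library. The sketch does not handle record lines that carry an id before the tag, nor CONT/CONC continuation lines.

```python
# Hypothetical citation fragment; the tags follow GEDCOM 5.5.1, the values are invented.
SAMPLE = """\
2 SOUR @S1@
3 PAGE Film: 1234567, Frame: 344, Line: 28
3 EVEN BIRT
4 ROLE CHIL
3 DATA
4 DATE 1 JAN 1900
4 TEXT hypothetical transcription
3 QUAY 2
"""


def parse_line(line):
    """Split one GEDCOM line into (level, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    tag = parts[1]
    value = parts[2] if len(parts) > 2 else ""
    return level, tag, value


def parse_citation(text):
    """Return a dict mapping tag paths such as 'SOUR.DATA.TEXT' to values."""
    fields = {}
    stack = []  # (level, tag) of the lines that enclose the current one
    for raw in text.splitlines():
        if not raw.strip():
            continue
        level, tag, value = parse_line(raw)
        # Drop anything at the same or a deeper level: it is no longer a parent.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, tag))
        fields[".".join(t for _, t in stack)] = value
    return fields


fields = parse_citation(SAMPLE)
print(fields["SOUR.EVEN.ROLE"])  # CHIL
print(fields["SOUR.QUAY"])       # 2
```

A flat path dictionary is enough here because each tag occurs once in the sample; a citation with repeated NOTE or OBJE substructures would need lists instead.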

The power of the PAGE tag is often overlooked. It describes the specific location within the information referenced. The data in this field takes the form of label and value pairs, with each pair separated by a comma. The labels are user defined, so anything goes, which gives the standard great flexibility. The example given in GEDCOM is:

Film: 1234567, Frame: 344, Line: 28 
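As a rough illustration of how a program might read such a PAGE value, the following Python sketch splits it into label and value pairs. The function name `parse_page` is hypothetical, and the sketch assumes that neither labels nor values contain commas or extra colons, which real-world data may not respect.

```python
def parse_page(page):
    """Split a PAGE value into a dict of label -> value (insertion order kept)."""
    pairs = {}
    for chunk in page.split(","):
        label, sep, value = chunk.partition(":")
        if not sep:
            continue  # free text without a "label: value" shape is skipped
        pairs[label.strip()] = value.strip()
    return pairs


print(parse_page("Film: 1234567, Frame: 344, Line: 28"))
# {'Film': '1234567', 'Frame': '344', 'Line': '28'}
```

Because the labels are user defined, a reader can only report the pairs it finds; it cannot know in advance which labels to expect.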

Note that this is GEDCOM 5.5.1 and includes some improvements over GEDCOM 5.5’s sourcing.   GEDCOM 5.5.1 is the de facto standard (as Tamura Jones has explained) because PAF used it and many programs followed.

I’m not saying GEDCOM’s sourcing is perfect. It is not. It does mix a bit of conclusions with sources and there are some source types that can be handled better. But it is far better than most people realize. There really is very little that cannot be reasonably described with GEDCOM’s sourcing.

The problem, in my opinion, was that programmers did not look into GEDCOM’s sourcing deeply enough and did not attempt to use it in all its detail. Many instead invented their own non-standard schemes, with the result that their GEDCOM exports do not transmit their source data to other systems.

With regards to GRAMPS, I can’t believe any of the programmers have attempted to use GEDCOM sourcing to the extent it could be, or GRAMPS sourcing would be much better than you describe.

I do feel that this committee should be able to come out with some sort of system that is not much more complex than what GEDCOM did, as I described above.

(In response to another post about nested sources, I added the following)

With regards to nested sources, my opinion is that a simple reference within a source to another source will handle this easily (similar to Tom’s proposal for Personas), e.g.:
   0 @S1@ SOUR
   1 TITL xxxxx
   1 SOUR @S2@
   2 PAGE …
   2 NOTE …
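A reader that accepted this proposed source-within-source reference (which is a suggestion in this post, not part of GEDCOM 5.5.1) could follow the chain with very little code. The Python sketch below is one possible way to do it; the sample records, titles, and the function names `source_links` and `source_chain` are all hypothetical.

```python
import re

# level, optional @id@, tag, optional value
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s+(.*))?$")

# Invented sample: S1 is derived from S2, using the proposed level-1 SOUR link.
SAMPLE = """\
0 @S1@ SOUR
1 TITL Index entry (hypothetical)
1 SOUR @S2@
2 PAGE Film: 1234567, Frame: 344, Line: 28
0 @S2@ SOUR
1 TITL Microfilmed register (hypothetical)
"""


def source_links(gedcom_text):
    """Map each SOUR record id to the source ids it references at level 1."""
    links = {}
    current = None
    for raw in gedcom_text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        level = int(match.group(1))
        xref, tag = match.group(2), match.group(3)
        value = (match.group(4) or "").strip()
        if level == 0:
            current = xref if tag == "SOUR" else None
            if current:
                links[current] = []
        elif level == 1 and tag == "SOUR" and current and value.startswith("@"):
            links[current].append(value)
    return links


def source_chain(links, start):
    """Follow the first nested reference from start, stopping on a cycle."""
    chain, seen = [start], {start}
    while links.get(chain[-1]):
        nxt = links[chain[-1]][0]
        if nxt in seen:
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


print(source_chain(source_links(SAMPLE), "@S1@"))  # ['@S1@', '@S2@']
```

The cycle check matters because nothing in the file format would stop two sources from referencing each other.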

6 Comments

1. Enno Borgsteede (ennoborg)
Netherlands
Joined: Wed, 9 May 2012
15 blog comments, 0 forum posts
Posted: Fri, 15 May 2015

Louis, yes, it can be done, but I prefer for Film: 1234567, Frame: 344, Line: 28 to appear in separate GEDCOM tags, like FILM, PAGE, and LINE, because that makes things less dependent on language, and more straightforward to parse, IMO.

Sources referencing other sources, with a construct similar to how sources are already referenced by GEDCOM citations, is a logical extension too. I’d like to extend this with a reference type, to distinguish between inclusion, like an article in a magazine or on a blog, and extraction, like an indexed record from a film. Film details, like those of the church book it was made from, can then be stored in the referenced (indexed) source.

2. Louis Kessler (lkessler)
Canada
Joined: Sun, 9 Mar 2003
237 blog comments, 226 forum posts
Posted: Sat, 16 May 2015


I’m fine with any simple method such as you suggest, but we don’t want a standard that is a huge, elaborate, complicated mess.

The point is that GEDCOM may not be the best, but it is able to handle sources despite the statements of many people that it cannot.


3. Enno Borgsteede (ennoborg)
Netherlands
Joined: Wed, 9 May 2012
15 blog comments, 0 forum posts
Posted: Mon, 18 May 2015


That’s right. You can take elements from A2A, or Dublin Core, whatever you like, but things can easily get out of hand if you take too many. When you need more than a dozen, I think something’s wrong.


4. arnold (arnold)
Canada
Joined: Mon, 24 Nov 2014
10 blog comments, 13 forum posts
Posted: Thu, 21 May 2015

I very much agree that support for sources has been reasonably well covered in GEDCOM, especially considering when the standard was ‘published’.
The (very evident) problem is that the people who write/wrote the software, and most of those who used it, were really only name collectors and either never really understood the need for ‘source documentation’, did not care, or were rushed by their bosses to ‘forget about’ it.
I know, I was ‘spoiled’ in this regard by PAF and though I have moved on from it, none of the software I am using or have used _insists_ on sources, nor does it make them as easy as possible to use and cross-reference, even if the source only happens to be a note such as ‘grandma so-and-so’s recollections on her 90th birthday as retold by her niece’s daughter Edna at the last family reunion in 1998 at someplace’.

Now, I find myself having to go back and track down all of the names in my ‘tree’ and try to recall and cross-check where the data came from, and every so often call down the pox on those who should have known better and let me get away without sources. :-)

5. Louis Kessler (lkessler)
Canada
Joined: Sun, 9 Mar 2003
237 blog comments, 226 forum posts
Posted: Fri, 22 May 2015


In the earlier days, GEDCOM’s sources weren’t as well developed. I happen to have a small booklet called “PAF Documentation Guidelines, 1995 Edition”, produced by the Silicon Valley PAF Users Group. They didn’t think PAF’s ability to document sources was good enough, so they went a step further and wrote this 48-page booklet.

What they did was use NOTE entries and identified these by putting an exclamation mark at the beginning of the note. An example from the booklet is:

!BIRTH: Letter to Sarah Smith, 1969; from her Aunt Ida Reinhold dtd 14 Apr 1969; ; In poss of Bill Smith, San Jose CA; QUOTE: “our father, Austin Vallem was born in Aardal, Norway near Stavanger, on Dec 13th 1857…”

This contains keywords (BIRTH, QUOTE) and structure (separated by semicolons) that are defined in the booklet.
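For a sense of how a program could have read these notes, here is a minimal Python sketch based only on what is visible in the example above: a leading exclamation mark, an event keyword, semicolon-separated fields, and an optional trailing QUOTE. The booklet's full rules are not reproduced here, so the field handling is an assumption, and `parse_tagged_note` is a hypothetical name.

```python
def parse_tagged_note(note):
    """Parse a '!KEYWORD: field; field; QUOTE: text' note, or return None."""
    if not note.startswith("!"):
        return None  # an ordinary NOTE, not a tagged source note
    keyword, sep, rest = note[1:].partition(":")
    if not sep:
        return None
    # Split the quote off first so semicolons inside it are left alone.
    body, quote_sep, quote = rest.partition("QUOTE:")
    body = body.strip()
    if quote_sep and body.endswith(";"):
        body = body[:-1]  # drop the separator that preceded QUOTE
    fields = [part.strip() for part in body.split(";")]
    return {
        "event": keyword.strip(),
        "fields": fields,  # empty strings mark fields left blank on purpose
        "quote": quote.strip() if quote_sep else None,
    }


note = "!BIRTH: Letter to Sarah Smith, 1969; from her Aunt Ida Reinhold dtd 14 Apr 1969; ; In poss of Bill Smith, San Jose CA; QUOTE: our father..."
parsed = parse_tagged_note(note)
print(parsed["event"])        # BIRTH
print(len(parsed["fields"]))  # 4
```

Blank fields are kept rather than dropped, on the assumption that position carries meaning in the booklet's scheme.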

In the early days of developing Behold, I was making plans to support this format, because it was then in use a lot and it was easy to find many GEDCOMs containing these structures. Now there are few left as the genealogy software has got somewhat better at allowing source information to be entered directly.

I have to give credit to the Silicon Valley PAF Users Group for the excellent work they did to hammer out that standard. If any of them are still around, they’d be good candidates to help FHISO develop a new standard.


6. arnold (arnold)
Canada
Joined: Mon, 24 Nov 2014
10 blog comments, 13 forum posts
Posted: Sun, 24 May 2015

Looks like the book is still available, though I don’t know how specific it is to PAF, which I no longer use actively.
The group still seems to be active - judging by their calendar of meetings and events. See: http://svpafug.org/index.html
By now I have a better idea of what I should do, but am often frustrated by what some specific program will let me do and how easy or difficult it makes it to be thorough and detailed enough.
The other problem I have is that quite some time can pass between the chances I get to work on a specific person, so I need to refresh my memory about what I already have and what I have learned in the meantime. Thus, being able to review these sources in context is equally important.


The Following 1 Site Has Linked Here

  1. On GEDCOM, its uses and abuses... | Geneatools | Amos Bannister : Sat, 9 May 2015
    "Louis Kessler recently posted about an interesting exchange about GEDCOM's handling of sources on the FHISO mailing list. Louis raises some interesting points and shows that ..."
