Is GEDCOM Good For Sources? - Thu, 7 May 2015
My interest was piqued earlier today by a discussion between Tony Proctor and Nick Hall on the mailing list for the FHISO Sources-Citations Exploratory Group. In particular, Nick made the following statement:
The problem with GEDCOM is that it heavily restricts the types of source that can be easily cited. Citations in Gramps are based on GEDCOM, and this is an area that needs improvement. I suppose one good thing about GEDCOM is that it doesn’t specify formatting - it just allows the transfer of elements such as TITL, AUTH, PUBL and PAGE.
I had to respond, and I think I’d like to document my response and my opinion about GEDCOM and sourcing here on my blog. This is what I said:
I’d like to defend GEDCOM for a moment. Its source structure is much more flexible than you state.
In the SOUR record, it provides TITL, AUTH, PUBL, and also DATA (with an AGNC - responsible agency and its own NOTE structure), ABBR, TEXT, multiple REFNs (each with a TYPE to describe it), RIN, a change date, a NOTE structure, and a Multimedia Link (which has its own title, a file reference, a multimedia format type, and a source media type: audio, book, card, electronic, fiche, film, magazine, manuscript, map, newspaper, photo, tombstone, video).
The source links to a Repository (REPO) record that contains the name of the repository, its address, phone number, email, fax, web page url, a note structure for the name, a REFN (with a type), a RIN and a change date.
Along with the link to the Repository are a NOTE structure, a CALN call number, and a source media type (same choices as above).
The conclusion data links to the source through GEDCOM's misnamed SOURCE_CITATION structure, which includes PAGE, EVEN (the event cited from, with the ROLE of the person in that event), DATA (including the date the entry was recorded and TEXT from the source), a Multimedia Link (as above), a NOTE structure, and a QUAY (quality assessment).
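To make the shape of that citation structure concrete, here is a minimal Python sketch that prints one as GEDCOM lines. The `render` helper is my own illustration, and the values (the `@S1@` pointer, the BIRT event, the date, the transcription text, the QUAY of 2) are hypothetical. Level 2 assumes the citation sits under an event of an individual.

```python
def render(level, tag, value="", children=()):
    """Return the GEDCOM lines for one tag and its nested children."""
    lines = [f"{level} {tag} {value}".rstrip()]
    for child in children:
        # Each child is a (tag, value) or (tag, value, children) tuple.
        lines.extend(render(level + 1, *child))
    return lines


# Hypothetical citation attached to an event (hence level 2).
citation = ("SOUR", "@S1@", [
    ("PAGE", "Film: 1234567, Frame: 344, Line: 28"),
    ("EVEN", "BIRT", [("ROLE", "CHIL")]),
    ("DATA", "", [
        ("DATE", "7 MAY 2015"),
        ("TEXT", "hypothetical transcription"),
    ]),
    ("QUAY", "2"),
])

lines = render(2, *citation)
print("\n".join(lines))
```

The Multimedia Link and NOTE structure are left out only to keep the sketch short. They would be further children at the same level as PAGE.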
The power of the PAGE tag is often overlooked. Its purpose is to describe the specific location within the information referenced. The data in this field takes the form of label and value pairs, with each pair separated by a comma. The labels are user defined, so anything goes. This gives the standard great flexibility. The example given in GEDCOM is:
Film: 1234567, Frame: 344, Line: 28
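A program reading such a PAGE value could split it back into its labels and values. The following is only a sketch: `parse_page` is a hypothetical helper, and it assumes that no label or value itself contains a comma, which GEDCOM does not guarantee.

```python
def parse_page(value: str) -> dict[str, str]:
    """Split a PAGE value such as 'Film: 1234567, Frame: 344' into a dict."""
    pairs = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # Split on the first colon only, so a value may itself contain colons.
        label, sep, val = part.partition(":")
        if not sep:
            # No label given; skip free text rather than guess a label for it.
            continue
        pairs[label.strip()] = val.strip()
    return pairs


page = parse_page("Film: 1234567, Frame: 344, Line: 28")
print(page)  # {'Film': '1234567', 'Frame': '344', 'Line': '28'}
```

Because the labels are user defined, a receiving program cannot know in advance which keys it will find. It can only preserve and display them.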
Note that this is GEDCOM 5.5.1 and includes some improvements over GEDCOM 5.5’s sourcing. GEDCOM 5.5.1 is the de facto standard (as Tamura Jones has explained) because PAF used it and many programs followed.
I’m not saying GEDCOM’s sourcing is perfect. It is not. It does mix a bit of conclusions with sources and there are some source types that can be handled better. But it is far better than most people realize. There really is very little that cannot be reasonably described with GEDCOM’s sourcing.
The problem, in my opinion, was that programmers did not look into GEDCOM's sourcing deeply enough and did not attempt to use it in all its detail. Many instead invented their own non-standard schemes, with the result that their GEDCOM exports do not transmit their source data to other systems.
With regards to GRAMPS, I can’t believe any of the programmers have attempted to use GEDCOM sourcing to the extent it could be, or GRAMPS sourcing would be much better than you describe.
I do feel that this committee should be able to come up with a system that is not much more complex than what GEDCOM did, as I described above.
(In response to another post about nested sources, I added the following)
With regards to nested sources, my opinion is that a simple reference within a source to another source will handle this easily (similar to Tom’s proposal for Personas), e.g.:
0 @S1@ SOUR
1 TITL xxxxx
1 SOUR @S2@
2 PAGE …
2 NOTE …
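As a rough sketch of how a reader program might follow such nested references, the Python below collects the level-1 SOUR pointers inside each source record and walks the chain. This nesting is the proposal above, not part of GEDCOM 5.5.1. The function names and the sample record titles are hypothetical.

```python
def source_links(gedcom_text):
    """Map each level-0 SOUR record ID to the source IDs it points to at level 1."""
    links = {}
    current = None
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            # A source record header looks like: 0 @S1@ SOUR
            if len(parts) == 3 and parts[2] == "SOUR":
                current = parts[1]
                links[current] = []
            else:
                current = None
        elif parts[0] == "1" and current and parts[1] == "SOUR" and len(parts) == 3:
            links[current].append(parts[2])
    return links


def chain(links, start):
    """Follow the first nested link from start until the chain ends or loops."""
    seen = []
    node = start
    while node in links and node not in seen:
        seen.append(node)
        node = links[node][0] if links[node] else None
    return seen


# Hypothetical data: an index entry (S1) that cites the original register (S2).
sample = """0 @S1@ SOUR
1 TITL Index entry
1 SOUR @S2@
2 PAGE Film: 1234567
0 @S2@ SOUR
1 TITL Original register
"""

links = source_links(sample)
print(chain(links, "@S1@"))  # ['@S1@', '@S2@']
```

The loop check in `chain` is there because nothing in the proposal stops two sources from pointing at each other.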