Complete Genealogy Data Transfer - Mon, 8 Jun 2015
Isn’t this what every genealogist wants?
I thought the problem was that when you export your data from one program, the second one doesn’t read it all in. The sources may not transfer properly. Some data may come in as notes rather than as the event information they should be. Some data may just be ignored completely. Forget about the formatting and don’t even think that you’ll ever get back those hundreds of hours you spent carefully filling out those source citation templates.
We’ve been complaining for years that GEDCOM doesn’t transfer all the data. I’ve said before that it’s 90% due to the programmers and only 10% due to GEDCOM, but the reason doesn’t really matter. What matters is that the data doesn’t all get through.
So what’s the solution?
I thought it was very clear.
When a new genealogy data communication standard is created, it must require all compliant programs to:
1. Input the data using the standard,
2. Output the data using the standard, and
3. Pass all input data through to the output, including what it may not use or may not process, so that EVERYTHING that was input (and is unchanged by the user during the run) will be output.
That number 3 is the key. The *only* way to get complete data transfer is to transfer the data completely, isn’t it?
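Here’s roughly what requirement 3 looks like in code. This is a minimal sketch, assuming a simplified GEDCOM-like “level tag value” line format; the tags and the UNDERSTOOD set are just illustrative, not part of any real standard:

```python
UNDERSTOOD = {"INDI", "NAME", "BIRT", "DATE"}  # what this program processes

def load(lines):
    """Parse every line, keeping tags we don't recognize."""
    records = []
    for line in lines:
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        records.append((level, tag, value))
    return records

def process(records):
    """Work only with the understood tags; everything else rides along."""
    for level, tag, value in records:
        if tag in UNDERSTOOD:
            pass  # display, index, let the user edit, etc.
    return records

def save(records):
    """Write back EVERYTHING, used or not."""
    return [f"{level} {tag} {value}".rstrip() for level, tag, value in records]

original = [
    "0 INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 8 JUN 2015",
    "1 _DNA mtDNA haplogroup H1",  # a structure this program doesn't support
]
assert save(process(load(original))) == original  # nothing was lost
```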
For a moment, let me first reassure you that I am really working hard on Behold whenever I’m at my computer. But when I’m away and don’t have access to my development tools, I catch up on my other sites, including all the stuff going on at FHISO. I am interested in seeing a new genealogy data transfer standard that enables complete data transfer, the lack of which is GEDCOM’s biggest problem. I’d like to see the effort move forward. And every so often, I just have to put my two cents in when I read an important post on the FHISO forums.
A week ago, the FHISO Technical Standing Committee Coordinator, Luther Tychonievich, asked an excellent question: what is the best way, in a new genealogy data standard, for a program to handle a data structure that it does not support? He gave three options that lose some data, and a fourth option requiring that the program be able to input, edit, and re-export the file, keeping the data structure intact.
I immediately replied that the requirement should be similar to option 4, but that the program need not be able to edit the data structure. It only needs to be able to input and re-export the file. In other words, the program must “pass-through” all the data that it doesn’t use.
Wow! What a reaction. There are a lot of intelligent people, excellent programmers and deep thinkers on the FHISO mail list, and a thread started with the subject “Pass Through Requirement??”. I am not sure what it was that wasn’t clear, but there was almost complete rejection of the necessity of data pass-through.
I think what I said is important enough that I’d like to repost it here and get some opinions from the general genealogical community.
What do you think? Am I right or am I wrong?
This is what I said:
Sorry, people. You can disagree, but I’m sticking to my guns. Data not processed must pass through.
Let me reiterate my example again:
Program A sends data to Program B. Program B doesn’t understand Concept 1 that Program A uses, so it throws away the Concept 1 data.
Program B sends the data it got to Program C. Program C doesn’t understand Concept 2, which both Program A and Program B use, so it throws away the Concept 2 data.
Program A now gets its original data back from Program C. All of its Concept 1 and Concept 2 data is missing.
In other words, data gets lost when one program will not pass through data that it does not handle.
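To make the three-hop example concrete, here’s a tiny simulation. The field names “concept1” and “concept2” are hypothetical stand-ins for any data structure a program doesn’t support:

```python
original = {"name": "John Smith", "concept1": "...", "concept2": "..."}

def transfer(data, supported):
    """A program that throws away what it doesn't understand."""
    return {k: v for k, v in data.items() if k in supported}

after_b = transfer(original, {"name", "concept2"})  # Program B drops Concept 1
after_c = transfer(after_b, {"name"})               # Program C drops Concept 2
print(after_c)  # {'name': 'John Smith'} -- both concepts are gone for good

def transfer_passthrough(data, supported):
    """A program that processes only what it supports,
    but outputs everything it was given."""
    return dict(data)  # any edits would apply to supported keys only

round_trip = transfer_passthrough(
    transfer_passthrough(original, {"name", "concept2"}), {"name"})
assert round_trip == original  # nothing lost on the way around
```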
This is why I see a requirement of data pass-through as a necessity.
The non-transferability of data is the number one complaint about GEDCOM and is really the primary reason why FHISO needs a new standard.
FHISO must write the new standard so that the concepts not all programs will support (e.g. the information/evidence layer, the GPS, citation templates, capabilities for one-name or one-place researchers, evidence analysis, etc.) are sufficiently independent of each other that a program that does not handle a concept can simply pass its data through. It will take some thinking to do this properly, but it can be done.
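To show what that independence might look like, here’s one hypothetical layout (purely illustrative, not FHISO’s actual design): each concept lives in its own self-contained substructure, so a program can carry the ones it doesn’t implement as opaque blobs.

```python
record = {
    "conclusion": {"name": "John Smith", "birth": "8 JUN 2015"},
    "evidence": [{"source": "1901 census", "quality": "primary"}],
    "citation_templates": [{"template": "CensusUK1901", "fields": {}}],
}

class OpaqueBlob:
    """Holds a substructure this program doesn't implement, untouched."""
    def __init__(self, raw):
        self.raw = raw

def load_record(data, implemented):
    return {k: (v if k in implemented else OpaqueBlob(v))
            for k, v in data.items()}

def save_record(rec):
    return {k: (v.raw if isinstance(v, OpaqueBlob) else v)
            for k, v in rec.items()}

# A conclusions-only program still round-trips the whole record:
assert save_record(load_record(record, {"conclusion"})) == record
```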
But once you allow any data to be lost, all is lost.
If data loss is allowed then, to use an extreme example, a programmer might decide not to handle sources at all. They’ll do stuff with just the conclusion data and export just the conclusion data, with none of the sources that were originally attached to the input data.
Yes, this program is compliant. It follows the standard for defining the data. FHISO will have to endorse it as a compliant program if data loss is allowed.
If FHISO is just creating a data definition standard, that is fine.
But FHISO is creating much more than that. FHISO is creating a data COMMUNICATION standard. See that key word. The data must be communicated between programs. Data loss does not communicate the data and is unacceptable.
Don’t take an example of HTML being lost by a text processor. That’s quite different. Take an example of sending your data up to Ancestry, editing it on Ancestry, and then downloading it and not getting everything back, be it notes, sources, pictures, or maybe certain tags or data items that you don’t notice until it’s too late. Imagine wanting to move from Ancestry to FamilySearch and then later from FamilySearch to MyHeritage.
Yes, I know that there are all sorts of tricky little examples that seem to make this difficult: e.g. a person with unhandled data gets edited. But these are all solvable once the core idea of data pass-through is accepted and designed for.
Do you care if all your data transfers, or don’t you?