There I was at RootsTech, sitting on a panel with Bruce Buzbee of RootsMagic for the Sources, Citations, Metadata session. Little did I know the pain his RootsMagic program would soon be causing me.
That same day, Bart Brenner, one of the four Behold users I met in person at the conference, told me he had a RootsMagic GEDCOM that Behold didn’t read properly. He told me he did this:
- I had a 7000 person RootsMagic 5 file which, exported to GEDCOM, loaded into Behold just fine.
- I pruned that file down to my straight line ancestors (plus their children) back 4-7 generations. This file contains 700 people.
- I exported the pruned RM5 file to GEDCOM
- When this file loads into Behold, it would appear to contain only 11 people. No one seems to be connected to me.
- I imported this GEDCOM back into RM5 and all 700 people “re-appeared” with their proper family connections.
Bart provided me with the GEDCOM so I could see what was going on. I checked, and sure enough, Behold only seemed to read the first 11 people and then stop. It was either a bug in Behold, or something in the file.
On the plane ride home from the conference, I tried to see what the problem was. I found this interesting result: if I loaded the GEDCOM in Notepad, saved it from Notepad to a new file, and then loaded that saved file in Behold, Behold read it fine. Aha! Now all I had to do was compare the two files for differences.
What did I find? I found 477 occurrences of the Hex 00 character in the file. They were spread throughout the notes, on either the CONT (continuation) or the CONC (concatenation) lines. When the file is saved from Notepad, they get turned into spaces (Hex 20), which is why the saved copy doesn’t cause a problem.
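For anyone who wants to check their own GEDCOM for the same thing, here is a minimal Python sketch of that kind of scan. It is not Behold's code (Behold is written in Delphi); the function name `find_nul_lines` and the sample data are made up for illustration.

```python
def find_nul_lines(data: bytes) -> list[tuple[int, str]]:
    """Return (line number, GEDCOM tag) for every line that contains a Hex 00 byte."""
    hits = []
    for number, line in enumerate(data.splitlines(), start=1):
        if b"\x00" in line:
            # A GEDCOM line looks like "<level> <tag> <value>"; the tag is the second field.
            parts = line.split(maxsplit=2)
            tag = parts[1].decode("ascii", "replace") if len(parts) > 1 else ""
            hits.append((number, tag))
    return hits


# Hypothetical sample: two note lines with embedded Hex 00 bytes.
sample = (
    b"0 @I1@ INDI\r\n"
    b"1 NOTE First line\r\n"
    b"2 CONT second\x00line\r\n"
    b"2 CONC more\x00text\x00here\r\n"
    b"0 TRLR\r\n"
)

total_nuls = sample.count(b"\x00")   # total number of Hex 00 bytes
nul_lines = find_nul_lines(sample)   # which lines they sit on
```

To scan a real file, read it in binary mode (`open(path, "rb").read()`) so that the Hex 00 bytes are not altered on the way in.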
So what is this Hex 00 character? It has the value zero. It is nothing. It doesn’t represent any character. But it performs a very important function in many programming languages, including Delphi: it signifies the end of a string. When Behold was reading the GEDCOM file and reached the first Hex 00, it assumed it had hit the physical end of the file and stopped reading.
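The effect is easy to reproduce outside Delphi. The sketch below uses Python's standard `ctypes` module only as a stand-in for a C-style, null-terminated string; the sample data is invented, and this is an illustration of the general behaviour rather than of Behold's internals.

```python
import ctypes

# Hypothetical GEDCOM fragment with one embedded Hex 00 in a note.
data = b"1 NOTE before\x00after\r\n0 TRLR\r\n"

# create_string_buffer copies the bytes and appends a terminating Hex 00.
buf = ctypes.create_string_buffer(data)

# Read as a null-terminated string: everything after the first Hex 00 is lost.
terminated_view = buf.value

# Read by known length instead: the embedded Hex 00 is just another byte.
sized_view = buf.raw[:len(data)]
```

Here `terminated_view` ends at "before", while `sized_view` still holds the whole fragment, which is the difference between stopping at the first Hex 00 and trusting the known size of the data.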
I have tested many hundreds of GEDCOMs and this had never occurred before. The Hex 00 was always at the end of the file. But now this literal nothing from RootsMagic is forcing me to change a core function within Behold. But how to do it?
I knew that if Notepad could read the whole file, there must be a way to load the file without using the Hex 00 as the endpoint. It turned out to be simple: when I load the file using Windows functions, I can get the file size and use that, rather than the first Hex 00, to mark the end.
But now the hard part. Behold still parses through the file using strings, and these strings would be interrupted by the Hex 00 character. This code needs to be fast. I had some ideas and asked a question on StackOverflow, getting a few answers within an hour that confirmed my thinking. I simply had to parse through the file character by character, convert any Hex 00s that are not at the end of the file to some other character (I chose a period, because that’s often used to display non-printable characters), and then keep going.
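The idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the Delphi code that went into Behold: the function name `replace_embedded_nuls` is hypothetical, and I assume that any run of Hex 00 bytes at the very end of the data is the terminator and should be left alone.

```python
def replace_embedded_nuls(data: bytes, replacement: bytes = b".") -> bytes:
    """Replace every Hex 00 that is not part of the trailing run with `replacement`."""
    # Find where the trailing run of Hex 00 bytes (if any) begins.
    end = len(data)
    while end > 0 and data[end - 1] == 0:
        end -= 1

    # Walk the rest byte by byte, swapping each embedded Hex 00 for the replacement.
    out = bytearray(data[:end])
    for i in range(end):
        if out[i] == 0:
            out[i] = replacement[0]

    # Put the untouched trailing Hex 00 bytes back on the end.
    return bytes(out) + data[end:]


# Hypothetical input: two embedded Hex 00 bytes plus two at the end of the data.
cleaned = replace_embedded_nuls(b"2 CONT a\x00b\x00c\r\n\x00\x00")
```

The explicit loop is there to mirror the character-by-character pass described above. In Python itself, `bytes.replace` on the leading slice would do the same job more quickly.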
This was not a simple fix, but major surgery. The patient (Behold) is doing well and will make a full recovery. I have also refactored Behold’s GEDCOM parser, readying it for the legal GEDCOM export that I’ll soon start working on.
So this fix will be included in the next point release of Behold. I wonder how bad a problem this is and if there are very many other programs out there that might choke on the “nothing” in the RootsMagic GEDCOM.