Louis Kessler’s Behold Blog » Blog Entry

Literally Nothing From RootsMagic - Sat, 11 Feb 2012

There I was at RootsTech, sitting on the panel for the Sources, Citations, Metadata session alongside Bruce Buzbee of RootsMagic. Little did I know the pain his RootsMagic program would soon be causing me.

That same day, Bart Brenner, one of the four Behold users I met in person at the conference, told me he had a RootsMagic GEDCOM that Behold didn’t read properly. He told me he did this:

  1. I had a 7000 person RootsMagic 5 file which, exported to GEDCOM, loaded into Behold just fine.
  2. I pruned that file down to my straight line ancestors (plus their children) back 4-7 generations. This file contains 700 people.
  3. I exported the pruned RM5 file to GEDCOM
  4. When this file loads into Behold, it would appear to contain only 11 people. No one seems to be connected to me.
  5. I imported this GEDCOM back into RM5 and all 700 people “re-appeared” with their proper family connections.

Bart provided me with the GEDCOM so I could see what was going on. I checked, and sure enough, Behold only seemed to read the first 11 people and then stop. It was either a bug in Behold, or something in the file.

On the plane ride home from the conference, I tried to see what the problem was. I found this interesting result: if I loaded the GEDCOM in Notepad, saved it from Notepad to a new file, and then loaded that saved file in Behold, Behold loaded it fine. Aha! Now all I had to do was compare the two files for differences.

What did I find? I found 477 occurrences of the Hex character 00 in the file. They were spread throughout the notes, on either the CONT (continuation) or the CONC (concatenation) lines. When the file is saved in Notepad, they get turned into spaces (Hex 20), which is why they no longer cause a problem.
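A scan of this kind could be sketched as follows. This is only an illustrative Python sketch, not the comparison actually performed (that was done by comparing the original file against the Notepad-saved copy). The helper name `count_nul_by_tag` and the sample data are made up for the example.

```python
from collections import Counter


def count_nul_by_tag(data: bytes) -> Counter:
    """Count Hex 00 bytes per GEDCOM tag (hypothetical helper).

    Assumes simple lines of the form "<level> <tag> <value>", which is
    the shape of the CONT and CONC lines discussed above.
    """
    counts = Counter()
    # bytes.splitlines() breaks only on CR, LF and CRLF, never on Hex 00.
    for line in data.splitlines():
        nul_count = line.count(b"\x00")
        if nul_count == 0:
            continue
        parts = line.split(b" ", 2)
        tag = parts[1].decode("ascii", "replace") if len(parts) > 1 else "?"
        counts[tag] += nul_count
    return counts


# Made-up fragment with Hex 00 bytes embedded in note continuation lines.
sample = (
    b"0 @I1@ INDI\r\n"
    b"1 NOTE First line\r\n"
    b"2 CONT second\x00line\r\n"
    b"2 CONC more\x00text\x00here\r\n"
)
result = count_nul_by_tag(sample)
print(dict(result))
```

Working on raw bytes rather than decoded text matters here, because the point is to see exactly which byte values are in the file.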

So what is this Hex 00 character? It has the value zero. It is nothing. It doesn’t represent any character. But it performs a very important function in many programming languages, including Delphi. It signifies the end of a string. And when Behold was reading the GEDCOM file and reached the Hex 00, it assumed it was the physical end of the file and stopped reading.
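The effect of a terminator-based string can be shown in a few lines. The sketch below uses Python's `ctypes` module to mimic a C-style NUL-terminated buffer. It is an analogy only, since Behold is written in Delphi and its actual reading code is not shown in this post. The sample line is made up.

```python
import ctypes

# A made-up GEDCOM line with a Hex 00 in the middle of the note text.
raw = b"2 CONT before\x00after"

# create_string_buffer copies the bytes into a C char array and
# appends one extra terminating Hex 00.
buf = ctypes.create_string_buffer(raw)

# .value treats the buffer as a NUL-terminated string,
# so it stops at the first Hex 00 it meets.
as_c_string = buf.value

# .raw returns every byte in the buffer, terminator included.
all_bytes = buf.raw

print(as_c_string)
print(len(raw), len(as_c_string), len(all_bytes))
```

Everything after the first Hex 00 is still in memory, but code that relies on the terminator never sees it.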

I have tested many hundreds of GEDCOMs and this had never occurred before. The Hex 00 was always at the end of the file. But now this literal nothing from RootsMagic is forcing me to change a core function within Behold. But how to do it?

I knew that if Notepad could read the whole file, there must be a way to load the file without using the Hex 00 as the endpoint. It turned out to be simple. When I load the file using Windows functions, I can get the file size and read in exactly that many characters.
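The idea of taking the endpoint from the file size instead of from a terminator could be sketched like this. It is a Python illustration, not the Delphi and Windows calls used in Behold. The helper name `read_by_size` and the sample data are assumptions made for the example.

```python
import os
import tempfile


def read_by_size(path: str) -> bytes:
    """Read a file using its size on disk as the endpoint (hypothetical helper)."""
    size = os.path.getsize(path)  # length as reported by the file system
    with open(path, "rb") as handle:
        return handle.read(size)  # Hex 00 is read like any other byte


# Small demonstration with a made-up GEDCOM fragment containing a Hex 00.
sample = b"1 NOTE first\r\n2 CONT sec\x00ond\r\n0 TRLR\r\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
try:
    data = read_by_size(tmp.name)
finally:
    os.remove(tmp.name)

print(len(data) == len(sample))
```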

But now the hard part. Behold still parses through the file using strings, and these strings would be interrupted by this Hex 00 character. This code needs to be fast. I had some ideas, and asked a question on StackOverflow, getting a few answers within an hour that confirmed my thinking. I simply had to parse through character by character and convert any Hex 00s that are not at the end of the file to some other character (I chose a period, because that’s often used to display non-printable characters) and then keep going.
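A minimal sketch of that conversion, in Python rather than Delphi, might look like the following. The function name `neutralize_nuls` is hypothetical, and the sketch assumes that "at the end of the file" means the unbroken run of Hex 00 bytes (if any) that closes the buffer.

```python
def neutralize_nuls(data: bytes) -> bytes:
    """Turn every interior Hex 00 into a period, leaving trailing ones alone."""
    out = bytearray(data)

    # Find where the trailing run of Hex 00 bytes (if any) begins.
    end = len(out)
    while end > 0 and out[end - 1] == 0:
        end -= 1

    # Walk the rest byte by byte and replace each Hex 00 with ".".
    for i in range(end):
        if out[i] == 0:
            out[i] = ord(".")
    return bytes(out)


fixed = neutralize_nuls(b"2 CONT one\x00two\x00\x00")
print(fixed)
```

The output has the same length as the input, so any offsets into the buffer stay valid after the conversion.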

This was not a simple fix, but was major surgery. The patient (Behold) is doing well and will make a full recovery. I also have refactored Behold’s GEDCOM parser readying it for the legal GEDCOM export which I’ll soon be starting to work on.

So this fix will be included in the next point release of Behold. I wonder how bad a problem this is and if there are very many other programs out there that might choke on the “nothing” in the RootsMagic GEDCOM.

9 Comments

1. Brett (brett)
Australia
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Mon, 13 Feb 2012

You state: ‘This code needs to be fast’.

But what actual effect will this have on Behold’s load time for large GEDCOM files?

2. Brett (brett)
Australia
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Mon, 13 Feb 2012

Your comment ‘I wonder how bad a problem this is and if there are very many other programs out there that might choke on the “nothing” in the RootsMagic GEDCOM’ made me think hard about the same.

Thanks Louis (and Bart) for a copy of the GEDCOM file.

Results for some software include:

Software                         Result
Ahnenblatt                       Opened file
Behold                           Imported 9 records only
dynastree Home Edition           Opened file
Family Historian                 Opened file
Family Tree Legends              Opened file
GedPad Pro                       Opened file
Genealogica Grafica              Opened file
Legacy Standard Edition          Did not open file
Family Tree Builder              Opened file
Family Tree Maker                Did not open file
Personal Historian Essentials    Imported 9 records only
RootsMagic                       Opened file
The Complete Genealogy Builder   Opened file
The Complete Genealogy Reporter  Opened file
The Master Genealogist 8         Did not open file, Unicode not supported
TNG                              Opened file

3. Louis Kessler (lkessler)
Canada
Joined: Sun, 9 Mar 2003
287 blog comments, 245 forum posts
Posted: Mon, 13 Feb 2012

Brett:

Thanks for posting this here. Let me add for the readers a few important things that you pointed out in your emails to me about this:

FTM 2010 could not read this file because it says it is GEDCOM Version 5.5.1 and FTM gave the message: “GEDCOM file must be version 5.5 or greater”. Hmmm. So when I produce my totally GEDCOM 5.5.1 compliant output, does that mean FTM won’t be able to read it?

Legacy wouldn’t open the file because it just hung. To me, I don’t care what the cause is. If Behold hangs on anything, it’s my top priority to fix that.

TMG 8 wouldn’t open the file because Unicode is not supported.

GedPad Pro and Genealogica Grafica did open the file, but you said they reported unlinked people/families. That would indicate that the Hex 00 prevented them from inputting the complete person records that contained the Hex 00.

Regarding the other programs that do read it, as Tamura points out in his comment in his Google+ posting, “Many applications will read it because they do not bother to properly parse GEDCOM files. One could add that most of these applications are likely to export it too.”

4. Brett (brett)
Australia
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Tue, 14 Feb 2012

A few more results. All of these opened the file without any report:

GRAMPS
GENViewer
GENMatcher

5. arnold (arnold)
Canada
Joined: Mon, 24 Nov 2014
10 blog comments, 13 forum posts
Posted: Fri, 5 Jun 2015

I’d be curious what the specified file encoding was and whether the actual encoding agreed with that :-)
Also I’d be interested in if, where and how the issue was resolved - i.e. was it just a copy & paste problem, along with a loose GEDCOM output conversion?

6. Louis Kessler (lkessler)
Canada
Joined: Sun, 9 Mar 2003
287 blog comments, 245 forum posts
Posted: Sat, 6 Jun 2015

Arnold,

It was encoded as UTF-8. But that really had nothing to do with it. An end-of-string character is an invalid GEDCOM character and is not allowed in a GEDCOM file.

The post links to my Stack Overflow question. The answer I accepted was what I used. The length function gives the correct length of the file. I then read in that many characters.

I see that I never answered Brett’s first question regarding whether or not the new version was still fast. Well, that was a long time ago, and I don’t really remember. But I do know that I wouldn’t have implemented it that way if it was noticeably slower. Reading a file character by character actually is quite fast, as long as you do not have to process each character you read.

Louis

7. arnold (arnold)
Canada
Joined: Mon, 24 Nov 2014
10 blog comments, 13 forum posts
Posted: Mon, 8 Jun 2015

My apologies for not being clear enough.
My main interest was really whether you or anyone else ever found out where the problem originated.
Was it RootsMagic (editing) or export, or some cut & paste by the user (and then it would be good to know what application it was cut & pasted from - presumably into RootsMagic)?

8. Louis Kessler (lkessler)
Canada
Joined: Sun, 9 Mar 2003
287 blog comments, 245 forum posts
Posted: Mon, 8 Jun 2015

Arnold,

Bart Brenner did trace it down. See Bart’s post on Google+, which in summary said: “It was originally in a Family Tree Maker (I’m not sure which version) file which was exported to GEDCOM. That GEDCOM was imported into RootsMagic (probably version 4, perhaps version 3). It is also possible that I imported the file directly from Family Tree Maker into RootsMagic 4. The final GEDCOM that you both have been looking at was exported from RootsMagic 5.”

At the time, we emailed Bruce Buzbee, made him aware of the problem, and gave him the test file. Hopefully by now he’s eliminated the possibility of the Hex 00 character passing through RootsMagic.

Louis

9. arnold (arnold)
Canada
Joined: Mon, 24 Nov 2014
10 blog comments, 13 forum posts
Posted: Tue, 9 Jun 2015

Thank you, Louis.
It is good to know it was something rather uncommon; just the same, it needs to be handled.
