Unexpected BNF - Mon, 2 May 2016
Just before I released the last version of Behold a month ago, a user asked if there was a way to display all the burials and nothing else. They wanted to check out all the cemeteries in a city they were going to.
I thought Behold could do that already. In the Organize Tags page, I allow selection of which tags you want to display. I thought it should be possible because you can select the BURI tag and unselect everything else, like this:
But it didn’t quite work right. First, subordinate tags to the BURI tag would be hidden, so you’d have to find them and select them again. Doing so would sometimes make some unwanted data show up elsewhere.
Also, I found the tag hiding was not working in the Place details and the Source details. And that’s where it was needed most. If it would have worked, it would have produced very useful information in Behold’s Everything Report that would have looked like the following screen shots.
First, the person info would still show everybody, but only their burial facts:
Then for places, only those places having burials would display:
Then the sources would only show those that were claimed for burials:
Having this feature would allow not just a customized listing of burials, but would allow lists of any desired fact or combination of facts. For example, maybe you had access to some school service or military service records and you want a concise listing of those to refer to.
So I decided to include the ability to hide or display facts as part of the (hopefully) final set of changes to the Everything Report prior to implementing saving and editing.
When I started implementing this a couple of weeks ago, I immediately realized that specification of the tags to include and exclude did not work at the tag level. I needed to make that specification work at the fact level. That way, each fact type (e.g. birth, education, marriage, death, burial, etc.) could be shown or hidden as desired, and the same can be done for those fact types in the Place Details and Source Details. Then I’d add a checkbox to deselect (or select) all facts so that it would be easy to select the 1 or 2 facts that you wanted shown (or hidden).
I got that working and it didn’t do too bad. It produced the above screen shots when burials were selected.
I was able to do this fairly simply by extending my tag definitions in my Delphi code. My previous code that only allowed the selection of tags looked like this:
This code set up each tag, the text that would display, whether this tag was hidden by default or not, and which versions of GEDCOM this tag was valid in.
Some of the tags, especially in the HEAD record were shown with the record name and the tag name using a period to separate them, e.g.:
Now I’d be separating out the facts from these tags so that only the fact tags could be selected. I decided I could identify these if I prefixed them with the record tag they came from, which would only be INDI (individual) or FAM (family) records. So now the facts section of this code started to look like this:
The tricky part of all this was picking out which of the tags were facts one level under the INDI and FAM record. There are some tags, such as SOUR and NOTE and OBJE that can be both at level 1 describing the person, but also can be at level 2 describing a fact. There were also some odd things, such as CHAN, the record change date tag, the ID number, and additional information Behold supplies at the fact level that needed to be handled so that only the desired facts would display and nothing else.
So to rigorously ensure I had all the facts, I had to go back to the GEDCOM 5.5.1 standard and work my way through it and pick out all the fact tags, and all the detail tags and the ones that could be used as either a fact or a detail.
After a couple of successively improving attempts at this … Eureka! I realized something. I had in place the structure I needed to do an automated and effectively perfect parsing of a GEDCOM file. The GEDCOM standard is constructed using a grammar known as Backus-Naur Form or BNF. It defines what constructs are allowed. An excerpt of it from GEDCOM looks like this:
The items in double angle brackets are subordinate structures, e.g.:
and the items in single angle brackets are the data tags, e.g.:
Unexpectedly, I had figured out how to simply enter the GEDCOM BNF notation into my code, and Behold will do the checking to ensure that the input conforms to the GEDCOM standard. My code will now look like:
What this does is allow me to just about take the GEDCOM BNF and copy it directly into my own code. There is no translation or mapping I need to do so it is relatively painless and less error-prone.
I will add extra parameter calls to my SetDefaultTag routine for the minimum and maximum number of occurrences of each construct and the minimum and maximum size of the data values, which will allow my routine (to be renamed to LoadBNF) to automatically check those limits and issue a message if the GEDCOM is not proper. The really nice thing is that I’ll have the actual GEDCOM structure name coded, so for any error messages Behold will be able to display the exact structure name the error is in, e.g.:
<<NAME_PERSONAL>> more than 120 characters
I was planning to implement complete GEDCOM checking in a few months from now once I started working on GEDCOM 5.5.1 output. I had looked previously at Delphi implementations of BNF, and was not looking forward to the task of either adapting them or writing my own. Believe me, writing a grammar parser is not fun. That’s likely because it’s something they make you do in 2nd year Computer Science classes as a lab assignment. So I’m very pleased. I never expected the methodology to do this would just fall in place so conveniently.
The beauty of this coding structure is that I’ll be able to go through previous GEDCOM versions (5.5, 5.3, 5.0, 5.5EL, FTM Text) quite quickly and incorporate full checks of those structures as well.
This will also extend to similarly structured grammars that are also made up of hierarchies of tags and values. This includes JSON Schema, which means I can use this for reading Behold’s own file format when I develop that along with editing. I’d likely also be able to quickly develop the input routines for FamilySearch’s GEDCOM X when the time comes to do so.
This is what programming is about and what makes it so much fun. You build up a structure and methodology in small steps and it evolves into something that you never expected and amazes you.
If you got to this point in this post, thanks for lasting through all this technical jabber. I had to spout this off. Now I feel better.
I’m working hard. Lots of great things to come.