Louis Kessler’s Behold Blog

Unexpected BNF - Mon, 2 May 2016

Just before I released the last version of Behold a month ago, a user asked if there was a way to display all the burials and nothing else. They wanted to check out all the cemeteries in a city they were going to.

I thought Behold could do that already. In the Organize Tags page, I allow selection of which tags you want to display. I thought it should be possible because you can select the BURI tag and unselect everything else, like this:

image

But it didn’t quite work right. First, subordinate tags to the BURI tag would be hidden, so you’d have to find them and select them again. Doing so would sometimes make some unwanted data show up elsewhere.

Also, I found the tag hiding was not working in the Place details and the Source details, and that's where it was needed most. Had it worked, it would have produced very useful information in Behold's Everything Report, looking like the following screen shots.

First, the person info would still show everybody, but only their burial facts:

image

Then for places, only those places having burials would display:

image

Then the sources would only show those that were claimed for burials:

image

Having this feature would allow not just a customized listing of burials, but would allow lists of any desired fact or combination of facts. For example, maybe you had access to some school service or military service records and you want a concise listing of those to refer to.

So I decided to include the ability to hide or display facts as part of the (hopefully) final set of changes to the Everything Report prior to implementing saving and editing.

When I started implementing this a couple of weeks ago, I immediately realized that specifying which tags to include and exclude did not work at the tag level. I needed to make that specification work at the fact level. That way, each fact type (e.g. birth, education, marriage, death, burial, etc.) could be shown or hidden as desired, and the same could be done for those fact types in the Place Details and Source Details. Then I'd add a checkbox to deselect (or select) all facts so that it would be easy to pick the one or two facts that you wanted shown (or hidden).

I got that working and it worked reasonably well. It produced the above screen shots when burials were selected.
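To illustrate the difference between hiding at the tag level and at the fact level, here is a minimal sketch in Python. It is not Behold's Delphi code; the `Fact` and `Person` classes and the `visible_facts` function are hypothetical names made up for the example. The point is that a fact is shown or hidden as a unit, so its subordinate details (date, place) come along without having to be selected separately.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    tag: str                                      # fact tag, e.g. "BIRT" or "BURI"
    details: dict = field(default_factory=dict)   # subordinate tags such as DATE, PLAC


@dataclass
class Person:
    name: str
    facts: list


def visible_facts(person, shown_tags):
    """Return the person's facts whose tag is selected; details stay attached."""
    return [fact for fact in person.facts if fact.tag in shown_tags]


# Hypothetical sample data for the example.
john = Person("John Smith", [
    Fact("BIRT", {"DATE": "1 JAN 1900", "PLAC": "Winnipeg"}),
    Fact("BURI", {"DATE": "5 MAR 1975", "PLAC": "Brookside Cemetery"}),
])

# Deselect everything, then select only burials.
burials = visible_facts(john, {"BURI"})
for fact in burials:
    print(john.name, fact.tag, fact.details)
```

With only `BURI` selected, the person still appears but just the burial fact, with its date and place, is listed.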

I was able to do this fairly simply by extending my tag definitions in my Delphi code. My previous code that only allowed the selection of tags looked like this:

image

This code set up each tag, the text that would display, whether this tag was hidden by default or not, and which versions of GEDCOM this tag was valid in.

Some of the tags, especially in the HEAD record, were shown with the record name and the tag name separated by a period, e.g.:

image

Now I’d be separating out the facts from these tags so that only the fact tags could be selected. I decided I could identify these if I prefixed them with the record tag they came from, which would only be INDI (individual) or FAM (family) records.  So now the facts section of this code started to look like this:

image

The tricky part of all this was picking out which of the tags were facts one level under the INDI and FAM records. Some tags, such as SOUR, NOTE and OBJE, can appear both at level 1 describing the person and at level 2 describing a fact. There were also some odd things, such as CHAN (the record change date tag), the ID number, and additional information Behold supplies at the fact level, that needed to be handled so that only the desired facts would display and nothing else.
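The level 1 versus level 2 distinction can be sketched roughly as follows. This is an illustration in Python, not the Delphi code from the screenshots; `FACT_TAGS` is a small hypothetical subset of the real list, and `parse_line` and `classify` are names invented for the example. A tag counts as a fact only when it sits at level 1 and its record-prefixed name is in the table, so the same SOUR tag is a detail under BURI but not a fact when it hangs directly off the person.

```python
# Hypothetical subset of fact tags, each prefixed with its record tag.
FACT_TAGS = {"INDI.BIRT", "INDI.DEAT", "INDI.BURI", "FAM.MARR"}


def parse_line(line):
    """Split a GEDCOM line into (level, tag, value), skipping any @XREF@ id."""
    parts = line.strip().split(" ")
    level = int(parts[0])
    rest = parts[1:]
    if rest[0].startswith("@"):      # record lines look like: 0 @I1@ INDI
        rest = rest[1:]
    return level, rest[0], " ".join(rest[1:])


def classify(lines):
    """Label each line as 'record', 'fact', 'detail' or 'other'."""
    record = None      # tag of the current level 0 record
    in_fact = False    # True while we are inside a level 1 fact
    result = []
    for line in lines:
        level, tag, _value = parse_line(line)
        if level == 0:
            record, in_fact, kind = tag, False, "record"
        elif level == 1:
            in_fact = f"{record}.{tag}" in FACT_TAGS
            kind = "fact" if in_fact else "other"
        else:
            kind = "detail" if in_fact else "other"
        result.append((level, tag, kind))
    return result


sample = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BURI",
    "2 PLAC Winnipeg, Manitoba",
    "2 SOUR @S1@",
    "1 SOUR @S2@",
    "1 CHAN",
    "2 DATE 2 MAY 2016",
]
labels = classify(sample)
for label in labels:
    print(label)
```

In this sketch the DATE under CHAN comes out as "other" rather than "detail", which is the kind of odd case that has to be kept out of a burials-only listing.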

So to rigorously ensure I had all the facts, I had to go back to the GEDCOM 5.5.1 standard and work my way through it and pick out all the fact tags, and all the detail tags and the ones that could be used as either a fact or a detail.

After a couple of successively improving attempts at this … Eureka!  I realized something. I had in place the structure I needed to do an automated and effectively perfect parsing of a GEDCOM file. The GEDCOM standard is constructed using a grammar known as Backus-Naur Form or BNF. It defines what constructs are allowed. An excerpt of it from GEDCOM looks like this:

image

The items in double angle brackets are subordinate structures, e.g.:

image

and the items in single angle brackets are the data tags, e.g.:

image

Unexpectedly, I had figured out how to simply enter the GEDCOM BNF notation into my code, and Behold will do the checking to ensure that the input conforms to the GEDCOM standard.  My code will now look like:

image

image

What this does is allow me to just about take the GEDCOM BNF and copy it directly into my own code. There is no translation or mapping I need to do so it is relatively painless and less error-prone.

I will add extra parameter calls to my SetDefaultTag routine for the minimum and maximum number of occurrences of each construct and the minimum and maximum size of the data values, which will allow my routine (to be renamed to LoadBNF) to automatically check those limits and issue a message if the GEDCOM is not proper. The really nice thing is that I’ll have the actual GEDCOM structure name coded, so for any error messages Behold will be able to display the exact structure name the error is in, e.g.:

<<NAME_PERSONAL>> more than 120 characters
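As a rough sketch of how occurrence and size limits could drive the checking, here is a table-driven version in Python. It is only an illustration under stated assumptions: the `Rule` class and `check` function are hypothetical, they are not the SetDefaultTag or LoadBNF routines, and the limits in `INDI_RULES` should be verified against the GEDCOM 5.5.1 standard before relying on them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    structure: str             # name as written in the BNF, used in messages
    tag: str                   # GEDCOM tag that carries the value
    min_count: int             # minimum occurrences under the parent
    max_count: Optional[int]   # None means unlimited
    max_len: Optional[int]     # maximum size of the data value, None = unchecked


# Illustrative limits only; check them against the standard.
INDI_RULES = [
    Rule("NAME_PERSONAL", "NAME", 0, None, 120),
    Rule("SEX_VALUE", "SEX", 0, 1, 7),
]


def check(children, rules):
    """Check (tag, value) pairs found directly under one record."""
    messages = []
    for rule in rules:
        values = [value for tag, value in children if tag == rule.tag]
        if len(values) < rule.min_count:
            messages.append(f"<<{rule.structure}>> occurs fewer than {rule.min_count} times")
        if rule.max_count is not None and len(values) > rule.max_count:
            messages.append(f"<<{rule.structure}>> occurs more than {rule.max_count} times")
        for value in values:
            if rule.max_len is not None and len(value) > rule.max_len:
                messages.append(f"<<{rule.structure}>> more than {rule.max_len} characters")
    return messages


children = [("NAME", "x" * 121), ("SEX", "M"), ("SEX", "F")]
messages = check(children, INDI_RULES)
for message in messages:
    print(message)
```

Because each rule carries the structure name from the BNF, the message can name the exact structure at fault without any separate lookup.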

I was planning to implement complete GEDCOM checking in a few months from now once I started working on GEDCOM 5.5.1 output. I had looked previously at Delphi implementations of BNF, and was not looking forward to the task of either adapting them or writing my own. Believe me, writing a grammar parser is not fun. That’s likely because it’s something they make you do in 2nd year Computer Science classes as a lab assignment. So I’m very pleased. I never expected the methodology to do this would just fall in place so conveniently.

The beauty of this coding structure is that I’ll be able to go through previous GEDCOM versions (5.5, 5.3, 5.0, 5.5EL, FTM Text) quite quickly and incorporate full checks of those structures as well.

This will also extend to similarly structured grammars that are also made up of hierarchies of tags and values. This includes JSON Schema, which means I can use this for reading Behold’s own file format when I develop that along with editing. I’d likely also be able to quickly develop the input routines for FamilySearch’s GEDCOM X when the time comes to do so.

This is what programming is about and what makes it so much fun. You build up a structure and methodology in small steps and it evolves into something that you never expected and amazes you.

If you got to this point in this post, thanks for lasting through all this technical jabber. I had to spout this off. Now I feel better.

I’m working hard. Lots of great things to come.

SQLite for Genealogy Software - Sat, 23 Apr 2016

An article by Keith Riggle yesterday: “Where Are the Free Family Tree Maker Updates” caught my attention. Keith indicated that the Family Tree Maker database was based on the SQLite database software. That surprised me.

SQLite is a multi-platform, speedy, single file database with a small footprint that’s embeddable within the executable of a program. It uses standard relational constructs and the universally used SQL (Structured Query Language) to access and update its data. It is open source and has a very large support community. It is not likely to become a database that will become unsupported anytime soon.

I had known for a long while that RootsMagic uses SQLite. They benefit from a small techno user community who are developing addons, and look at the great stuff they’re doing. RootsMagic has let the group do what it is doing, but does not appear to have been helping them. If they were, they would have supplied the database structure definitions to them. Instead, at least to me, it appears that the SQLite Tools for RM group has mostly reverse engineered what the codes in the database seem to mean.

MyHeritage recently rewrote Family Tree Builder using SQLite, which is used in their new version 8.0. Tamura Jones recently reported on the new FTB technology. And I understand that Gramps is considering SQLite for their Version 5.0 release.

I commented to Keith asking if he'd ever tried viewing the FTM database with an SQLite browser. (I use the free tool SQLiteSpy from Delphi Inspiration.) Keith replied that the FTM SQLite database is encrypted, which basically means it is protected from being read except by authorized programs that know the encryption key.

image

Jack’s reason was for “the security of user data”. Keith wondered about that in his comment and noted that GEDCOM has the same data and is just as insecure.

I think Jack may have been referring to the security of the database itself. If the database is encrypted, then nobody can use an SQLite tool to add something to it, delete something from it, or corrupt the database in some way.

But I personally think that is a mistake on the part of the developer. RootsMagic to me does it right and leaves their SQLite database open. The SQLiteSpy tool and many others can view and read all the data in the database. You can write and update to the database yourself with an SQL tool if you know what you’re doing. You get a community of people who can feed off your database and write utility programs and enhancements. And I’ve not yet seen any complaints from anyone that the RootsMagic SQLite database is open.
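To give a feel for what an open SQLite file allows, here is a small Python sketch using the standard `sqlite3` module. The `person` and `burial` tables are made up for the example and are not the schema of RootsMagic or any other program; an in-memory database stands in for a real file, which you would open with `sqlite3.connect("yourfile")` instead.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Stand-in database with a hypothetical structure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE burial (person_id INTEGER, place TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'John Smith')")
conn.execute("INSERT INTO burial VALUES (1, 'Brookside Cemetery')")

tables = list_tables(conn)
print(tables)

# An ordinary SQL join is all it takes to list burials by person.
burials = conn.execute(
    "SELECT person.name, burial.place "
    "FROM burial JOIN person ON person.id = burial.person_id"
).fetchall()
print(burials)
```

With an encrypted database the very first query would fail, which is exactly what shuts out utility writers.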

Family Tree Builder 8.0, like RootsMagic, also did not encrypt their new SQLite database. They have an open API for their MyHeritage site as well and encourage developers to support their systems. And I’m sure that Gramps won’t encrypt their database. They’ll want their programming community to interact with it.

This is what a database looks like with a program such as SQLiteSpy:

image

When access is given to a program’s data, tremendous things can be done by the user community. Desktop programs can provide plugins. Family Historian does this and allows users with programming ability to write Family Historian plugins which they make freely available.

Online databases typically do this through Application Programming Interfaces (APIs) through which programmers can access and even modify the data where allowed. FamilySearch has scores of different programs that access their Family Tree database, including full featured programs like RootsMagic, Ancestral Quest, Family Tree Maker and Legacy. MyHeritage, Geni and others also have APIs which they make freely available.

These companies see this as a win-win situation. More developers can develop addons for their system. And more people will access their system.

Ancestry does not. They too have an API, but it is private. Currently only Family Tree Maker and RootsMagic have been given access to it.

So I wonder why the hesitation in giving access to the database itself by Family Tree Maker and RootsMagic? Why the need for encryption?

I see one last really good reason for a company to open up its database structure.  If they’ve got a really good structure, then maybe others will copy it. If others copy it, then maybe it will become the standard. If it becomes the standard then they are the leaders. Just as FamilySearch was with GEDCOM.

Or even if it doesn’t become the standard, if the database is open, developers can write programs to directly transfer from one database to another without the data loss usually incurred through GEDCOM. This seamless sharing of data with other programs and online trees is something all genealogists want to see.


Followup:  Arb pointed out to me on Twitter that MacFamilyTree also uses SQLite and does not encrypt it. Here’s an example of a wonderful way the database was accessed for a Geographical mapping project.

MobileFamilyTree employs exactly the same SQLite database structure as MacFamilyTree, meaning people can use either program with the same database. Now isn’t that a wonderful concept?

Also in discussion with Arb, I stated that developers should not be afraid of opening up their databases. They may think it will make it easier for their users to move away from their product. But that’s wrong thinking. What it really will do is reassure their users that their data is not trapped within the product and that they won’t lose their data should the product become unsupported and stop working. So it will give them more reason to stay with the product.

Not so Simple Mail Transfer Protocol - Mon, 28 Mar 2016

@Sparkpost to the rescue. As I recently mentioned, it’s been over 4 years since my last Behold Newsletter. This was something that I really had intended to get going again.

In 2010, I was having trouble with the mailing service that I had been using, and I started looking for a new one. Back then I selected LuxSci. They worked well for a while, but then when the Netfirms site moved, they had some glitches and they couldn’t handle bounces any more, so I had to pay for and set up an email account at LuxSci simply to accept the bounce emails so that I could process them and remove the bad addresses from my list.

For another year or so, I continued to use that SMTP service. I was paying a monthly fee whether I used it or not, plus the monthly fee for the bounce email address. But that I could have lived with.

The real problem was the sending of the newsletters. About 300 or so would go out fine in batches of 60 and then it would stall. I’d have to manually restart it, and maybe another 400 would go. Then again, and again. I remember it took me all night once to get all the newsletters out, and I had to babysit it. That was no good.

I spent a couple of weeks going through the phpmailer code. Ugh. Did I ever tell you I really really really hate the PHP language? Well, I do. And I spent days on it and tested it 100 times and couldn't get it to work any better. I couldn't spend any more time on this, so I let it lie dormant for a while.

Over the next couple of years, I looked for another solution, and thought I had found one. It was called Mandrill. It was recommended to me by someone technical I know who was using it. It was more than just an SMTP service, but had a programmers interface built into it. The sweet thing was the pricing. “The first 12,000 emails per month are always free.” After that, it was 20 cents per thousand. How could I go wrong? I was ready to go with it.

But I never quite got to that point. I spent the last couple of years developing Versions 1.1 and 1.2 of Behold without the opportunity to breathe and set up the new system for the newsletter. When I started working on version 1.2.1 about 4 months ago, I said to myself that I'm definitely going to start the newsletter up again after I release this version. And then I'd implement Mandrill.

About 2 months ago Mandrill suddenly changed its pricing structure. They went to $9.95 a month for 25,000 emails. That was still bearable. But shortly after they changed again and started selling in blocks of 25,000 emails for $20 a block. And with that, they decided they'd use Mandrill to promote their parent service MailChimp and now required a subscription to the MailChimp service at $20 a month to use Mandrill. Okay, now I was completely done with this company before I even got started. Others were upset as well: “In hostile move, Mandrill gives all developers 60 days to switch to paid Mailchimp service”.

image

So it was time to look for something else. There were a few companies trying to take advantage of Mandrill’s faux pas. The one I noticed first was Pepipost. Note the bar at the top of the screenshot of their page which says: “Coming from Mandrill? Meet the Free Alternative”:

image

That sounded cool. 25,000 free emails a month. So I tweeted:

image

Shortly after I tweeted that, I saw this tweet: (**Note: Before reading too much into this tweet, see the comment by Sachin of Pepipost as well as my response)

image

So I took a look at Mailgun and Sparkpost. Sparkpost had an unbelievable offer:

image

I went down there, found Sparkpost is the SMTP service for programmers using the same fantastic platform that their parent company Message Systems uses. And they’re serious about helping developers as well.

Okay. Did it work? Well, it took me a couple of hours to set everything up. What I didn't know was whether the 300 to 400 email jam-up 4 years ago was something on my end (Netfirms or phpmailer) or if it was the fault of LuxSci. So then the big test. I sent them out and held my breath. And wow! They were going out, uninterrupted, at a rate of 3,500 per hour! No more staying up all night and babysitting. It was simple and it worked.

This was using SMTP to send out each email individually exactly as I had been doing it previously with LuxSci, but without the glitches. This should do me well until I get close to the 100,000 free limit. When I do, I’ll likely spend some time to use Sparkpost’s Application Programming Interface (API) tools and send the newsletter to them just once along with a list of the recipients which should speed things up even more. I’m not yet sending the bounces back to my website to automatically handle them, and I will set that up for a future mailing. For now I just manually downloaded them to a file.
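For anyone curious what "one email per recipient over SMTP" looks like in code, here is a minimal sketch in Python using the standard `smtplib` and `email` modules. It is not my phpmailer setup; the function names, host, port and addresses are placeholders, and `send_newsletter` is only defined here, not called, since it needs real credentials from your provider.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build one plain-text message addressed to a single recipient."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_newsletter(host, port, user, password, sender, recipients, subject, body):
    """Send the newsletter one recipient at a time over a single SMTP session.

    Returns the addresses the server refused so they can be reviewed later.
    """
    refused = []
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()               # assumes the provider supports STARTTLS
        smtp.login(user, password)
        for recipient in recipients:
            try:
                smtp.send_message(build_message(sender, recipient, subject, body))
            except smtplib.SMTPRecipientsRefused:
                refused.append(recipient)
    return refused


# Placeholder addresses for the example.
example = build_message(
    "newsletter@example.com", "reader@example.com",
    "Behold Newsletter", "Lots of great things to come.",
)
print(example["To"], "|", example["Subject"])
```

Note that addresses refused during the SMTP session are only part of the story. Most bounces arrive later as separate emails, which is why they still need to be collected and processed on their own.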

So 4 years went by and I had about 10% bounces (i.e. people on my list who changed email addresses) that I’ve now deleted from my Newsletter list. This agrees with the natural decay rate of about 2.5% per year that I’ve observed in the past.
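The arithmetic is easy to check. Simply multiplying gives 4 × 2.5% = 10%. If you instead assume the 2.5% applies each year to the addresses still working (my assumption for this sketch, with a made-up function name), the compounded loss comes out a little lower, but still "about 10%":

```python
def cumulative_loss(annual_rate, years):
    """Fraction of addresses lost after compounding an annual decay rate."""
    return 1 - (1 - annual_rate) ** years


loss = cumulative_loss(0.025, 4)
print(f"{loss:.1%}")
```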

Hopefully Sparkpost will serve me well up to 100,000 mailings a month and beyond.