Behold up to now has really not done much with the DATEs that have come in from the GEDCOM. It does look at the date value to pull out the year so it can display birth and death years in the Index of Names. And it was attempting to sort the dates in the event references in the Details sections, but in a very rudimentary manner, simply by year, month and day.
This is about to change as I now tackle the task of sorting all the events for an individual or a family, and to do so by the DATE value. This requires discovering what this DATE entails. The GEDCOM developers added much more than you’d expect.
There are actually 3 aspects of programming the DATE properly in a genealogy program:
- Input of the DATE: The DATE tag is more complicated than you’d think. It takes more than just a passing interest to properly input and understand this tag. I’ve always thought of Behold as a fairly robust GEDCOM checking program. But Behold currently inputs DATE as given and does not check it for validity. Now that I see what’s involved, I’ve decided I won’t be implementing DATE checking right now. The rules have changed between various GEDCOM versions, and ensuring that a date actually exists on the calendar is non-trivial and not free computationally, so true validity checking on the input side is difficult.
- Sorting the DATE: This is my current goal. I’ll explain below a little bit about what’s involved to do this properly.
- Output of the DATE: Many programs don’t export valid DATE values, as the examples later in this post show. When I add GEDCOM export in Version 1.5, I’ll ensure that all DATEs are valid GEDCOM 5.5.1 dates. And maybe I’ll add some of the input checking then as well.
So let’s take a look at how much thought the GEDCOM developers put into designing the DATE field in GEDCOM. I’m surprised that most programs don’t tap into this and provide easy tools to let you use these various date forms:
The basic date in GEDCOM is like this: dd MMM yyyy, e.g. 02 JUL 1917.
Some things to know about that basic date: You can list either “day month year”, or “month year” or “just the year”. The day can be 1 or 2 digits, so 02 JUL 1917 or 2 JUL 1917 are allowed. The month is the 3 letter English abbreviation and is capitalized. Jul is not correct, although a lot of programs incorrectly export it that way. The year must be 3 or 4 digits. So 811 is okay, but 52 is not. For the real year 52, you’d have to put 052 or 0052. Actually, this is very smart thinking making it easy for programmers to detect if it is the day first or the year first based on the number of digits encountered.
You can add “B.C.” on the end, but only on a year alone, e.g. 1500B.C. GEDCOM 5.5.1 shows no space between the year and the B, but GEDCOM 5.3 had a space. I hope all of you have researched your genealogy that far back.
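To make those rules concrete, here is a minimal sketch in Python of how the basic date could be recognized. It is not Behold's code, and the names `BASIC_DATE` and `parse_basic_date` are made up for illustration. It only checks the form described above (optional day, optional capitalized month, 3 or 4 digit year, B.C. on a year alone) and does not check that the day actually exists on the calendar.

```python
import re

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Optional "day month " or "month " prefix, then a 3 or 4 digit year,
# then an optional B.C. suffix with no space before it (the 5.5.1 form).
BASIC_DATE = re.compile(
    r"(?:(?:(?P<day>\d{1,2}) )?(?P<month>[A-Z]{3}) )?"
    r"(?P<year>\d{3,4})(?P<bc>B\.C\.)?"
)

def parse_basic_date(text):
    """Return (year, month, day), using 0 for a missing part, or None if
    the text does not follow the basic form. B.C. years come back negative."""
    m = BASIC_DATE.fullmatch(text)
    if m is None:
        return None
    month_name = m.group("month")
    if month_name is not None and month_name not in MONTHS:
        return None  # three capitals, but not an English month abbreviation
    if m.group("bc") and month_name is not None:
        return None  # B.C. is only allowed on a year alone
    year = int(m.group("year"))
    if m.group("bc"):
        year = -year
    month = MONTHS.index(month_name) + 1 if month_name else 0
    day = int(m.group("day")) if m.group("day") else 0
    return (year, month, day)
```

With this sketch, 02 JUL 1917 and 2 JUL 1917 both give (1917, 7, 2), 0052 gives (52, 0, 0), and both 52 and 02 Jul 1917 are rejected.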
Then (this is the exciting part) there are a bunch of useful modifiers. If you don’t know the exact date, you can use these:
- BEF date
- BET date AND date
- AFT date
which of course mean before, between and after. If an event or fact occurs over a period of time, you can use these:
- FROM date TO date
- FROM date
- TO date
and you use the single FROM and TO when you only know the one end, or it is something open-ended.
Then there’s the approximated dates:
- ABT date
- CAL date
- EST date
where ABT means “about” and is for an inexact date. CAL is calculated mathematically, e.g. from an event date and age, and EST is estimated based on an algorithm using some other event date. I’m hoping in a future version to allow Behold to estimate your unknown event dates for you. It will use this EST modifier.
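As a rough sketch of how a program might pull these modifiers apart from the dates they apply to, something like the following Python would do. The function name `split_modifier` and the "FROM-TO" label for the two-date period are my own inventions for this example, and the dates themselves are returned as plain strings without further checking.

```python
SINGLE_DATE_MODIFIERS = ("BEF", "AFT", "FROM", "TO", "ABT", "CAL", "EST")

def split_modifier(value):
    """Split a DATE value into (modifier, [date strings]).
    An unmodified date comes back with an empty modifier."""
    words = value.split()
    if not words:
        return ("", [])
    head = words[0]
    # Two-date forms: BET date AND date, FROM date TO date
    if head == "BET" and "AND" in words:
        i = words.index("AND")
        return ("BET", [" ".join(words[1:i]), " ".join(words[i + 1:])])
    if head == "FROM" and "TO" in words:
        i = words.index("TO")
        return ("FROM-TO", [" ".join(words[1:i]), " ".join(words[i + 1:])])
    # One-date forms
    if head in SINGLE_DATE_MODIFIERS:
        return (head, [" ".join(words[1:])])
    # No recognized modifier: treat the whole value as the date
    return ("", [" ".join(words)])
```

Note that a malformed value such as BET 1900 with no AND falls through to the last line and is returned whole, so a stricter checker would want to flag that case.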
Then there’s the one I love the most. Hopefully your genealogy program allows you to enter this:
- INT date (date_phrase)
This is for a date you interpret from knowledge about the associated date phrase that is included in parentheses, e.g.: INT 1917 (3 years after marriage)
The only real limitation is that GEDCOM 5.5.1 limits the length of the date_value to 35 characters, so you won’t get much room at all for the date_phrase. That seems much too short to me. Behold will allow any length, but I’ll have to make sure when exporting to GEDCOM that if it is too long, it will have to get added to a NOTE field.
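Here is a small Python sketch of splitting an INT value into its date and phrase, and of checking whether the whole value fits in the 35 characters that GEDCOM 5.5.1 allows. The names `split_interpreted` and `fits_gedcom_551` are hypothetical, and what to do with a phrase that does not fit (such as moving it to a NOTE) is left out, since that is an export decision.

```python
import re

MAX_DATE_VALUE_LENGTH = 35  # the GEDCOM 5.5.1 limit mentioned above

# INT, a date without parentheses, then the phrase in parentheses
INT_DATE = re.compile(r"INT (?P<date>[^()]+?) \((?P<phrase>.*)\)")

def split_interpreted(value):
    """Return (date, phrase) for an INT value, or None if it is not one."""
    m = INT_DATE.fullmatch(value)
    if m is None:
        return None
    return (m.group("date"), m.group("phrase"))

def fits_gedcom_551(value):
    """True if the whole DATE value is within the 5.5.1 length limit."""
    return len(value) <= MAX_DATE_VALUE_LENGTH
```

The example above, INT 1917 (3 years after marriage), just squeezes in, but it would not take a much longer phrase to go over the limit.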
Finally, you can use your own unstructured form:
- (date_phrase)
This is any information about when an event occurred. It MUST be in parentheses, but some programs incorrectly leave them out. It also can be only up to 35 characters long. This is again much too short for my liking. GEDCOM 5.3 had the maximum length at 90 characters, which was better. See how difficult absolute data checking is? The maximum length is dependent on the version of GEDCOM.
Then there’s a few different calendars supported by GEDCOM.
- Gregorian: This is the calendar in use today and is the default. It allows a year modifier which is a slash followed by one or two digits, e.g. 1698/9 or 1698/99. There’s enough description on the web about the reason and use of this (called Double Dating) that I won’t get into it here.
- Julian: The original, on which the Gregorian was based.
- Hebrew: A very different calendar with different month names. It is currently the year 5772.
- French Republican: I don’t know much about this one, but it’s also got its own set of month names.
To use one of the other calendars, you’ve got to include a special code for it prior to the date, e.g.: @#DFRENCH R@ 02 VEND 1917
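A sketch of how the calendar code could be separated from the rest of the date follows, in Python. Only the French Republican code is shown in the example above. The other three codes in the table are how I recall them from the GEDCOM 5.5.1 specification, so check them against the standard before relying on them. The function name `split_calendar` is made up for this example.

```python
import re

# Calendar escape codes (assumed from GEDCOM 5.5.1; verify against the spec)
CALENDARS = {
    "DGREGORIAN": "Gregorian",
    "DJULIAN": "Julian",
    "DHEBREW": "Hebrew",
    "DFRENCH R": "French Republican",
}

# The escape looks like @#...@ and comes before the date
ESCAPE = re.compile(r"@#(?P<code>[^@]+)@\s*(?P<rest>.*)")

def split_calendar(value):
    """Return (calendar name, remaining date text).
    No escape means the default Gregorian calendar.
    An unrecognized escape gives None as the calendar name."""
    m = ESCAPE.fullmatch(value)
    if m is None:
        return ("Gregorian", value)
    return (CALENDARS.get(m.group("code")), m.group("rest"))
```

The remaining date text still has to be parsed with that calendar's own month names, which this sketch does not attempt.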
Outside of these other calendars, all the month abbreviations must be in English. I’ve seen GEDCOMs where the month abbreviations are in French or Spanish. That technically is not allowed. How will other programs be able to interpret them correctly unless they program in an entire dictionary of abbreviations?
In a few minutes, you’ve learned pretty well what you need to know about DATEs in GEDCOM. So, you’d think it would be pretty easy for most programs to follow the standard. But even with the dates, we find all sorts of differences and customizations.
Take this small example GEDCOM file I’m using for testing dates in Behold. Try it in your current genealogy editing program and see how your program does. The NOTEs in the file are there so that you can see whether the date shown in the program matches how it was entered, and they also point out which invalid dates were included in the test. Shown below is how I display it in Behold 1.0:
Looking at the way Behold now displays these dates, I may also decide to allow customization of the modifiers, so that ABT can be displayed as “About”, and the months can be displayed differently, e.g. in mixed case as “Feb” or in other languages such as “Fev”.
I haven’t yet sorted these dates. That’s what I’m working on now. It should be finished in a few days and will be included in Behold 1.0.1. To be smart about sorting dates, you have to take a few more things into consideration. There are the double dates, which should be sorted last, not first. And then certain modifiers (e.g. BEF, FROM) are sorted before dates with no modifiers and certain ones (e.g. BET, AFT, TO) are sorted after. Then there are some with two dates to use in the sort (e.g. BET/AND, FROM/TO).
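To give an idea of what such a sort could look like, here is a minimal Python sketch of a sort key. It is not the method Behold will use, just one way to express the rules above. It assumes each date has already been parsed into a (year, month, day) tuple with 0 for a missing part, it assumes ABT, CAL and EST sort the same as an unmodified date, and it leaves out double dates and other calendars entirely. The names `MODIFIER_RANK` and `sort_key` are made up for this example.

```python
# Position relative to an unmodified date with the same first date:
# BEF and FROM sort before it; BET, AFT and TO sort after it.
# Treating ABT, CAL and EST like an unmodified date is an assumption.
MODIFIER_RANK = {
    "BEF": -1, "FROM": -1,
    "": 0, "ABT": 0, "CAL": 0, "EST": 0,
    "BET": 1, "AFT": 1, "TO": 1,
}

def sort_key(modifier, first, second=None):
    """Build a sortable key from a modifier and one or two parsed dates.
    The first date decides the main order, the modifier breaks ties,
    and the second date (BET/AND, FROM/TO) breaks any remaining ties."""
    return (first, MODIFIER_RANK.get(modifier, 0), second or first)

# A few events as (modifier, first date, second date or None)
events = [
    ("AFT", (1917, 7, 2), None),
    ("", (1917, 7, 2), None),
    ("BEF", (1917, 7, 2), None),
    ("BET", (1912, 0, 0), (1917, 0, 0)),
]
ordered = sorted(events, key=lambda e: sort_key(*e))
```

Here the BET 1912 AND 1917 event comes first because of its earlier first date, followed by BEF, the plain date, and AFT for 2 JUL 1917. Because a missing month or day is stored as 0, a year alone sorts ahead of any fuller date in that year, which is another choice a real implementation might make differently.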
Regarding sorting, I don’t know why some programs have implemented a “sort date” field. You’ve got the date, and you want to sort by the date. The sort date field is defaulted to your date. But they allow you to change this sort date field. Why? Don’t you just want to sort by your date? If the date is wrong, change it. If they are worried about date fields that are blank or are date phrases that can’t be sorted well, then the ABT or INT modifiers are perfect solutions. Why complicate matters with another field for the user to worry about and then confuse people when 1917 comes before 1912 because the sort date value for 1917 was 1910?
Out of curiosity, I inputted my DATE testing GEDCOM file into RootsMagic 4. Then I exported it from RootsMagic to GEDCOM. It re-sorted all the dates into what it thought was correct order. But that is wrong for multiple specifications of a single event, e.g. birth, where GEDCOM says the order of the listing of the event is significant. RootsMagic had trouble with the B.C. dates, only accepting the illegal BC, but that’s not a major issue for most people. More importantly, it exported the CAL modifier as CALC, which could be a concern when exporting to other programs. Then it somehow lost the NOTEs on the dates in other calendars, making me wonder if RootsMagic loses notes elsewhere as well. And it changed two notes in a row into a single continued note with a blank continuation line in between. I’m not sure if that unrequested change would be wanted by most people. Plus, RootsMagic made no attempt to correct the invalid dates, other than incorrectly “correcting” the valid B.C. dates. I expect this level of date handling with this test file would be typical, and other programs would not fare much better.
This test does show how genealogy developers do not bother to read the GEDCOM standard. By simply reading this blog post, you now know more about certain aspects of the DATE value in GEDCOM than the RootsMagic people do.