Louis Kessler’s Behold Blog

Sort of a Date - Sun, 4 Dec 2011

Behold up to now has really not done much with the DATEs that have come in from the GEDCOM. It does look at the date value to pull out the year so it can display birth and death years in the Index of Names. And it was attempting to sort the dates in the event references in the Details sections, but in a very rudimentary manner, simply by year, month and day.

This is about to change as I now tackle the task of sorting all the events for an individual or a family, and to do so by the DATE value. This requires discovering what this DATE entails. The GEDCOM developers added much more than you’d expect.

There are actually 3 aspects of programming the DATE properly in a genealogy program:

  1. Input of the DATE: The DATE tag is more complicated than you’d think. It takes more than just a passing interest to properly input and understand this tag. I’ve always thought of Behold as a fairly robust GEDCOM checking program. But Behold currently inputs DATE as given and does not check it for validity. Now that I see what’s involved, I’ve decided I won’t be implementing DATE checking right now. The rules have changed between various GEDCOM versions, and ensuring that a date actually exists on the calendar is non-trivial and not free computationally, so true validity checking on the input side is difficult.
  2. Sorting the DATE: This is my current goal. I’ll explain below a little bit about what’s involved to do this properly.
  3. Output of the DATE: As mentioned above, many programs don’t export valid DATE values. When I add GEDCOM export in Version 1.5, I’ll ensure that all DATEs are valid GEDCOM 5.5.1 dates. And maybe I’ll add some of the input checking then as well.

So let’s take a look at how much thought the GEDCOM developers put into designing the DATE field in GEDCOM. I’m surprised that most programs don’t tap into this and provide easy tools to let you use these various date forms:

The basic date in GEDCOM is like this: dd MMM yyyy, e.g. 02 JUL 1917.

Some things to know about that basic date: You can list either “day month year”, or “month year” or “just the year”. The day can be 1 or 2 digits, so 02 JUL 1917 or 2 JUL 1917 are allowed. The month is the 3 letter English abbreviation and is capitalized. Jul is not correct, although a lot of programs incorrectly export it that way. The year must be 3 or 4 digits. So 811 is okay, but 52 is not. For the real year 52, you’d have to put 052 or 0052. Actually, this is very smart thinking making it easy for programmers to detect if it is the day first or the year first based on the number of digits encountered.
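The digit-count rule lends itself to a single pattern. Here is a minimal sketch in Python (not Behold's actual code) of how a basic date might be recognized. The names `MONTHS`, `BASIC_DATE` and `parse_basic_date` are made up for illustration, and the sketch ignores B.C., modifiers and other calendars. It also does not check that the day exists in the given month.

```python
import re

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# A day is 1 or 2 digits and a year is 3 or 4 digits, so the digit
# count alone tells us which number we are looking at.
BASIC_DATE = re.compile(
    r"(?:(?:(?P<day>\d{1,2}) )?(?P<month>[A-Z]{3}) )?(?P<year>\d{3,4})"
)

def parse_basic_date(text):
    """Return (year, month, day), using 0 for a missing part, or None if invalid."""
    match = BASIC_DATE.fullmatch(text)
    if match is None:
        return None
    month_name = match.group("month")
    if month_name is not None and month_name not in MONTHS:
        return None  # three capitals, but not an English month abbreviation
    month = MONTHS.index(month_name) + 1 if month_name else 0
    day = int(match.group("day")) if match.group("day") else 0
    return (int(match.group("year")), month, day)

print(parse_basic_date("2 JUL 1917"))  # day, month and year
print(parse_basic_date("811"))         # a 3 digit year alone is accepted
print(parse_basic_date("52"))          # 2 digits cannot be a year, so None
print(parse_basic_date("2 Jul 1917"))  # mixed case month is rejected, so None
```

Returning 0 for a missing month or day is just one convenient choice: it makes the tuples directly comparable, with 1917 sorting ahead of JUL 1917.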

You can add “B.C.” on the end, but only on a year alone, e.g. 1500B.C. GEDCOM 5.5.1 shows no space between the year and the B, but GEDCOM 5.3 had a space. I hope all of you have researched your genealogy that far back.

Then (this is the exciting part) there are a bunch of useful modifiers. If you don’t know the exact date, you can use these:

  • BEF date
  • BET date AND date
  • AFT date

which of course mean before, between and after. If an event or fact occurs over a period of time, you can use these:

  • FROM date TO date
  • FROM date
  • TO date

and you use the single FROM and TO when you only know the one end, or it is something open-ended.

Then there’s the approximated dates:

  • ABT date
  • CAL date
  • EST date

where ABT means “about” and is for an inexact date. CAL is calculated mathematically, e.g. from an event date and age, and EST is estimated based on an algorithm using some other event date. I’m hoping in a future version to allow Behold to estimate your unknown event dates for you. It will use this EST modifier.

Then there’s the one I love the most. Hopefully your genealogy program allows you to enter this:

  • INT date (date_phrase)

This is for a date you interpret from knowledge about the associated date phrase that is included in parentheses, e.g.: INT 1917 (3 years after marriage)

The only real limitation is that GEDCOM 5.5.1 limits the length of the date_value to 35 characters, so you won’t get much room at all for the date_phrase. That seems much too short to me. Behold will allow any length, but I’ll have to make sure when exporting to GEDCOM that if it is too long, it will have to get added to a NOTE field.
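One possible way to handle the overflow on export is sketched below in Python. This is only an illustration under assumptions, not how Behold does it: the function name `export_int_date`, the placeholder phrase "see note", and the choice to write the NOTE at the same level as the DATE are all hypothetical.

```python
MAX_DATE_VALUE_LENGTH = 35  # the GEDCOM 5.5.1 limit mentioned above

def export_int_date(date, phrase, level=2):
    """Return GEDCOM lines for an INT date, moving an overlong phrase to a NOTE."""
    value = f"INT {date} ({phrase})"
    if len(value) <= MAX_DATE_VALUE_LENGTH:
        return [f"{level} DATE {value}"]
    # Too long: keep a short placeholder phrase so the DATE stays in the
    # INT date (date_phrase) form, and put the full text in a NOTE line.
    return [
        f"{level} DATE INT {date} (see note)",
        f"{level} NOTE {phrase}",
    ]

for line in export_int_date("1917", "3 years after marriage"):
    print(line)
for line in export_int_date("1917", "about three years after the second marriage"):
    print(line)
```

The first call fits within 35 characters and stays on one DATE line. The second is too long, so the phrase moves to the NOTE.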

Finally, you can use your own unstructured form:

  • (date_phrase)

This is any information about when an event occurred. It MUST be between parentheses, but some programs incorrectly do not include them. It also can be only up to 35 characters long. This is again much too short for my liking. GEDCOM 5.3 had the maximum length at 90 characters, which was better. See how difficult absolute data checking is. The maximum length is dependent on the version of GEDCOM.
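Putting the forms above together, a program's first step is usually to separate the modifier from the date or dates it applies to. The following Python sketch shows one way that might look. The function name `split_date_value` and the labels "PHRASE" and "FROM-TO" are invented for this example, and the date parts are returned as unparsed strings.

```python
import re

SINGLE_MODIFIERS = ("BEF", "AFT", "ABT", "CAL", "EST", "FROM", "TO")

def split_date_value(value):
    """Split a date value into (modifier, first_date, second_date, phrase)."""
    value = value.strip()
    # Unstructured form: (date_phrase)
    if value.startswith("(") and value.endswith(")"):
        return ("PHRASE", None, None, value[1:-1])
    # Interpreted form: INT date (date_phrase)
    match = re.fullmatch(r"INT (.+?) \((.*)\)", value)
    if match:
        return ("INT", match.group(1), None, match.group(2))
    # Two-date forms
    match = re.fullmatch(r"BET (.+) AND (.+)", value)
    if match:
        return ("BET", match.group(1), match.group(2), None)
    match = re.fullmatch(r"FROM (.+) TO (.+)", value)
    if match:
        return ("FROM-TO", match.group(1), match.group(2), None)
    # Single-date modifiers, including open-ended FROM and TO
    for modifier in SINGLE_MODIFIERS:
        if value.startswith(modifier + " "):
            return (modifier, value[len(modifier) + 1:], None, None)
    # No modifier at all
    return ("", value, None, None)

print(split_date_value("BET 1900 AND 1910"))
print(split_date_value("INT 1917 (3 years after marriage)"))
```

Anything that is not recognized falls through as an unmodified date, so a real checker would still need to validate the date parts afterwards.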

Then there’s a few different calendars supported by GEDCOM.

  • Gregorian: This is the calendar in use today and is the default. It allows a year modifier which is a slash followed by one or two digits, e.g. 1698/9 or 1698/99. There’s enough description on the web about the reason and use of this (called Double Dating) that I won’t get into it here.
  • Julian: The original, on which the Gregorian was based.
  • Hebrew: A very different calendar with different month names. It is currently the year 5772.
  • French Republican: I don’t know much about this one, but it’s also got its own set of month names.
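As an illustration of the Gregorian year modifier, here is a small Python sketch that expands a double-dated year such as 1698/9 into its two years. The helper name `parse_dual_year` is hypothetical. The sketch assumes the digits after the slash are the trailing digits of the later year, and it does not check that the two years are consecutive.

```python
import re

def parse_dual_year(text):
    """Expand a year like '1698/9' or '1698/99' into (1698, 1699), else None."""
    match = re.fullmatch(r"(\d{3,4})/(\d{1,2})", text)
    if match is None:
        return None
    first = int(match.group(1))
    suffix = match.group(2)
    modulus = 10 ** len(suffix)
    # Replace the trailing digits of the first year with the suffix.
    second = first - first % modulus + int(suffix)
    if second <= first:
        second += modulus  # rolled over, e.g. 1699/00 means 1700
    return (first, second)

print(parse_dual_year("1698/9"))
print(parse_dual_year("1699/00"))
```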

To use one of the other calendars, you’ve got to include a special code for it prior to the date, e.g.: @#DFRENCH R@ 02 VEND 1917
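Detecting that special code is straightforward. A minimal Python sketch follows. The names `CALENDAR_ESCAPE` and `split_calendar` are made up here, and the sketch only separates the calendar name from the rest of the value without checking that the name is one GEDCOM recognizes.

```python
import re

# The code looks like @#D<NAME>@ and the name may contain a space,
# as in "FRENCH R".
CALENDAR_ESCAPE = re.compile(r"@#D([A-Z ]+)@ ?(.*)")

def split_calendar(value):
    """Return (calendar_name, remaining_date_text) for a date value."""
    match = CALENDAR_ESCAPE.fullmatch(value)
    if match is None:
        return ("GREGORIAN", value)  # no code means the default calendar
    return (match.group(1), match.group(2))

print(split_calendar("@#DFRENCH R@ 02 VEND 1917"))
print(split_calendar("02 JUL 1917"))
```

The remaining date text then needs a month table that matches the calendar, since each of the other calendars has its own abbreviations.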

Other than these calendars, all the month abbreviations must be in English. I’ve seen GEDCOMs where the month abbreviations are in French or Spanish. That technically is not allowed. How will other programs be able to interpret them correctly unless they program in an entire dictionary of abbreviations?

In a few minutes, you’ve learned pretty well what you need to know about DATEs in GEDCOM. So, you’d think it would be pretty easy for most programs to follow the standard. But even with the dates, we find all sorts of differences and customizations.

Take this small example GEDCOM file I’m using for testing dates in Behold. Try it in your current genealogy editing program and see how your program does. Shown below is how I display it in Behold 1.0. (The NOTEs in the file are there so that you can see whether the date in the program matches how it was entered, and they also point out which invalid dates were included in the test.)

[Image: the date test file as displayed in Behold 1.0]

Looking at the way Behold now displays these dates, I may also decide to allow customization of the modifiers, so that ABT can be displayed as “About”, and the months can be displayed differently, e.g. in mixed case as “Feb” or in other languages such as “Fev”.

I haven’t yet sorted these dates. That’s what I’m working on now and should be finished in a few days and this will be included in Behold 1.0.1. To be smart about sorting dates, you have to have a few more considerations. There’s the double dates which should be sorted last, not first. And then certain modifiers (e.g. BEF, TO) are sorted before dates with no modifiers and certain ones (e.g. FROM, BET, AFT) are sorted after. Then there are some with two dates to use in the sort (e.g. BET/AND, FROM/TO).
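To make the modifier ordering concrete, here is one way a sort key could be built in Python. This is a sketch under assumptions and not Behold's implementation. It assumes dates are already parsed into (year, month, day) tuples with 0 for missing parts. It treats ABT, CAL, EST and INT like unmodified dates, which the text above does not specify. It leaves double dates out entirely. The names `MODIFIER_RANK` and `sort_key` are hypothetical.

```python
# Rank of each modifier relative to an unmodified date with the same value.
BEFORE, SAME, AFTER = 0, 1, 2
MODIFIER_RANK = {
    "BEF": BEFORE, "TO": BEFORE,
    "": SAME, "ABT": SAME, "CAL": SAME, "EST": SAME, "INT": SAME,
    "FROM": AFTER, "BET": AFTER, "AFT": AFTER,
}

def sort_key(modifier, first, second=None):
    """Build a sortable key: first date, then modifier rank, then second date."""
    return (first, MODIFIER_RANK[modifier], second or (0, 0, 0))

events = [
    ("AFT", (1910, 0, 0), None),
    ("", (1910, 0, 0), None),
    ("BET", (1910, 0, 0), (1915, 0, 0)),
    ("BEF", (1910, 0, 0), None),
    ("BET", (1910, 0, 0), (1912, 0, 0)),
]
ordered = sorted(events, key=lambda event: sort_key(*event))
for event in ordered:
    print(event)
```

With this key, BEF 1910 sorts ahead of plain 1910, which sorts ahead of AFT 1910, and the two BET ranges are ordered by their second date. Note that the first date is compared before the rank, so AFT 1910 still lands ahead of 5 MAR 1910. Whether that is the desired behaviour is a design decision.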

Regarding sorting, I don’t know why some programs have implemented a “sort date” field. You’ve got the date, and you want to sort by the date. The sort date field is defaulted to your date. But they allow you to change this sort date field. Why? Don’t you just want to sort by your date? If the date is wrong, change it. If they are worried about date fields that are blank or are date phrases that can’t be sorted well, then the ABT or INT modifiers are perfect solutions. Why complicate matters with another field for the user to worry about and then confuse people when 1917 comes before 1912 because the sort date value for 1917 was 1910?

Out of curiosity, I inputted my DATE testing GEDCOM file into RootsMagic 4. Then I exported it from RootsMagic to GEDCOM. It re-sorted all the dates into what it thought was correct order. But that is wrong for multiple specifications of a single event, e.g. birth, where GEDCOM says the order of the listing of the event is significant. RootsMagic had trouble with the B.C. dates, only accepting the illegal BC, but that’s not a major issue for most people. More importantly, it exported the CAL modifier as CALC, which could be a concern when exporting to other programs. Then it somehow lost the NOTEs on the dates in other calendars, making me wonder if RootsMagic loses notes elsewhere as well. And it changed two notes in a row into a single continued note with a blank continuation line in between. I’m not sure if that unrequested change would be wanted by most people. Plus, RootsMagic made no attempt to correct the invalid dates, other than incorrectly “correcting” the valid B.C. dates. I expect this level of date handling with this test file would be typical, and other programs would not fare much better.

This test does show how genealogy developers do not bother to read the GEDCOM standard. By simply reading this blog post, you now know more about certain aspects of the DATE value in GEDCOM than the RootsMagic people do.

10 Comments

1. davek (davek)
United Kingdom flag
Joined: Mon, 5 Dec 2011
1 blog comment, 0 forum posts
Posted: Mon, 5 Dec 2011  Permalink

You don’t need to code all abbreviations for dates. What I do with GedView is just access the system locales which already include date abbreviations for every language the OS supports, along with the full names which I’ve also seen occurring quite often.

2. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
287 blog comments, 245 forum posts
Posted: Mon, 5 Dec 2011  Permalink

Thanks, Dave. That’s a great idea. And you’re right. I should check for the long names as well.

3. genej (genej)
United States flag
Joined: Wed, 5 Jan 2011
13 blog comments, 0 forum posts
Posted: Mon, 5 Dec 2011  Permalink

Hi Louis,

More good Kessler works!

You wrote, “Regarding sorting, I don’t know why some programs have implemented a ‘sort date’ field.”

I do find sort dates helpful. You wrote, “You’ve got the date, and you want to sort by the date.” Well, you don’t always have a date. While I use sort dates for a host of circumstances, I find they reduce the temptation to make estimates (EST) that might be in error … estimates often become recorded as ABT and by third parties as at/on.

I haven’t always had access to sort dates, and didn’t use them for a long time. –GJ

4. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
287 blog comments, 245 forum posts
Posted: Mon, 5 Dec 2011  Permalink

GeneJ:

I personally think it’s better to have an EST or ABT date which shows your assumption about the sort order, rather than having a hidden date that allows you to make an assumption without validating or giving reason for it.

If you have the date specified, even if it’s a wild estimate, then you can add a note to say, e.g. “Year of birth based on knowledge that Joe was older than Mary but younger than Fred”.

Louis

5. sue-adams (sue adams)
United Kingdom flag
Joined: Tue, 6 Dec 2011
3 blog comments, 0 forum posts
Posted: Tue, 6 Dec 2011  Permalink

I didn’t know about the CAL, INT and EST modifiers, but as a user, I have not studied the GEDCOM standards. That program vendors don’t get it is just scandalous!

Thank you for your insights.

6. genej (genej)
United States flag
Joined: Wed, 5 Jan 2011
13 blog comments, 0 forum posts
Posted: Tue, 6 Dec 2011  Permalink

@Louis,

In your reply, “I personally think …,” which probably answers your original question about why some users (thus vendors) find the sort date option to be a “feature.” When I’ve seen that feature included in a program, I have not found it to be complicated. As you say, it defaults to the date entered, so you only work with the field when you want it.

It would be interesting to look at a series of articles from some journals for examples about the use of estimated dates determined in an environment where the search met that “exhaustive” standard.

Separately, then, I have entered information entries about persons for which I feel a displayed date is both unnecessary and distractive. Such is the case for an entry about my mom and her dear friend who made annual reunions that involved a number of really fun regiments. The first date on which my dad became registered as a scratch golfer isn’t even a curiosity of mine -that he was a scratch golfer is the point of my “tag.” For some it might be interesting to know the date on which he was an age shooter, but even that date I don’t record in the tag–it’s just an interesting anecdote that becomes part of my undated “golfing” tag.

A few other thoughts on birth vs baptismal and death vs burial; will try to post about those on one of your earlier posts.

You do such good work, Louis! -GJ

P.S. Totally with you that fields associated with date_phrase should not be so limited.

7. dirkb (dirkb)
Germany flag
Joined: Sun, 21 Feb 2016
2 blog comments, 0 forum posts
Posted: Sun, 21 Feb 2016  Permalink

Great article!
This will help me a lot with GEDCOM dates in my software (www.ahnenblatt.com).

I found one minor issue …
In example GEDCOM file (datetest.ged) you have marked with a NOTE whenever there is an invalid GEDCOM date.
You use “@#DHEBREW@ 02 TIS 5250” which is not marked as invalid (“TIS” should be “TSH”).

- Dirk

8. Louis Kessler (lkessler)
United States flag
Joined: Sun, 9 Mar 2003
287 blog comments, 245 forum posts
Posted: Sun, 21 Feb 2016  Permalink

Dirk, I’m currently away at the Unlock the Past genealogy cruise. I’ll check the issues you pointed out when I get back.

Louis

9. Tony Proctor (acproctor)
Ireland flag
Joined: Wed, 8 Aug 2012
10 blog comments, 0 forum posts
Posted: Fri, 12 Jan 2018  Permalink

Small error here, Louis. When you talk about sorting, you suggest that BEF/FROM come before, and BET/AFT/TO come after. But FROM is akin to AFT (one inclusive and one not) rather than BEF. In other words, I believe you meant FROM and TO to be switched over here.

10. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
287 blog comments, 245 forum posts
Posted: Sat, 13 Jan 2018  Permalink

Thank you eagle-eyed Tony for taking the time to let me know. Yes, you are correct. It should be BEF/TO before and FROM/BET/AFT that follow. I’ve now updated this blog post to reflect this.

I checked my code in Behold, and it is implemented correctly. I just erred in this blog post.
