Louis Kessler's Behold Blog » Blog Entry

How to Program Dates for Genealogy - Mon, 1 Feb 2016

Dates in genealogy are messy. Four years ago, I wrote a series of posts on some of the aspects of dates as defined by GEDCOM and some of the bad dates you see in GEDCOM files in the wild:

Sort of a Date
How About a Date?
Out on a Bad Date

In a recent post on the RootsDev Google Group, Gary Stanley asked about Incomplete Dates: “Does anyone have any recommendations for working with incomplete dates and storing them within a database such as MySQL?”

For the record, I thought I’d lay out the method I decided on. It is based on GEDCOM’s date definition, which, as I’ve said in the above articles, I think is quite a good specification.

The good thing about the GEDCOM spec is that it is readable. The bad thing is that its dates do not sort as plain strings, so an alternative internal format is needed for sorting. So here is what I came up with and use internally in Behold. It is also how I’ll export dates in the database that Behold will produce.

The structure is a 12- or 24-character string (24 if it’s a date range) that can be sorted just by letting the computer naturally sort the strings.

Behold’s internal date format is:

C (Calendar): ‘/’ = Gregorian, ‘A’ = Julian, ‘F’ = French, ‘H’ = Hebrew
B (B.C.): ‘1’ = B.C., ‘2’ = A.D.
         If B.C., then the YYYY is set to 9999 - YYYY so it sorts correctly
YYYY (Year)
M (Month): ‘1’ .. ‘C’ for JAN to DEC
                 ‘D’ .. ‘O’ for VEND to COMP
                 ‘P’ .. ‘Z’ for TSH to ELL
DD (Day)
* (Date Modifier): 1 = BEF, 2 = TO, 3 = (none), 4 = ABT, 5 = CAL, 6 = EST,
                         7 = INT, 8 = AND, 9 = FROM, A = BET, B = AFT
AA (Alternate year): e.g. the "93" in 1592/93
CBYYYYMDD*AA = the second date in a date range (only if needed)

e.g. 4 Nov 1900 = ‘/21900B043  ’
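
As a rough illustration, the field layout above could be assembled like this. This is only a sketch and not Behold's actual code: the function and table names are hypothetical, only the Gregorian/Julian month codes are included, and the EST modifier is assumed to be code 6. The month table as written (JAN = ‘1’ through DEC = ‘C’) makes November ‘B’.

```python
# Hypothetical sketch of the 12-character sortable key described above.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
MONTH_CODES = "123456789ABC"          # JAN = '1' ... DEC = 'C'
CALENDARS = {"GREGORIAN": "/", "JULIAN": "A", "FRENCH": "F", "HEBREW": "H"}
MODIFIERS = {"BEF": "1", "TO": "2", "": "3", "ABT": "4", "CAL": "5",
             "EST": "6",              # assumption: EST takes the unused code 6
             "INT": "7", "AND": "8", "FROM": "9", "BET": "A", "AFT": "B"}


def encode_date(year, month=None, day=0, modifier="",
                calendar="GREGORIAN", bc=False, alt_year=""):
    """Build one 12-character key: C B YYYY M DD * AA."""
    era = "1" if bc else "2"
    # B.C. years are stored as 9999 - year so earlier years sort first.
    yyyy = 9999 - year if bc else year
    # '0' stands for "no month", as in the incomplete-date examples.
    m = MONTH_CODES[MONTHS.index(month)] if month else "0"
    # alt_year is padded to two characters with spaces when absent.
    return (f"{CALENDARS[calendar]}{era}{yyyy:04d}{m}{day:02d}"
            f"{MODIFIERS[modifier]}{alt_year:<2}")


key = encode_date(1900, "NOV", 4)     # 4 Nov 1900
```

A date range would simply be two such keys joined together, giving the 24-character form.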

If column 1 is a ‘(‘, then this is a GEDCOM date phrase stored as text between parentheses, e.g.: (4 days old). Anything that does not fit the standard format automatically becomes a date phrase.

With regard to “incomplete dates”, GEDCOM allows year only, or month and year only. It does not allow month without year or day without month and year. My representation follows this idea and allows ‘/21900A003’ (no day) and ‘/219000003’ (no month and day). Other types of incomplete dates will become a date phrase, e.g. ‘(14 Nov)’
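
The fallback rule can be sketched as follows. This is a simplified, hypothetical parser rather than Behold's: it only handles unmodified Gregorian A.D. dates, does not validate day numbers, and the names `DATE_RE` and `to_key` are made up for the example. Anything that is not "day month year", "month year" or "year" is wrapped in parentheses as a date phrase.

```python
import re

# JAN = '1' ... DEC = 'C', following the month table in the format description.
MONTH_CODES = dict(zip(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(),
    "123456789ABC"))

# Optional day (only allowed with a month), optional month, mandatory year.
DATE_RE = re.compile(r"^(?:(?:(\d{1,2}) )?([A-Z]{3}) )?(\d{1,4})$")


def to_key(text):
    """Return a 12-character key, or a '(...)' date phrase if the text does not fit."""
    cleaned = text.strip()
    match = DATE_RE.match(cleaned.upper())
    if not match or (match.group(2) and match.group(2) not in MONTH_CODES):
        return f"({cleaned})"
    day, month, year = match.groups()
    month_code = MONTH_CODES.get(month, "0")   # '0' = no month given
    # '/' = Gregorian, '2' = A.D., '3' = no modifier, two spaces = no alternate year
    return f"/2{int(year):04d}{month_code}{int(day or 0):02d}3  "


full = to_key("4 NOV 1900")      # complete date
no_day = to_key("NOV 1900")      # month and year only
phrase = to_key("14 Nov")        # day and month without a year
```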

I developed this structure before I discovered that RootsMagic uses something similar. RM uses several different formats of dates, but I think it’s worthwhile comparing the RootsMagic text representation of a date as described at http://sqlitetoolsforrootsmagic.wikispaces.com/Date+Formats

Their structure is always a 24-character string, as follows:

RootsMagic Text Dates:

C (Calendar): D = Standard date, Q = Quaker date, T = Text date 
* (Date Modifier): - = NONE, A = After, B = Bef, F = From, I = Since,
                         O = Or, R = Bet/And, S = From/To, T = To,
                         U = Until, Y = By
B (B.C.): ‘-’ = B.C., ‘+’ = A.D.
YYYY (year) - can be 0000 for a partial date with no year, e.g. Jan 1
MM (month) - can be 00 for partial date
DD (day) - can be 00 for partial date
A (Alternate year): ‘/’ = Double Date, ‘.’ = Otherwise
% (Surety): ? = Maybe, 1 = Prhps, 2 = Appar, 3 = Lkly, 4 = Poss,
        5 = Prob, 6 = Cert, A = Abt, C = Ca, E = Est, L = Calc, S = Say, . = other
BYYYYMMDDA% = the second date in a date range (always included)

e.g. 4 Nov 1900 = ‘D.+19001104..+00000000..’
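
To make the layout concrete, here is a small sketch that slices the first date out of such a string. It relies only on the field positions listed above, not on any RootsMagic documentation or API, and the `RMDate` and `parse_rm` names are hypothetical. Text dates (type T) hold free text, so the sketch rejects them.

```python
from typing import NamedTuple


class RMDate(NamedTuple):
    kind: str          # 'D' = standard date, 'Q' = Quaker date
    modifier: str      # date modifier character
    era: str           # '+' = A.D., '-' = B.C.
    year: int          # 0 when the year is missing
    month: int         # 0 when the month is missing
    day: int           # 0 when the day is missing
    double_date: bool  # True when the alternate-year flag is '/'
    surety: str        # surety character


def parse_rm(text):
    """Split the first date of a 24-character RootsMagic-style text date."""
    if len(text) != 24 or text[0] not in "DQ":
        raise ValueError(f"not a structured date: {text!r}")
    return RMDate(
        kind=text[0],
        modifier=text[1],
        era=text[2],
        year=int(text[3:7]),
        month=int(text[7:9]),
        day=int(text[9:11]),
        double_date=text[11] == "/",
        surety=text[12],
    )


parsed = parse_rm("D.+19001104..+00000000..")
```

The second date occupies characters 14 to 24 and could be sliced the same way.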

It is interesting how similar, yet different, my system is from RootsMagic’s. I’ve never seen a Quaker date. RM doesn’t seem to support the Julian, French and Hebrew dates that are in GEDCOM. Their modifier at the beginning of the string will prevent their dates from sorting properly. They include a surety, which isn’t part of the GEDCOM date, so I don’t include it. And their strings are always 24 characters, whereas my format is usually 12 but 24 if it is a date range.
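
The sorting remark can be checked with two made-up strings built from the layout above (they are illustrative and not exported from RootsMagic). Because the modifier comes before the year, a plain string sort compares modifiers first:

```python
# Constructed examples; '.' is used for "no modifier" as in the worked example above.
rm_dates = [
    "D.+19000101..+00000000..",  # 1 Jan 1900, no modifier
    "DA+18000101..+00000000..",  # After 1 Jan 1800
]

# Plain string sort: '.' sorts before 'A', so 1900 lands ahead of "after 1800".
raw_order = sorted(rm_dates)

# Sorting on the YYYYMMDD slice instead restores chronological order (A.D. dates only).
by_date = sorted(rm_dates, key=lambda s: s[3:11])
```

Sorting such strings chronologically therefore needs a key that skips the leading characters, whereas a format with the modifier after the date can be sorted as is.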

My one takeaway from this is that maybe I could save a character for my double dates by using a single character code instead of the two digits.

Many programs (RootsMagic included) make you enter “Sort Dates” for events where you don’t know the date but still want them sorted in a particular order. I think they are superfluous.

My feeling is that if you know enough about the date to be able to order it, then put in what you know using the date modifiers, e.g. if Mary was born in 1832 and John was born after Mary, then I’d say you should put the date for John’s birth in as “AFT 1832” and John will (at least in Behold) be sorted properly after Mary. And make sure you add an appropriate comment and source onto the birth event to state how you know this.
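
With the 12-character format described earlier, this ordering falls out of a plain string sort, because the modifier code sits after the date digits. A minimal sketch, with the keys written out by hand and a third, hypothetical person (Ann, born before 1832) added for contrast:

```python
# Year-only keys for 1832: '/' Gregorian, '2' A.D., '1832', month '0', day '00',
# then the modifier code and two spaces for "no alternate year".
births = {
    "Mary": "/218320003  ",  # 1832      (modifier 3 = none)
    "John": "/21832000B  ",  # AFT 1832  (modifier B = AFT)
    "Ann":  "/218320001  ",  # BEF 1832  (modifier 1 = BEF), hypothetical person
}

# Sort the names by their date keys using ordinary string comparison.
order = sorted(births, key=births.get)
```

Here `order` comes out as Ann, Mary, John, so no separate sort date is needed.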

4 Comments

1. dirkb (dirkb)
Posted: Sun, 21 Feb 2016

Great article!
This will help me a lot with GEDCOM dates in my software (www.ahnenblatt.com).

I found one minor issue:
“CBYYYYMM*AA” is not quite correct - it should be “CBYYYYMDD*AA” (month only one character, the two characters for day were missing).

- Dirk

2. Louis Kessler (lkessler)
Posted: Sun, 21 Feb 2016

Thanks, Dirk. I’ve now fixed this in the article.

I likely will change the double dates to 1 character when I implement Behold’s own external file format. If and when I do, I’ll try to remember to update this article.

3. mjashby (mjashby)
Posted: Fri, 4 Mar 2016

Quote: “I’ve never seen a Quaker date”

Put simply, Quaker dates do not use month names and usually appear in Quaker records in the following format: 3 vi 1790, which equates to 3 August 1790. Quakers don’t (didn’t) accept the general use of month names, except for September to December, because the months from January to August were named in honour of ‘pagan’ gods or historical figures, which isn’t acceptable in the Quaker faith. However, September, October, November and December simply mean 7th month, 8th month, 9th month and 10th month, which was considered acceptable, and they occasionally appear in Quaker records as 7ber, 8ber, 9ber and 10ber. So, the traditional Quaker calendar begins on the first of the first month (1 i), which is 1 March, and ends on the 28th or 29th of the twelfth month (28/29 xii), which is February.

If you ever refer to Quaker birth or marriage records, which usually form a part of the Minutes of Quaker Meetings, that is where you will normally find such dates. They often lead to many records, including those appearing on FamilySearch, being misinterpreted by well-meaning individuals who clearly didn’t understand that month i is March, month ii is April, etc.


4. Louis Kessler (lkessler)
Posted: Wed, 9 Mar 2016

Thanks, Mervyn. That helps my understanding. Quaker dates are not supported by GEDCOM, so they’d have to be put as a date phrase in parentheses. But a smart program would be able to interpret them from the date phrase.
