How to Program Dates for Genealogy - Mon, 1 Feb 2016
Dates in genealogy are messy. Four years ago, I wrote a series of posts on some of the aspects of dates as defined by GEDCOM and some of the bad dates you see in GEDCOM files in the wild:
Sort of a Date
How About a Date?
Out on a Bad Date
In a recent post by on the RootsDev Google Group, Gary Stanley asked about Incomplete Dates: “Does anyone have any recommendations for working with incomplete dates and storing them within a database such as MySQL?”
For the record, I thought I’d lay out the method I decided on. It is based on GEDCOM’s date definition which as I’ve said in the above articles, I think is quite a good specification.
The good thing about the GEDCOM spec is that it is readable. The bad thing is that it does not sort, so an alternative internal format is needed to sort. So here is what I came up with and use internally in Behold. It is also how I’ll export dates in the database that Behold will produce.
The structure is a 12 or 24 character string (24 if it’s a date range) that can be sorted just by letting the computer naturally sort the strings.
Behold’s internal date format is:
CBYYYYMDD*AA [CBYYYYMDD*AA]
where:
C (Calendar): ‘/’ = Gregorian, ‘A’ = Julian, ‘F’ = ‘French’, ‘H’ = ‘Hebrew’
B (B.C.): ‘1′ = B.C., ‘2′ = A.D. If B.C.
If BC, then the YYYY is set to 9999 - YYYY so it sorts correctly
YYYY (Year)
M (Month): ‘1′ .. ‘C’ for JAN to DEC
‘D’ .. ‘O’ for VEND to COMP
‘P’ .. ‘Z’ for TSH to ELL
DD (Day)
* (Date Modifier): 1 = BEF, 2 = TO, 3 = (none), 4 = ABT, 5 = CAL, 5 = EST,
7 = INT, 8 = AND, 9 = FROM, A = BET, B = AFT
AA (Alternate year): e.g. the "93" in 1592/93
CBYYYYMM*AA = the second date in a date range (only if needed)
e.g. 4 Nov 1900 = ‘/21900A043 ’
If column 1 is a ‘(‘, then this is a GEDCOM date phrase stored as text between parenthesis, e.g.: (4 days old). Anything that does not fit the standard format automatically becomes a date phrase.
With regards to “incomplete dates’, GEDCOM allows year only, or month and year only. It does not allow month without year or day without month and year. My representation follows this idea and allows ‘/21900A003” (no day) and ‘/219000003’ (no month and day). Other types of incomplete dates will become a date phrase, e.g. ‘(14 Nov)’
I developed this structure before I discovered that RootsMagic uses something similar. RM uses several different formats of dates, but I think it’s worthwhile comparing the RootsMagic text representation of a date as described at http://sqlitetoolsforrootsmagic.wikispaces.com/Date+Formats
Their structure is an always 24 character string, as follows:
RootsMagic Text Dates:
C*BYYYYMMDDA%BYYYYMMDDA%
where:
C (Calendar): D = Standard date, Q = Quaker date, T = Text date
* (Date Modifier): - = NONE, A = After, B=Bef, F=From, I=Since,
O = Or, R = Bet/And, S= From/To, T=To,
U = Until, Y = By
B (B.C.): ‘–’ = B.C., ‘+’ = A.D.
YYYY (year) - can be 0000 if for partial date with no year e.g. Jan 1
MM (month) – can be 00 for partial date
DD (day) – can be 00 for partial date
A (Alternate year): ‘/’ = Double Date, ‘.’ = Otherwise
% (Surety): ? = Maybe, 1 = Prhps, 2 = Appar, 3 = Lkly, 4 = Poss,
5 = Prob, 6 = Cert, A = Abt, C = Ca, E = Est, L = Calc, S = Say, . = other
BYYYYMMDDA% = the second date in a date range (always included)
e.g. 4 Nov 1900 = ‘D.+19001104..+00000000..’
This is interesting how similar, yet different my system is from RootsMagic. I’ve never seen a Quaker date. RM doesn’t seem to support the Julian and French and Hebrew dates that are in GEDCOM. Their modifier at the beginning of the string will prevent their dates from sorting properly. They include a surety which isn’t part of the GEDCOM date so I don’t include it. And they are always 24 characters, where my format is usually 12 but 24 if it is a date range.
My one take out of this is that maybe I could save a character for my double dates by using a single character code instead of the two digits.
—
Regarding the “Sort Dates” that many programs (RootsMagic included) make you enter for events where you don’t know the date, but still want them sorted in a particular order. I think they are superfluous.
My feeling is that if you know enough about the date to be able to order it, then put in what you know using the date modifiers, e.g. if Mary was born in 1832 and John was born after Mary, then I’d say you should put the date for John’s birth in as “AFT 1832” and John will (at least in Behold) be sorted properly after Mary. And make sure you add an appropriate comment and source onto the birth event to state how you know this.