I've been thinking about genealogical sorting algorithms for dates and thought I'd post some ideas for your feedback.
Concerning the storing/sorting format alluded to in a previous blog post, CBYYYYMDD*AA, I see that months for the calendar systems supported by Behold are specified by the designations 1-C, D-O, P-Z, for Jul/Greg, French, and Hebrew calendars respectively.
Because the first character in the format, C, already serves to differentiate the calendar systems, why not simply have the M field always designated by 1-n, where n is the last month (usually C) in a particular calendar system?
Maybe I'm missing some basic premise, but it seems that if you sort dates via that encoding (CBYYYYMDD*AA), all dates from a particular calendar system will first be grouped together into blocks. That would be OK if all the events in a person's life happen to be recorded within the same calendar system. But what happens when a lifespan crosses a calendar change, or when an individual's recorded dates span calendar systems because of travel? With a leading character denoting the calendar system, the sort will be wrong.
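To make the block-sorting effect concrete, here is a minimal Python sketch. It does not reproduce the full CBYYYYMDD*AA format; it assumes a simplified key made of a hypothetical calendar letter ("G" for Gregorian, "J" for Julian) followed by YYYYMMDD, which is enough to show how a leading calendar character dominates the sort. The event names and dates are made up for illustration.

```python
# Simplified stand-in for a calendar-prefixed sort key.
# The letters "G" and "J" are hypothetical codes, not Behold's actual ones.
def prefixed_key(calendar, year, month, day):
    return f"{calendar}{year:04d}{month:02d}{day:02d}"

# (event, calendar, year, month, day), all illustrative values
events = [
    ("burial", "G", 1752, 9, 14),    # recorded in Gregorian
    ("baptism", "J", 1720, 1, 1),    # recorded in Julian, decades earlier
    ("marriage", "J", 1745, 6, 10),  # recorded in Julian
]

ordered = sorted(events, key=lambda e: prefixed_key(*e[1:]))
names = [e[0] for e in ordered]
print(names)  # the Gregorian burial sorts ahead of the earlier Julian events
```

Because "G" compares lower than "J", the 1752 burial lands before the 1720 baptism, which is the cross-calendar mis-sort described above.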
In 1710 part of Canada switched from Gregorian back to the Julian and then 42 years later back to the Gregorian, all within the lifetime of an individual. In this case, 13 Oct 1710 was followed by 3 Oct 1710, then back to Gregorian in 1752. How can an individual's recorded dates around this period be sorted correctly without either implementing a separate sort date or a conversion to a unified calendar system? What if an individual traveled from a Gregorian place to a nearby Julian place, or vice versa with events recorded in each?
Incidentally, in the example above, Canada experienced the days Oct 3rd to the 13th 1710 twice! Imagine the confusion posed by an infant born, christened, died, and buried within that month.
Yes, the month numbers can be reused.
And you're correct. This system will not sort correctly between calendars. I developed it mainly as a way to efficiently store GEDCOM dates in Behold, and figured out an easy way to also enable it to sort properly within a specific calendar.
For anyone else reading this, the relevant post is: How to Program Dates for Genealogy
Trying to sort between calendars would require conversion of the date between the systems. That is not yet programmed into Behold, but I can do so in the future if I find it's really needed. I would still leave the date format the same with the Calendar at the front. But what I would likely do is structure the date as:
where Date2 would be the original date in the Calendar given, and Date1 would be the date converted to Gregorian.
That way, the dates would all sort as Gregorian dates, and the original date and its translated date could both be displayed without converting it every time.
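A rough Python sketch of that two-date idea follows. The exact layout is not shown here, so the class and field names are assumptions: `gregorian` plays the role of Date1 and is the only field used for ordering, while `calendar` and `original` hold Date2 for display. The Gregorian equivalents are supplied by hand rather than computed.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class StoredDate:
    # Date1: Gregorian equivalent as (year, month, day); the only field compared.
    gregorian: tuple
    # Date2: the date as originally recorded, kept for display only.
    calendar: str = field(compare=False)
    original: tuple = field(compare=False)

dates = [
    # Julian 3 Oct 1710 corresponds to Gregorian 14 Oct 1710 (11-day offset).
    StoredDate((1710, 10, 14), "Julian", (1710, 10, 3)),
    StoredDate((1710, 10, 13), "Gregorian", (1710, 10, 13)),
]

ordered_dates = sorted(dates)
for d in ordered_dates:
    print(d.calendar, d.original, "sorts as", d.gregorian)
```

The conversion happens once, when the date is stored, and both forms are then available without recomputation.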
In my humble opinion, genealogy software needs to strive to accommodate whatever is possible in the physical world. In my own genealogical programming, I use that as the litmus test for what's really needed. It was certainly possible, and in fact common, for people's lives to span calendar systems. George Washington is a famous example, and his contemporaries would not be rare edge cases in a typical genealogy.
I would also propose that displaying a proleptic Gregorian date for an event sourced and entered in some other calendar system would not serve any practical need for the end user; I see the "date1" field as 'for internal sorting use only'. If you agree with this, then I posit that using the Gregorian calendar system for purely internal sort is not the least computationally expensive method. Here's why:
All the algorithms I have found for converting between calendar systems begin by converting the source date to an intermediate form, the Julian Day Number (JDN, the number of days elapsed since Jan 1 4713 BC in the proleptic Julian calendar), and then convert from the JDN to the target calendar system. They do not convert directly. It may be possible to convert directly between the Gregorian and Julian calendars without the intermediate JDN step because of their similar structure (perhaps using loads of If statements), but the JDN calculation would still be needed for converting to or from calendar systems that are not Julian-like, such as Hebrew or French Republican. The thing is, the Julian Day Number is just as good an internal sort index as any, and because it's computed along the way during conversions anyway, it seems simpler just to sort on the Julian Day Number.
So for January 1 2013, the JDN would be 2,456,294, therefore: 2456294CBYYYYMDD*AA or more specifically, 2456294/220131013
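The conversion to a Julian Day Number is short in practice. The sketch below uses the widely published integer formulas for the Gregorian and Julian calendars; it is not code from Behold or from this thread, and the function names and the "G"/"J" calendar codes are illustrative. It follows the noon-based convention, under which 1 January 2013 comes out as 2,456,294 (the Julian Date at the preceding midnight is 2,456,293.5). Years are assumed to begin on 1 January.

```python
def gregorian_to_jdn(year, month, day):
    """Julian Day Number of a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12        # 1 for January/February, otherwise 0
    y = year + 4800 - a
    m = month + 12 * a - 3        # March = 0 ... February = 11
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

def julian_to_jdn(year, month, day):
    """Julian Day Number of a Julian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

# Hypothetical calendar codes mapped to their converters.
TO_JDN = {"G": gregorian_to_jdn, "J": julian_to_jdn}

# Dates from both calendars, deliberately out of order.
events = [
    ("J", 1710, 10, 3),
    ("G", 1752, 9, 14),
    ("G", 1710, 10, 13),
    ("J", 1752, 9, 2),
]

# Sorting on the JDN gives true chronological order across calendars.
ordered = sorted(events, key=lambda e: TO_JDN[e[0]](*e[1:]))
print(ordered)
```

With the JDN as the key, Gregorian 13 Oct 1710 is immediately followed by Julian 3 Oct 1710, and Julian 2 Sep 1752 by Gregorian 14 Sep 1752, with no special-case handling of either switchover.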
Taking it a bit further, I think the JDN algorithm is more computationally expensive than genealogical sorting needs (who traces genealogy back to 4713 BC?). I created an algorithm, a Julian version of Rata Die, that calculates days elapsed since Jan 1 0001, and will expound if any of my reasoning thus far appeals to you.
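The algorithm mentioned here is not shown in the thread, so the following is only one possible sketch of a day count in the Julian calendar, with 1 January AD 1 counted as day 1 in the style of Rata Die. The function name and the restriction to years AD 1 and later are assumptions.

```python
# Days preceding each month in a non-leap year (January first).
DAYS_BEFORE_MONTH = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)

def julian_day_count(year, month, day):
    """Day count in the Julian calendar, with 1 Jan AD 1 as day 1."""
    if year < 1:
        raise ValueError("this sketch only handles years AD 1 and later")
    prior = year - 1
    # Whole years already elapsed, plus one leap day per four of them.
    days = 365 * prior + prior // 4
    days += DAYS_BEFORE_MONTH[month - 1]
    # Julian rule: every fourth year is a leap year, with no century exception.
    if month > 2 and year % 4 == 0:
        days += 1
    return days + day

print(julian_day_count(1, 1, 1))
print(julian_day_count(1710, 10, 3))
```

Adding the constant 1,721,423 to this count gives the JDN of the same Julian date, so the two indexes sort identically. Dates recorded in other calendars would still need converting to a Julian date before this function could index them.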
There are other calendars as well, such as the Chinese or Hindu calendars. GEDCOM doesn't even address those.
What I think is most important is to make sure what people use and need the most works, and works well. Few people use other calendars in combination with Gregorian. If they do, they'll usually have a conversion program to work with.
I myself have a project that uses B.C. dates for tracking biblical and ancient Egyptian events. I have not seen a single GEDCOM file with B.C. dates, even though GEDCOM has them defined.
The bottom line, as you illustrate, Dave, is that dealing with dates is not simple. Most programs don't bother to add the day of the week, or check that the day number is not greater than the number of days in the month, or have capabilities for approximate dates and date ranges.
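As a rough illustration of two of those checks, Python's standard library can validate a day number against the month length and report the day of the week. Both `calendar.monthrange` and `datetime.date` work in the proleptic Gregorian calendar only, so this sketch does not cover Julian, Hebrew, or French Republican dates. The helper names are hypothetical.

```python
import calendar
import datetime

# date.weekday() numbers the days from Monday = 0.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

def is_valid_gregorian_day(year, month, day):
    """True if the day number exists in that Gregorian month."""
    if not 1 <= month <= 12:
        return False
    days_in_month = calendar.monthrange(year, month)[1]
    return 1 <= day <= days_in_month

def weekday_name(year, month, day):
    """English day-of-week name for a valid Gregorian date."""
    return WEEKDAYS[datetime.date(year, month, day).weekday()]

print(is_valid_gregorian_day(2013, 2, 29))
print(weekday_name(2013, 1, 1))
```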
Programs can't and shouldn't do everything that's possible, which of course is impossible. But they do need to handle the basics, and do that well.
Copyright © Louis Kessler
All Rights Reserved