The Behold User Forum

Dates and Sorting - Categorized in: General Discussion

5 posts. Started 23 Dec 2016 by davej. Latest reply 3 Jan 2017 by lkessler.
1. davej (davej)
United States flag
Joined: Fri, 23 Dec 2016
0 blog comments, 3 forum posts
Posted: Fri, 23 Dec 2016

I've been thinking about genealogical sorting algorithms for dates, and thought I'd post some ideas for your feedback.

Concerning the storing/sorting format alluded to in a previous blog post, CBYYYYMDD*AA, I see that months for the calendar systems supported by Behold are specified by the designations 1-C, D-O, and P-Z, for the Julian/Gregorian, French, and Hebrew calendars respectively.

Because the first character in the format, C, already serves to differentiate the calendar systems, why not simply have the M field always designated by 1-n, where n is the last month (usually C) in a particular calendar system?

Sorting:

Maybe I'm missing some basic premise or understanding, but it seems that if you sort dates via that defined encoding (CBYYYYMDD*AA), this will first sort all dates of a particular calendar system together into blocks. That seems fine if all the events in a person's life happen to be recorded within the same calendar system. But what happens when lifespans traverse calendar systems, or an individual's recorded dates (due to travel) span calendar systems? Because the leading character denotes the calendar system, the sort will be wrong.
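A minimal sketch of that block effect, assuming a simplified stand-in for the key: one calendar letter followed by zero-padded YYYYMMDD. The letters "G" and "J", the field layout, and the event labels are all illustrative assumptions, not Behold's actual encoding.

```python
# Simplified stand-in for a calendar-prefixed sort key (NOT Behold's real format):
# one calendar letter, then zero-padded YYYYMMDD.
def prefixed_key(calendar: str, year: int, month: int, day: int) -> str:
    return f"{calendar}{year:04d}{month:02d}{day:02d}"

# Three hypothetical events in one lifetime, listed in true chronological order.
events = [
    ("baptism", prefixed_key("G", 1710, 10, 13)),   # recorded in Gregorian
    ("burial", prefixed_key("J", 1710, 10, 3)),     # recorded in Julian, the next day
    ("marriage", prefixed_key("G", 1752, 9, 14)),   # recorded in Gregorian again
]

# A plain string sort groups by the leading calendar letter first.
sorted_labels = [label for label, key in sorted(events, key=lambda e: e[1])]
print(sorted_labels)  # ['baptism', 'marriage', 'burial']
```

Every "G" key sorts ahead of every "J" key, so the 1752 event lands before the 1710 Julian one.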

In 1710 part of Canada switched from Gregorian back to the Julian and then 42 years later back to the Gregorian, all within the lifetime of an individual. In this case, 13 Oct 1710 was followed by 3 Oct 1710, then back to Gregorian in 1752. How can an individual's recorded dates around this period be sorted correctly without either implementing a separate sort date or a conversion to a unified calendar system? What if an individual traveled from a Gregorian place to a nearby Julian place, or vice versa with events recorded in each?

Incidentally, in the example above, Canada experienced the days Oct 3rd to the 13th 1710 twice! Imagine the confusion posed by an infant born, christened, died, and buried within that month.

2. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
287 blog comments, 245 forum posts
Posted: Sun, 25 Dec 2016

Dave,

Yes, the month numbers can be reused.

And you're correct. This system will not sort correctly between calendars. I developed it mainly as a way to efficiently store GEDCOM dates in Behold, and figured out an easy way to also enable it to sort properly within a specific calendar.

For anyone else reading this, the relevant post is: How to Program Dates for Genealogy

Trying to sort between calendars would require conversion of the date between the systems. That is not yet programmed into Behold, but I can do so in the future if I find it's really needed. I would still leave the date format the same with the Calendar at the front. But what I would likely do is structure the date as:

Date1+Date2

where Date2 would be the original date in the Calendar given, and Date1 would be the date converted to Gregorian.

That way, the dates would all sort as Gregorian dates, and the original date and its translated date could both be displayed without converting it every time.
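One way the Date1+Date2 idea might look, as a sketch only. The key layout, the "G"/"J" letters, and the function names are assumptions for illustration, not Behold's format. The Julian conversion here goes through Python's proleptic Gregorian ordinal, assumes the Julian year starts on 1 January, and only works for Julian dates from 3 Jan 0001 onward.

```python
from datetime import date

def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian calendar date (year starting 1 January) to the
    proleptic Gregorian date via a running day count."""
    # Days before each month in a non-leap year.
    days_before = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]
    leap = year % 4 == 0  # Julian rule: every fourth year is a leap year
    day_of_year = days_before[month - 1] + day + (1 if leap and month > 2 else 0)
    # Day count in which Julian 1 Jan 0001 is day 1.
    count = 365 * (year - 1) + (year - 1) // 4 + day_of_year
    # Julian 1 Jan 0001 falls two days before Gregorian ordinal 1.
    return date.fromordinal(count - 2)

def composite_key(calendar: str, year: int, month: int, day: int) -> str:
    """Build 'Date1+Date2': Gregorian equivalent first, original date second."""
    original = f"{calendar}{year:04d}{month:02d}{day:02d}"  # Date2, as recorded
    g = julian_to_gregorian(year, month, day) if calendar == "J" else date(year, month, day)
    return f"{g.year:04d}{g.month:02d}{g.day:02d}+{original}"  # Date1 leads the sort

keys = sorted([
    composite_key("G", 1752, 9, 14),
    composite_key("J", 1710, 10, 3),
    composite_key("G", 1710, 10, 13),
])
print(keys)
```

Because Date1 comes first, a plain string sort orders the three keys chronologically, and the original date is still available after the "+" for display.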

3. davej (davej)
United States flag
Joined: Fri, 23 Dec 2016
0 blog comments, 3 forum posts
Posted: Tue, 27 Dec 2016

Hi Louis

I think the solution you propose is workable, but I wonder if there is a gotcha to be wary of, concerning the algorithm used to convert Julian to Gregorian.

Most Julian-to-Gregorian converters that I find on the web make no attempt to ask which day the year started on for the date being entered; they all assume the start of the year was January 1st. The question is, could this create sorting errors? Here's my logic:

Even while sharing the Julian calendar, different countries (or regions within countries such as Lorraine vs. France) considered the start of the new year to be on different days. Therefore two possible source dates recorded in different locations, say, Feb 15 1559, and Feb 15 1560 could very well be the same day if one of the locations incremented their new year on March 25th. And yet a typical conversion of both those dates to Gregorian would place the resultant Gregorian dates a year apart.

Therefore I propose that when converting a Julian date to Gregorian for the purpose of populating "Date1" as you have defined it above, a year correction must be done first if two conditions both hold: the date references a start-of-year that was not January 1st, and the date falls between January 1st and that start-of-year date. So if the date to convert is 15 Feb 1559, it can be converted directly if the source authority incremented its new year on January 1st, but if the source incremented its new year on March 25th, then the date must be corrected to 15 Feb 1560 before conversion to Gregorian.
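The correction rule just described can be sketched as follows. The function name and the `new_year` parameter are hypothetical. The sketch models only the rule as stated here, where the year number changes on a date after January 1st, such as March 25th. Styles where the year number changes before January 1st (such as December 25th) would need a different adjustment and are not handled.

```python
def normalize_year(year: int, month: int, day: int, new_year=(1, 1)) -> int:
    """Return the year on 1 January reckoning for a date recorded under a
    start-of-year convention given as (month, day).

    Only conventions that begin the year AFTER 1 January (e.g. 25 March)
    are modelled; this is an illustrative sketch, not a general solution.
    """
    # Dates from 1 January up to the day before the local new year still
    # carry the previous year number, so bump them forward by one.
    if new_year != (1, 1) and (month, day) < new_year:
        return year + 1
    return year

print(normalize_year(1559, 2, 15, new_year=(3, 25)))  # 1560
print(normalize_year(1559, 2, 15))                    # 1559
```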

Steve Morse's Julian to Gregorian calculator (http://www.stevemorse.org/jcal/julian.html) is the only one I have found that allows this correction for you. He allows additional entry of a date for start-of-new-year, and not limited to Jan 1 and Mar 25, as apparently, there were other starts-of-new-year such as Dec 25th.

So the question comes to how to ensure this happens. Because your storage format has provision for 'alternate year' (AA), any time an alternate year AA is populated, the AA could be used to construct the year for conversion purposes rather than the 3rd and 4th YY, but in the event of century changes such as 1599/00, the second Y would need changing too.
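A small sketch of that reconstruction, assuming AA holds the last two digits of the alternate (later) year, as in 1559/60 or 1599/00. The function name is hypothetical.

```python
def year_from_alternate(year: int, aa: int) -> int:
    """Combine a four-digit year with a two-digit alternate year (AA),
    rolling the century over when AA wraps past 99."""
    candidate = year - year % 100 + aa  # same century, last two digits replaced
    if candidate < year:                # e.g. 1599/00 -> 1500, so move to 1600
        candidate += 100
    return candidate

print(year_from_alternate(1559, 60))  # 1560
print(year_from_alternate(1599, 0))   # 1600
```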

But further down the rabbit hole, this assumes that all dates with a non-January-1 new year have been imported/entered/stored in double date format (using AA). It perhaps then assumes the source record itself was written in double date format as well, which it might not have been. And finally, who has the onus of prompting for accuracy, user or programmer? The design approach could be: if it's entered in double date format, it will be converted to Gregorian correctly and thus sorted correctly; if not, and the new year wasn't Jan 1, then it's the genealogist's problem (Jan 1 is assumed as the year start).

An alternative could be to code an expansive lookup table of locations and dates for which the new year date changed (some locations changed their new years day long before changing to the Gregorian calendar), but this doesn't seem practical, and cannot cover every medieval village that didn't get the memo.

The genealogy program Gramps has an interesting approach; for all date entries there is an option to explicitly specify the start of the year.

There may also be a few far-fetched edge cases for sort issues: one or two locations had a February 30th, which the online converters don't like, and Sweden for some years was neither Julian nor Gregorian. But I've not thought long on those trifles.

-Dave

4. davej (davej)
United States flag
Joined: Fri, 23 Dec 2016
0 blog comments, 3 forum posts
Posted: Tue, 3 Jan 2017

Hi Louis,

In my humble opinion, genealogy software needs to strive to accommodate whatever is possible in the physical world. In my own genealogical programming, I use that as the litmus test for what's really needed. It's certainly possible, and was common, for people's lives to span calendar systems. George Washington is a famous example, as were his contemporaries, who would not be rare edge cases in a typical genealogy.

I would also propose that displaying a proleptic Gregorian date for an event sourced and entered in some other calendar system would not serve any practical need for the end user; I see the "date1" field as 'for internal sorting use only'. If you agree with this, then I posit that using the Gregorian calendar system for purely internal sort is not the least computationally expensive method. Here's why:

All algorithms (that I have found) for converting between calendar systems begin by converting the source date to an intermediate system, the Julian Day Number (JDN, the number of days elapsed since Jan 1 4713 BC), and then convert from the Julian Day Number to the target calendar system. They do not convert directly. It may be possible to convert directly between the Gregorian and Julian calendars because of their similar structure, skipping the intermediate JDN step (perhaps using loads of If statements). But the intermediate JDN calculation would still be needed for converting to or from calendars that are not Julian-like, such as Hebrew or French Republican. The thing is, the Julian Day Number is just as good an internal sort index as any, and because it's computed along the way anyway during conversions, it seems it would be simpler just to sort on the Julian Day Number.
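The two-step conversion described above can be sketched with widely published integer formulas for the Julian Day Number. The function names are illustrative, years use astronomical numbering (4713 BC is -4712), and the Julian year is assumed to start on January 1st. As an example, George Washington's birth date of 11 Feb 1732 Julian maps to 22 Feb 1732 Gregorian.

```python
def _shifted(year: int, month: int):
    # Shift the year to start in March so the leap day falls at the end.
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3

def julian_to_jdn(year: int, month: int, day: int) -> int:
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

def jdn_to_gregorian(jdn: int):
    # Second step: from the day number back to a (year, month, day) triple.
    f = jdn + 1401 + (((4 * jdn + 274277) // 146097) * 3) // 4 - 38
    e = 4 * f + 3
    h = 5 * ((e % 1461) // 4) + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day

# Step 1: Julian source date -> JDN. Step 2: JDN -> Gregorian target date.
jdn = julian_to_jdn(1732, 2, 11)
print(jdn, jdn_to_gregorian(jdn))

# The JDN alone is enough to sort events recorded in different calendars.
events = [("G", 1752, 9, 14), ("J", 1710, 10, 3), ("G", 1710, 10, 13)]
to_jdn = {"G": gregorian_to_jdn, "J": julian_to_jdn}
ordered = sorted(events, key=lambda e: to_jdn[e[0]](*e[1:]))
print(ordered)
```

Sorting on the integer avoids any need to carry a converted Gregorian date around, which is the point being made here.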

So for January 1 2013, the JDN would be 2,456,293, therefore: 2456293CBYYYYMDD*AA or, more specifically, 2456293/220131013

Taking it a bit further, I think the JDN algorithm is more computationally expensive than genealogical sorting needs (who traces genealogy back to 4713 BC?). I created an algorithm, a Julian version of Rata Die, that calculates days elapsed since Jan 1 0001, and will expound if any of my reasoning thus far appeals to you.

Cordially,
Dave

5. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
287 blog comments, 245 forum posts
Posted: Tue, 3 Jan 2017

There are other calendars as well, such as the Chinese or Hindu calendars. GEDCOM doesn't even address those.

What I think is most important is to make sure what people use and need the most works, and works well. Few people use other calendars in combination with Gregorian. If they do, they'll usually have a conversion program to work with.

I myself have a project that uses B.C. dates for tracking biblical and ancient Egyptian events. I have not seen a single GEDCOM file with B.C. dates, even though GEDCOM has them defined.

The bottom line, as you illustrate, Dave, is that dealing with dates is not simple. Most programs don't bother to add the day of the week, or check that the day number is not greater than the number of days in the month, or have capabilities for approximate dates and date ranges.

Programs can't and shouldn't do everything that's possible, which of course is impossible. But they do need to handle the basics, and do that well.

At this point, I'm not ready to change my internal date representation to incorporate sorting between calendars. I don't think converting everything to Julian dates is something I want to do.
