What do you make of Tamura Jones's comment (http://www.tamurajones.net/SiblingTortureTest.xhtml):
None of these produced any errors or warnings, except Behold 1.04.
Behold warned that the date 13 Apr 2012 is non-standard, and should be 13 APR 2012; the warning is that the abbreviation should be in ALL-CAPITALS.
That is what the specification seems to say, but it does not;
Chapter 2 of the GEDCOM 5.5.1 specification clearly states that "All controlled line_value choices should be considered as case insensitive",
and that values should be converted to all uppercase or all lowercase prior to comparing.
That means that Apr is fine, and that you may even write aPR or aPr.
It seems to me that all lower case or all upper case is correct, but not mixed case as Tamura suggests.
See my blog post: How To Get A Developer To Fix A Bug.
No, Tamura is correct. The statement "values should be converted to all uppercase or all lowercase prior to comparing" means that aPR and aPr should both be changed to APR (if uppercase is used for comparison) or to apr (if lowercase is used for comparison). Either way, aPR and aPr are equivalent to apr, APR and Apr.
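A minimal Python sketch of that reading, assuming plain ASCII month abbreviations: fold both sides to one case, then compare. The function name `same_controlled_value` is hypothetical and not taken from any GEDCOM library.

```python
def same_controlled_value(a: str, b: str) -> bool:
    """Compare two controlled line values case-insensitively by
    converting both to upper case before comparing."""
    return a.upper() == b.upper()


# Every spelling of April folds to the same value, so each line prints True.
for spelling in ["APR", "Apr", "apr", "aPR", "aPr"]:
    print(spelling, same_controlled_value(spelling, "APR"))
```

Folding to lower case instead gives the same result for these values, which is the "all uppercase or all lowercase" choice the specification leaves open.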
So what does this mean:
values should be converted to all uppercase or all lowercase prior to comparing.
Does it apply when:
1. comparing two dates within a program, such as to work out an age, or
2. comparing two supposedly identical GEDCOMs for differences?
If 1 above, how does a user know the program is doing it correctly?
If 2 above, how do we convert both GEDCOM files to the same case without a large (and possibly manual) conversion?
I think it simply means comparing for the purpose of interpreting its value.
For a DATE value, I don't just compare the month-part to JAN, FEB, MAR,..., but I compare the uppercased value of the month-part to JAN, FEB, MAR,...
For a TYPE value, I don't just compare the value to STILLBORN, but I compare the uppercased value to STILLBORN.
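Roughly, in Python, that approach might look like the sketch below. It handles only the simple "DD MON YYYY" form, and `month_number`, `parse_simple_date` and `is_stillborn` are hypothetical helper names used for illustration, not Behold's actual code.

```python
from typing import Optional, Tuple

# The twelve Gregorian month abbreviations, in upper case.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def month_number(token: str) -> Optional[int]:
    """Return 1..12 for a month abbreviation in any case, else None."""
    key = token.upper()  # uppercase the month-part before comparing
    return MONTHS.index(key) + 1 if key in MONTHS else None


def parse_simple_date(value: str) -> Tuple[int, int, int]:
    """Parse only the 'DD MON YYYY' form into (year, month, day).
    Ranges, approximations, calendar escapes and date phrases are
    deliberately not handled in this sketch."""
    day, mon, year = value.split()
    month = month_number(mon)
    if month is None:
        raise ValueError(f"unknown month: {mon!r}")
    return int(year), month, int(day)


def is_stillborn(value: str) -> bool:
    """Compare the uppercased value to the keyword STILLBORN."""
    return value.upper() == "STILLBORN"


print(parse_simple_date("13 Apr 2012"))  # (2012, 4, 13)
print(is_stillborn("Stillborn"))         # True
```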
I assume this applies to BET, ABT, etc., in that they can be written Bet, bet, etc., but are compared upper- or lower-cased.
Yes. All parts of the date. And that actually simplifies the work that Behold is doing.
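Applying the same idea to the date keywords, here is a small Python sketch that classifies a DATE value by its leading keyword after uppercasing every token. The keyword sets follow the GEDCOM 5.5.1 date grammar (DATE_APPROXIMATED, DATE_RANGE, DATE_PERIOD); the function `classify_date` is hypothetical, and it does not validate the rest of the date.

```python
# Date keywords from the GEDCOM 5.5.1 date grammar, in upper case.
APPROXIMATED = {"ABT", "CAL", "EST"}
RANGE_OPEN = {"BEF", "AFT"}
PERIOD = {"FROM", "TO"}


def classify_date(value: str) -> str:
    """Classify a DATE value by its leading keyword, ignoring case."""
    tokens = [t.upper() for t in value.split()]  # uppercase every part of the date
    if not tokens:
        return "empty"
    first = tokens[0]
    if first in APPROXIMATED:
        return "approximated"
    if first in RANGE_OPEN or (first == "BET" and "AND" in tokens[1:]):
        return "range"
    if first in PERIOD:
        return "period"
    # Plain dates, INT and date phrases all fall through here.
    return "other"


print(classify_date("Bet 1900 and 1910"))  # range
print(classify_date("abt 1850"))           # approximated
```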
Personally, I think it is a great idea that the GEDCOM designers had. I should have discovered it earlier, but now that I have, I'll make use of it.
On further thought, I now believe that GEDCOM intended only LINE_VALUEs that are an enumerated list of choices to be allowed in mixed upper and lower case.
A DATE_VALUE is a line value, but it is not made up of an enumerated list of choices. It is made up of a substructure, some of whose components (such as the month) are enumerated. I now don't believe that GEDCOM intended these complex structures to be allowed in mixed case; they should be written precisely as defined (upper case).
Whether or not this is true, at least a warning should be given, because there may be programs that will not interpret all of "JAN", "Jan", "jan" and "jAn" as the month of January.
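A lenient-but-warning reader along those lines could be sketched in Python as follows: it accepts month and date keywords in any case but reports each one not written in upper case. The name `date_case_warnings` and the message wording are hypothetical, and this naive whitespace tokenizer would also flag words like "to" inside a parenthesized date phrase.

```python
from typing import List

# Month abbreviations and date keywords that the 5.5.1 grammar spells in upper case.
KEYWORDS = {
    "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
    "ABT", "CAL", "EST", "BEF", "AFT", "BET", "AND",
    "FROM", "TO", "INT",
}


def date_case_warnings(value: str) -> List[str]:
    """Accept keywords in any case, but report each one not written in upper case."""
    warnings = []
    for token in value.split():
        upper = token.upper()
        if upper in KEYWORDS and token != upper:
            warnings.append(f"{token!r} read as {upper!r}, but is not upper case")
    return warnings


print(date_case_warnings("13 Apr 2012"))
print(date_case_warnings("13 APR 2012"))  # []
```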
See also: http://www.beholdgenealogy.com/blog/?p=1087
By "enumerated list of choices", do you mean "controlled" as referred to in the specification:
All controlled line_value choices should be considered as case insensitive.
This means that the values should be converted to all uppercase or all lowercase prior to comparing.
The terms UPPERCASE and UpperCase are considered equal. TAGS are always UPPERCASE.
Yes. However, GEDCOM does not define the difference between "controlled" and "uncontrolled" line values.
My interpretation is that controlled line values are line values that are restricted to a specified set of allowed optional values. Anything more complicated than that is likely deemed not to be controlled, since that is the logical meaning of the word "controlled".
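Under that interpretation, a reader would fold case only for values restricted to a fixed set and leave everything else untouched. A Python sketch, assuming a hypothetical `CONTROLLED` table that lists only SEX (whose 5.5.1 values are M, F and U) and a hypothetical `normalize_line_value` function:

```python
from typing import Dict, Set

# Illustrative table of controlled values; only SEX is filled in here.
# Under this reading, any tag not listed is uncontrolled.
CONTROLLED: Dict[str, Set[str]] = {
    "SEX": {"M", "F", "U"},
}


def normalize_line_value(tag: str, value: str) -> str:
    """Uppercase a controlled value and check it against its allowed set;
    return an uncontrolled value unchanged."""
    allowed = CONTROLLED.get(tag.upper())
    if allowed is None:
        return value  # uncontrolled: case is preserved
    key = value.upper()
    if key not in allowed:
        raise ValueError(f"{value!r} is not an allowed {tag} value")
    return key


print(normalize_line_value("SEX", "f"))            # F
print(normalize_line_value("NOTE", "Born in Apr"))  # Born in Apr
```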
I just signed up, but I have been mulling over this issue for a while.
To me, the operative words in interpreting the standard (5.5.1) are "prior to comparing":
Until now, my interpretation has been - and I will still need more convincing to alter it - that the case of the actual value in the original does not matter.
IMO, the standard addresses the issue of whether data should be rejected due to differences in case. By specifying that the value from the original should be converted to either upper or lower case "prior to comparing", it makes clear that any and all combinations of case are acceptable, as long as the complete string matches the string specified in the standard in a case-insensitive way. :-)
If the original string were expected to have a specific case form (all upper case, all lower case, or even a leading capital), that is how the standard should have expressed it.
Copyright © Louis Kessler
All Rights Reserved