The Behold User Forum

Date validation - case sensitive - Categorized in: Report a Problem

10 posts. Started 13 Apr 2012 by brett. Latest reply 1 day ago by arnold.
1. Brett (brett)
Australia flag
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Fri, 13 Apr 2012 Permalink

Louis
What do you make of Tamura Jones's comment (http://www.tamurajones.net/SiblingTortureTest.xhtml):

None of these produced any errors or warnings, except Behold 1.04.
Behold warned that the date 13 Apr 2012 is non-standard, and should be 13 APR 2012; the warning is that the abbreviation should be in ALL-CAPITALS.
That is what the specification seems to say, but it does not;
Chapter 2 of the GEDCOM 5.5.1 specification clearly states that "All controlled line_value choices should be considered as case insensitive",
and that values should be converted to all uppercase or all lowercase prior to comparing.
That means that Apr is fine, and that means that you may even write aPR or aPr.

It seems to me that all lowercase or all uppercase is correct, but not mixed case as Tamura suggests.

Brett

2. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
149 blog comments, 201 forum posts
Posted: Fri, 13 Apr 2012 Permalink

Brett,

See my blog post: How To Get A Developer To Fix A Bug.

No, Tamura's correct. The statement "values should be converted to all uppercase or all lowercase prior to comparing" means that aPR and aPr should both be changed to APR (if uppercase is used for comparison) or to apr (if lowercase is used for comparison). Either way, aPR and aPr are equivalent to apr, APR and Apr.
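As a rough illustration of that reading (a sketch only, not Behold's actual code; the function name `same_controlled_value` is made up for this example), both sides are uppercased before the comparison:

```python
def same_controlled_value(a: str, b: str) -> bool:
    """Compare two controlled values after converting both to uppercase."""
    return a.upper() == b.upper()


# Every case variant of the month abbreviation compares equal to APR.
variants = ["apr", "APR", "Apr", "aPR", "aPr"]
all_equal = all(same_controlled_value(v, "APR") for v in variants)
print(all_equal)  # prints True
```

Converting both sides to lowercase instead would give the same result; what matters is that both sides are normalized the same way.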

Louis

3. Brett (brett)
Australia flag
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Fri, 13 Apr 2012 Permalink

So what does:

values should be converted to all uppercase or all lowercase prior to comparing.

actually mean?

Is this when:

1. comparing two dates within a program, such as to work out age or

2. two supposedly identical GEDCOMs are compared for differences?

If 1 above, how does a user know it is being done correctly by the program?

If 2 above, how do we change a GEDCOM to the same case in both files without a large (and possibly manual) conversion?

Brett

4. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
149 blog comments, 201 forum posts
Posted: Fri, 13 Apr 2012 Permalink

I think it simply means comparing for the purpose of interpreting its value.

For a DATE value, I don't just compare the month-part to JAN, FEB, MAR,..., but I compare the uppercased value of the month-part to JAN, FEB, MAR,...

For a TYPE value, I don't just compare the value to STILLBORN, but I compare the uppercased value to STILLBORN.
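A minimal sketch of that kind of interpretation, assuming Python and hypothetical helper names (`month_number`, `is_stillborn`); this is not how Behold is actually written, only an example of uppercasing a value before matching it against the standard spellings:

```python
from typing import Optional

# Standard month abbreviations, in calendar order.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def month_number(month_part: str) -> Optional[int]:
    """Return 1-12 for a month abbreviation in any case, else None."""
    upper = month_part.upper()
    if upper in MONTHS:
        return MONTHS.index(upper) + 1
    return None


def is_stillborn(type_value: str) -> bool:
    """Interpret a TYPE value by comparing its uppercased form."""
    return type_value.upper() == "STILLBORN"
```

With this, `month_number("aPr")` and `month_number("APR")` both return 4, while an unrecognized string such as `"April"` returns `None`.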

Louis

5. Brett (brett)
Australia flag
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Fri, 13 Apr 2012 Permalink

I assume this applies to BET, ABT, etc., in that they can be written Bet, bet, etc. but are compared in upper or lower case.

Brett

6. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
149 blog comments, 201 forum posts
Posted: Sat, 14 Apr 2012 Permalink

Yes. All parts of the date. And that actually simplifies the work that Behold is doing.

Personally, I think it is a great idea that the GEDCOM designers had. I should have discovered it earlier, but now that I have, I'll make use of it.

Louis

7. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
149 blog comments, 201 forum posts
Posted: Mon, 30 Jun 2014 Permalink

Subsequent thinking on this makes me now believe that GEDCOM intended that only LINE_VALUEs that are an enumerated list of choices were to be allowed to be mixed upper and lower case.

A DATE_VALUE is a line value. But it is not made up of an enumerated list of choices. It is made up of a substructure, with some components of the substructure (such as month) being enumerated. I no longer believe that GEDCOM intended these complex structures to be allowed in mixed case; they should be written precisely as defined (upper case).

Whether or not this is true, at least a warning should be given, because there may be programs that will not interpret all of "JAN", "Jan", "jan" and "jAn" to be the month of January.
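One way to picture that behaviour, as a sketch under assumptions (the function name `check_month` and the warning wording are invented here and are not Behold's): accept the month in any case, but report a warning whenever the original is not already uppercase.

```python
from typing import Optional, Tuple

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def check_month(month_part: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (month number or None, warning text or None)."""
    upper = month_part.upper()
    if upper not in MONTHS:
        return None, f"unrecognized month: {month_part!r}"
    warning = None
    if month_part != upper:
        # Still interpreted, but flagged because other programs may reject it.
        warning = f"non-standard case {month_part!r}; expected {upper!r}"
    return MONTHS.index(upper) + 1, warning
```

Here `check_month("JAN")` yields month 1 with no warning, while `check_month("jAn")` yields month 1 together with a warning.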

See also: http://www.beholdgenealogy.com/blog/?p=1087

Louis

8. Brett (brett)
Australia flag
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Mon, 30 Jun 2014 Permalink

By enumerated list of choices, do you mean 'controlled' as referred to in the specification:

All controlled line_value choices should be considered as case insensitive.
This means that the values should be converted to all uppercase or all lowercase prior to comparing.
The terms UPPERCASE and UpperCase are considered equal. TAGS are always UPPERCASE.

9. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
149 blog comments, 201 forum posts
Posted: Tue, 1 Jul 2014 Permalink

Yes. However, GEDCOM does not define the difference between "controlled" and "uncontrolled" line values.

My interpretation is that controlled line values are line values that are restricted to a specified set of allowed optional values. Anything more complicated than that is likely deemed not to be controlled, since that is the logical meaning of the word "controlled".

Louis

10. arnold (arnold)
Canada flag
Joined: 1 day, 11 hrs ago
1 blog comment, 4 forum posts
Posted: 1 day, 10 hrs ago Permalink

Just signed up but I have been mulling over this issue for a bit.

To me, the operative words in interpreting the standard (5.5.1) are "prior to comparing".

Until now, my interpretation has been (and I will need more convincing to alter it) that the case of the actual value in the original does not matter.
IMO, the standard addresses the issue of whether data should be rejected due to differences in case. By specifying that the value from the original should be converted to either upper or lower case 'prior to comparing', it makes clear that any and all combinations are acceptable as long as the complete string matches the string specified in the standard in a case-insensitive way. :-)
If the 'orIginal' string were expected to have a specific case form (all upper case, all lower case, or even a leading capital), that is how the standard should have expressed it.
