Louis Kessler’s Behold Blog

Monday, December 5, 2005 - Mon, 5 Dec 2005

Did I say I’d get this version out today? Well, I tried. I really tried hard. I only had two things left to do, but the first one took me just about all day.

I finished the character set stuff yesterday, so today it should have been simple to convert ANSEL-formatted input into Unicode. Previously, I was converting it directly to extended ASCII, but that couldn’t take advantage of the character set selection or the future use of Unicode.

As I’ve mentioned before in this blog, ANSEL is a very old standard that is no longer used, except that it is still defined in the GEDCOM standard and many genealogy programs still output it. References to ANSEL are getting harder to find on the Internet. The only source I could find about converting it to Unicode was an excellent analysis by Heiner Eichmann. It didn’t take long to program this, but … I couldn’t seem to get it to work for any of the accented characters.

I tried all sorts of different codings; I looked up some UTF to Unicode code and copied that; I checked the web for more info. Finally I remembered an open-source GEDCOM parser library, found it, and downloaded its source. Its documentation stated that it handled ANSEL and converted it to UTF-8. That would have been good enough, because I already have UTF-8 to Unicode conversion, but I was dreading having to convert its C code to Delphi. After fighting to unpack the tar.gz file (a compressed Unix archive), I was finally able to view the code once I converted the resulting file from Unix line breaks to Windows line breaks. Looking for the ANSEL routine, I found a comment that said:

“The range from 0xe0 to 0xfe are diacritical marks. Note that in ANSEL they come *before* the base characters, in Unicode, they come *after*, so we have to buffer them …”

Glork! That was it! I call this learning the hard way. From the way that comment was written, I’m sure its author must have struggled with it as well. So I ripped out the now-spaghetti code that did the conversion, and in 15 minutes I had some simple, clean code that worked.
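For readers who want to see the buffering idea from that comment in action, here is a minimal sketch in Python (it is not Behold’s Delphi code). The byte values in the table are the standard ANSEL assignments for a few common diacritics, but the table is deliberately partial; the function name and the choice to map anything unhandled to U+FFFD are assumptions made for illustration.

```python
import unicodedata

# PARTIAL table of ANSEL combining diacritics (byte -> Unicode combining mark).
# ANSEL defines marks across 0xE0-0xFE and spacing characters in 0xA1-0xC8;
# only a few common marks are listed in this sketch.
ANSEL_DIACRITICS = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # diaeresis (umlaut)
    0xE9: "\u030C",  # caron (hacek)
    0xEA: "\u030A",  # ring above
    0xF0: "\u0327",  # cedilla
}


def ansel_to_unicode(data: bytes) -> str:
    """Convert ANSEL bytes to a Unicode string (illustrative sketch)."""
    out: list[str] = []
    pending: list[str] = []  # diacritics seen before their base character
    for b in data:
        if 0xE0 <= b <= 0xFE:
            # ANSEL puts the diacritic BEFORE its base character, so hold it.
            # Assumption: marks missing from the partial table become U+FFFD.
            pending.append(ANSEL_DIACRITICS.get(b, "\uFFFD"))
            continue
        # ASCII passes through; ANSEL spacing characters are not mapped here.
        base = chr(b) if b < 0x80 else "\uFFFD"
        # Unicode wants the base character FIRST, then its combining marks.
        out.append(base)
        out.extend(pending)
        pending.clear()
    # Marks with no following base character are appended as-is.
    out.extend(pending)
    # NFC composes e.g. "e" + U+0301 into the single character U+00E9.
    return unicodedata.normalize("NFC", "".join(out))
```

For example, `ansel_to_unicode(b"Jos\xe2e")` returns `"José"`, because 0xE2 (acute) is held back until the `e` arrives and is then emitted after it.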

… almost, because I found that some GEDCOMs use a CONC tag to split a line smack dab between a character and its accent mark, leaving one at the end of one line and the other at the start of the next. A little patch was needed, and that took care of it.
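One way such a patch could work, assuming each line’s raw bytes are converted separately: before conversion, move any ANSEL diacritic bytes (0xE0 to 0xFE) stranded at the end of a line onto the start of the following CONC line, so each mark is decoded together with its base character. This is a hypothetical Python sketch, not the patch Behold actually uses, and the function names are illustrative. Simply joining all the CONC pieces into one byte string before decoding would work just as well.

```python
# ANSEL combining diacritics occupy this byte range.
ANSEL_MARK_FIRST, ANSEL_MARK_LAST = 0xE0, 0xFE


def split_trailing_marks(piece: bytes) -> tuple[bytes, bytes]:
    """Split a line into (body, trailing ANSEL diacritic bytes)."""
    i = len(piece)
    while i > 0 and ANSEL_MARK_FIRST <= piece[i - 1] <= ANSEL_MARK_LAST:
        i -= 1
    return piece[:i], piece[i:]


def rebalance_conc(pieces: list[bytes]) -> list[bytes]:
    """Move diacritics stranded at the end of a line onto the next CONC line."""
    result: list[bytes] = []
    carry = b""
    for piece in pieces:
        # Prepend marks carried over from the previous line, then strip any
        # marks left dangling at the end of this one.
        body, trailing = split_trailing_marks(carry + piece)
        result.append(body)
        carry = trailing
    if carry:
        # Nothing follows the last line, so put the stray marks back.
        result[-1] += carry
    return result
```

With this, `rebalance_conc([b"Jos\xe2", b"e"])` gives `[b"Jos", b"\xe2e"]`, so the acute accent and its `e` land on the same line before conversion.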

So now I’ve got only one thing left to do to get this version out: I have to make sure that the Info form is usable and that all its buttons are available at any screen resolution. My initial research uncovered a good article to help me. Tomorrow I’m a bit busy with family activities, but I intend to finish this off as soon as I can.
