My current task is to implement the reading of the UTF-8 character encoding. UTF-8 is a way of representing Unicode, the standard that covers the alphabets and accented characters of the various languages of the world. A few genealogy programs are capable of handling Unicode characters (most notably PAF), and they usually export these characters into a GEDCOM file as UTF-8.
This is a bit different from ANSEL, which older programs used. ANSEL was the standard 10 years ago, but it has since been replaced by Unicode. Brother’s Keeper is a program that still exports accented characters using ANSEL. Several months ago, I implemented an ANSEL-to-ASCII translator so that most ANSEL characters would appear correctly. Now I’m following the same model for UTF-8.
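To make the idea of a UTF-8 to ASCII translator concrete, here is a minimal sketch in Python. It is not Behold’s actual code; the function name `utf8_to_ascii` and the `placeholder` parameter are hypothetical, chosen only for illustration. It assumes the approach of decomposing each accented letter into a base letter plus combining marks, dropping the marks, and substituting a placeholder for anything that still has no ASCII equivalent.

```python
import unicodedata


def utf8_to_ascii(raw: bytes, placeholder: str = "?") -> str:
    """Decode UTF-8 bytes and reduce the text to plain ASCII.

    Hypothetical helper for illustration only.
    """
    # Invalid byte sequences become U+FFFD instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent itself
        # Keep ASCII as-is; anything else has no simple ASCII equivalent.
        out.append(ch if ord(ch) < 128 else placeholder)
    return "".join(out)


if __name__ == "__main__":
    line = "1 NAME José /Müller/".encode("utf-8")
    print(utf8_to_ascii(line))
```

Accented Latin letters come through as their unaccented forms, while letters from other alphabets, such as Cyrillic, fall back to the placeholder.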
Of interest, at least to me, the UTF-8 encoding was invented primarily by Ken Thompson of Bell Labs. Ken was also one of the creators of the Unix operating system. He advanced the field of computer chess as well, and I had the pleasure of meeting Ken in 1978 at the 9th North American Computer Chess Championship.
But Unicode has many more characters than ANSEL does, and it includes other alphabets that don’t translate to the ABCs. It’s impossible to get everything right. I will finish this basic translator in a day or two, but the work has prompted me to move up my planned implementation of full Unicode. It will now be added in Version 1.8 of Behold. Doing it there means that the implementation of Editing in Version 2.0 will include full Unicode/multilanguage capabilities. I’ve added a new section, “9. Multiple Languages”, to Behold’s Home Page to introduce these concepts.