
Louis Kessler’s Behold Blog

A New(?) Genealogy Program – Treebard - Mon, 4 Mar 2024

One of the reasons I closed GenSoftReviews last year was that the development of new genealogy programs had been drying up. There are so many full-featured genealogy programs to choose from that hardly any task remains that none of them can do.

And since the arrival about 10 years ago of genealogy software that automatically provides record hints and tree matches, anything less is a no-go for most people.


Treebard

So, to my surprise, yesterday I learned from a Facebook post by Tamura Jones about a program I had never heard of. The program is called Treebard, by Scott Robertson. He describes it as:

“a free, open-source, portable, public-domain genealogy database-entry showcase of functionalities written in Python, Tkinter, and SQLite”

I had to Google “Tkinter”, because I didn’t know what it was.

If it were just another genealogy program, I wouldn’t have given it much thought. But it is the wealth of content Scott has included on his website about his programming endeavor that interested me and made me take notice.


Scott’s Philosophy

Scott’s program is not brand new, but it is relatively new compared with most genealogy programs, which were started decades ago. The program’s Readme on GitHub states that Scott, with the username ProfessUdGuru, started developing the program in July 2018. That’s almost 6 years ago, and what surprises me is that neither Tamura nor I had heard of the program before.

Scott’s writings include a lot of commentary about GEDCOM, which I’ll come to in a bit. He dismisses GEDCOM by stating:

“I am particularly fond of the idea that someone will see the light about continuing to limp along with GEDCOM, and instead just start using Treebard’s database structure (UNIGEDS) or be inspired by it, to give birth to a Universal Genealogy Data Structure which would replace GEDCOM if app developers could be inspired enough to all use the same data storage structure for their app’s primary features.”

He writes about Treebard’s Kind of Genealogy:

  1. Not conclusion-based, and not evidence-based, but one-factoid-at-a-time-based.
  2. Super easy to enter sources without copying and pasting.
  3. Conclusions are backed up by assertions. Assertions are backed up by sources. He says his assertions feature is unique to Treebard, although I think it closely resembles the persona idea promoted by Tom Wetmore, developer of LifeLines. “Assertions are what your sources say.”
  4. A friendly and intuitive user interface. It should be easy to do complicated things.
  5. A simple and straightforward data structure, good enough for all programmers to adopt someday.


His Videos

Scott has a new YouTube channel with 38 videos that he’s added in the past 3 weeks. They are all very interesting and worth watching:

  • 2 are about Treebard’s Philosophy
  • 4 are about using Treebard
  • 10 are a Treebard GPS Tour 2022 
  • 1 is about his Trying to Use Gramps Genealogy Software
  • 2 are about Genealogy Resources
  • 8 are about GEDCOM programming
  • 12 are a Do-It-Yourself Genealogy Application 2024 (Scott’s journey in the writing of his program)


And he’s funny as well. You’ll enjoy Scott’s unique, honest presentation style.


The Database

I was very curious as to what Scott’s “Universal Genealogy Data Structure” looked like. He used much of what would be considered common practice, following GEDCOM’s record and data structures in many cases (e.g. his dates are just like GEDCOM’s).

In one of his videos, he said he originally didn’t want a family record, but eventually decided he needed one to hang his family events onto.

Below is his data structure from his Create Your Own Genealogy Software 002 video.

[Image: Treebard’s data structure, from the Create Your Own Genealogy Software 002 video]

Notable: he has a Place table (a big omission from GEDCOM).

And unlike GEDCOM, his database has an Events table, effectively making his program an events-based program. In the early days of GEDCOM, an Event-GEDCOM standard was developed in parallel with our conclusion-based GEDCOM, but it didn’t really catch on. Scott would probably prefer Treebard to be called an assertion-based program, since assertions attached to sources are his unique feature.
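To make the idea of an assertion-based layout with its own Place and Events tables a little more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, column names and sample rows are all hypothetical and deliberately simplified; this is not Treebard’s actual UNIGEDS schema, only an illustration of a conclusion (an event) backed by assertions, which are in turn backed by sources. The date is stored as a GEDCOM-style string, since Scott’s dates follow GEDCOM.

```python
import sqlite3

# Hypothetical, simplified assertion-based layout for illustration only.
# These table and column names are NOT Treebard's actual UNIGEDS schema.
SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE place  (place_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE source (source_id INTEGER PRIMARY KEY, title TEXT);
-- An event is a conclusion: something that happened to a person at a place.
CREATE TABLE event (
    event_id   INTEGER PRIMARY KEY,
    person_id  INTEGER REFERENCES person(person_id),
    place_id   INTEGER REFERENCES place(place_id),
    kind       TEXT,
    event_date TEXT            -- GEDCOM-style date string, e.g. 'ABT 1880'
);
-- An assertion is what one source says; it backs up an event.
CREATE TABLE assertion (
    assertion_id INTEGER PRIMARY KEY,
    source_id    INTEGER REFERENCES source(source_id),
    event_id     INTEGER REFERENCES event(event_id),
    says         TEXT
);
"""

def build_demo() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up placeholder rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO person VALUES (1, 'Person A')")
    db.execute("INSERT INTO place VALUES (1, 'Town X')")
    db.executemany("INSERT INTO source VALUES (?, ?)",
                   [(1, "Source 1"), (2, "Source 2")])
    db.execute("INSERT INTO event VALUES (1, 1, 1, 'birth', 'ABT 1880')")
    # Two independent sources both back up the same birth conclusion.
    db.executemany("INSERT INTO assertion VALUES (?, ?, ?, ?)",
                   [(1, 1, 1, "born about 1880 in Town X"),
                    (2, 2, 1, "aged 30 in 1910")])
    return db

def assertions_for_event(db: sqlite3.Connection, event_id: int):
    """List (source title, what that source says) for one event."""
    return db.execute(
        "SELECT s.title, a.says FROM assertion AS a "
        "JOIN source AS s ON s.source_id = a.source_id "
        "WHERE a.event_id = ? ORDER BY a.assertion_id",
        (event_id,),
    ).fetchall()

print(assertions_for_event(build_demo(), 1))
```

Keeping assertions in their own table means a conclusion can list every source that supports it, and a conflicting source simply becomes another assertion rather than overwriting the event.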

Here, for example, is how Treebard presents its assertions, from his Tour 008 video:

[Image: Treebard’s assertions display, from the Tour 008 video]


Scott’s Thoughts on GEDCOM

Of all his videos and writings (and there’s a lot of them), my favorite is Scott’s commentary about GEDCOM. Here are the points that intrigued me the most:

  • Scott took more than 3 months to create a GEDCOM import program by “curbing my appetite for perfection.”
  • He replaced all of GEDCOM’s ambiguous tags with unambiguous tags.
  • He did not handle “edge cases such as tags that no one uses since they’re so very close to being useless”.
  • He ignores all custom tags. See! Most programs will not read them. That’s not GEDCOM’s fault. It’s the fault of programmers who indiscriminately use them instead of the valid alternative constructs that GEDCOM does provide.
    Scott correctly adds: “Custom tags are used so much for storing vendor cruft–which is not their purpose–that each vendor has created a master’s course in navigating their personal flavor of GEDCOM abuse.”
  • Scott introduces gedMOM – “a GEDCOM file designed as if it were meant to stuff genealogy data directly into a UNIGEDS database, with no hiccups.” I like that idea.
  • Scott describes the GEDCOM NOTE tag as “details that don’t fit elsewhere”, which is an excellent definition.
  • Scott correctly maps his own ASRTN (assertion key) onto GEDCOM’s TEXT tag. It’s just that most genealogy software developers never thought of it that way.
  • Scott ponders: “Why is it so easy to write a GEDCOM export program when it is so difficult to write a GEDCOM import program?”
    His answer: “Simple. With the export project… you’re starting from your own data structure, so you aren’t lost in the wilderness of someone else’s ideas of how things should be done;”
  • Since “1 MARR N” (not married) is not allowed in GEDCOM, Scott proposes “1 FACT unmarried – 2 TYPE marital status”, which is a very logical way of doing it that I’ve not seen anyone else suggest.
  • Check out the rest of Scott’s GEDCOM page (items 17 to 93) for his many other comments about GEDCOM specifics, most of which I agree with.
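Scott’s approach of ignoring custom tags can be sketched as a GEDCOM reader that drops any tag beginning with an underscore (the GEDCOM 5.5.1 convention for user-defined tags) together with every line subordinate to it. This is a minimal Python sketch, not Treebard’s import code: the function and type names are hypothetical, and it ignores real-world details such as CONC/CONT continuation lines and character encodings. The sample record uses Scott’s proposed FACT construct for “unmarried”.

```python
import re
from typing import List, NamedTuple, Optional

class GedLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str

# A GEDCOM 5.5.1 line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

def parse_skipping_custom(text: str) -> List[GedLine]:
    """Parse GEDCOM lines, dropping custom (underscore) tags and their subtrees."""
    kept: List[GedLine] = []
    skip_level: Optional[int] = None  # level of the custom tag being skipped
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            continue  # blank or malformed line: ignored in this sketch
        level = int(m.group(1))
        if skip_level is not None:
            if level > skip_level:
                continue  # still inside the custom tag's subtree
            skip_level = None  # back at or above the custom tag's level
        tag = m.group(3)
        if tag.startswith("_"):
            skip_level = level
            continue
        kept.append(GedLine(level, m.group(2), tag, m.group(4) or ""))
    return kept

# Made-up sample record: a vendor's custom _UID (with a custom child line)
# plus the FACT/TYPE construct for "unmarried".
SAMPLE = """0 @I1@ INDI
1 NAME John /Doe/
1 _UID 12345
2 _SUB vendor data
1 FACT unmarried
2 TYPE marital status"""

for line in parse_skipping_custom(SAMPLE):
    print(line.level, line.tag, line.value)
```

Skipping the whole subtree, not just the custom line, matters because anything nested under a vendor tag only has meaning to that vendor.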


      I’m My Own Grandpa

      Finally, as an extra bonus, at the beginning of his GEDCOM page, Scott lists the lyrics to Jaffe and Latham’s “I’m My Own Grandpa” song. And he provides a GEDCOM file for it!

      I loaded the GEDCOM file into Behold, and Behold correctly reports that there are no loops. The grandpa relationship is not by birth or adoption but only through marriage, which is what makes it technically possible. I’ll leave you to figure it all out as an exercise.
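How Behold performs its loop check isn’t described here, but one common way to look for such loops is a depth-first search over parent links that reports anyone who turns out to be their own ancestor. The Python sketch below assumes a hypothetical `parents` dictionary holding birth and adoptive parents only. Because spouse links are not followed, a tree like the song’s, where the “grandpa” relationship exists only through marriage, has no loop.

```python
from typing import Dict, List, Optional

def find_ancestor_loop(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one chain of people ending in someone who is their own ancestor, or None.

    `parents` maps a person ID to the IDs of that person's birth or adoptive
    parents. Spouse links are deliberately not included, so relationships that
    exist only through marriage can never form a loop here.
    """
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(person: str) -> Optional[List[str]]:
        state[person] = ON_PATH
        path.append(person)
        for parent in parents.get(person, []):
            s = state.get(parent, UNSEEN)
            if s == ON_PATH:
                # This parent is also a descendant on the current path: a loop.
                return path[path.index(parent):] + [parent]
            if s == UNSEEN:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[person] = DONE
        return None

    # Recursion is fine for a sketch; very deep trees would need an explicit stack.
    for person in list(parents):
        if state.get(person, UNSEEN) == UNSEEN:
            found = visit(person)
            if found:
                return found
    return None

print(find_ancestor_loop({"me": ["dad", "mom"], "dad": ["grandpa"]}))  # None
```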


      Conclusion

      There is so much great stuff to read and watch at Scott’s website. Be sure to do so! And if you’re brave, try his software.

      Behold, My Genealogy, and Syncing - Sun, 25 Feb 2024

      Over the past several months, I’ve been back to work on the next version of Behold. I’m hoping to release the next major version in the next …  - okay, a programmer knows better than to promise a release date, but let’s say as soon as it’s ready. Keep an eye on Behold’s Future page to follow my progress.

      The last major release of Behold was Version 1.2.1 which I released in March 2016. Since then, I’ve released 6 additional point versions made up mostly of fixes and small improvements, with the last point release being Version 1.2.7 in September 2021.

      So it’s been almost 8 years since the last major release of Behold. What have I been up to?


      What Have I Been Up To?

      Two things really.

      The first thing that caught me was DNA. It was just after my 2016 Unlock The Past Genealogy cruise that I submitted my uncle’s and then my own DNA to Family Tree DNA for testing. At that point it was DNA or Bust: I submitted my DNA everywhere, learned everything I could about genetic genealogy, and in 2017 created my Double Match Triangulator program, which placed 3rd in the RootsTech 2017 Innovator Showdown. I spent a lot of time over the past 8 years developing DMT and getting every last drop of genealogical worth out of my tests. I’ve written a lot of technical blog posts about my DNA analysis over this time. And that journey has now run its course.

      The other thing that slowed me down is my own genealogy. That’s a very good thing! It was 2016 when I retired from my 40-year career at Manitoba Hydro. Up to that point, my genealogy effectively lay dormant in dozens of binders, files and boxes. This was material I had collected over the years with the intention of going through it and putting it together once I retired. And I’ve been doing that.

      As for my family tree itself, I hadn’t updated it since 1994, when I was using Reunion for Windows. Leister sold their Windows program to Sierra Online, who were redeveloping it as Generations, but it was then purchased by Genealogy.com and dropped to eliminate the competition for their program Family Tree Maker. I still have the last GEDCOM file I exported from Reunion, called KESS9407.GED, which had the 1,361 known relatives from my and my wife’s families.

      I didn’t purchase another genealogy program after that. Instead, I started developing Behold in my spare time on evenings and weekends, since I was then working full time. The intent was that it would be the genealogy editor I wanted for myself to replace Generations. I purchased a Rich Text editing package called TRichView to handle the display and editing. It works just like Word as a WYSIWYG (What You See Is What You Get) editor, and the goal was to make Behold what would still today be the only WYSIWYG genealogy editor.

      I released the first alpha version of Behold in 2005, and Version 1.0 was out in 2011. So far it was only a GEDCOM reader, but I still wanted it to become an editor.

      Then a big change for genealogists happened. Companies like Ancestry and MyHeritage were offering online family tree programs with a bonus: billions of records with automated searches that provide you with relevant hints. That changed everything! In 2017, I attended the 13th International Genealogy Conference in Houston, sponsored by Family Tree DNA. MyHeritage was there and made all attendees a great lifetime half-price offer on their complete package, and I bit. All of a sudden, MyHeritage’s Record Matches with their billions of records and Smart Matches with their millions of family trees were what was important. And their online editor was convenient and good enough, along with their free downloadable Family Tree Builder software that could sync with your online tree.

      About the same time, I got lucky. Starting in 2017, records from my ancestors’ towns in what is now Ukraine and Romania started to become available. I acquired over 400 birth, marriage, death and census records from 4 different researchers and added 3 generations back to the early 1800s for most of my lines. MyHeritage, with its record collections and family trees, revolutionized my task of finding descendants of my newly discovered European family by sending me the likely matches to review. From the 1,361 known family members I had in 1994, my family tree has grown to 10,800 today, which does include several thousand people in an important place to place study I have been working on.


      I’m now sitting in a really good position. I’m well into digitizing my binders, files and boxes. Every day I check for new Record Matches and Smart Matches at MyHeritage, process them and their implications for my tree right away, and research any additional hints they provide. The majority of my family tree information is now sourced. In 2016, I never thought I would get to this point.


      So What’s Important Now?

      Over the past 8 years, the information has just poured in. The tap is starting to run dry, and I’m no longer expecting a lot of new information. Records in Eastern Europe only start in the early 1800s, so I won’t be able to go any further back. My family tree has matured, and it’s now a matter of ensuring quality and keeping up with any new records that come along.

      What’s missing from this equation is ensuring the preservation of the data I have collected and making it widely available so that others who connect with me won’t have to work to put it together like I did. That would mean sharing it on other family tree sites such as Ancestry, FamilySearch, WikiTree, Geni, Geneanet and Genealogyonline.

      I have an account on Ancestry, but I only have a small tree there. I have not used Ancestry’s hint system yet, since I’ve been concentrating on MyHeritage, but it would be valuable to do so. The key would be to set up a full tree there by downloading from MyHeritage and then uploading to Ancestry. Then Ancestry’s hints and family trees can work and do their magic and maybe fill in a few more boxes.

      But once that initial tree is up, I can’t simply upload it again. After I process the hints, a new upload would likely recreate all the old hints. So I’ll need to keep the trees synced somehow. There are two programs that claim to sync with Ancestry: Family Tree Maker and RootsMagic. I’ll have to experiment with both and see whether importing a new GEDCOM into either FTM or RM keeps it synced with Ancestry or breaks the linkage. If the linkage can be maintained, maybe I can then follow this procedure:

      MyHeritage -> GEDCOM -> FTM or RM

      FTM or RM -> Ancestry

      Ancestry Hints -> MyHeritage

      FamilySearch is also an important tree to have my information on. I uploaded about 1,500 deceased family members via a GEDCOM a few years ago. RootsMagic and Ancestral Quest both sync with FamilySearch. I also understand that MyHeritage provides syncing with FamilySearch as well, but currently only for members of The Church of Jesus Christ of Latter-day Saints. Hopefully they eliminate that restriction in the future. Even so, you have to be careful, because other people edit FamilySearch. You wouldn’t want to copy any unverified information from FamilySearch back into your own tree.

      WikiTree is a One World tree. It is different because it stresses biographies with human input. This is valuable but requires a lot of manual labour to maintain. I was fortunate to have been a WikiTree Challenge guest and had my tree worked on, and I added to that later by being a participant in two of their Connect-a-Thon events. They have some great tools, including the WikiTree Sourcer browser extension, which can pull a person’s facts from a FamilySearch or Ancestry page and then enter them for you on a new WikiTree person page, saving you a whole lot of typing and manual effort. Great tool! There is no program that will automatically sync with WikiTree for you. However, Behold does a pretty good job of displaying the WikiTree data you’ve downloaded to a GEDCOM file.

      Geni is another One World tree now owned by MyHeritage. It would be nice if MyHeritage could figure out some way of syncing data between Geni and MyHeritage. Geni does get hints from MyHeritage, and MyHeritage does give Record Matches with Geni profiles.

      I uploaded an extract of my family tree via GEDCOM to Geneanet and Genealogyonline a few years ago, but I really haven’t worked enough with either of them to figure out how best to make use of their sites.


      And What’s Needed Now?

      We each have our one primary place where we maintain our family tree. It may be a desktop program, or an online tree, or a desktop program synced with an online tree.

      What we need are programs that sync our information and/or make it easier to spread it everywhere else. Nobody wants to have to retype everything a dozen times.

      I no longer need to convert Behold into a genealogy editor. MyHeritage for me is good enough for that.

      But I do like the assistance Behold already provides to be able to easily see what data I’ve got at MyHeritage and at all the other sites.

      I want Behold to do a bit more. I have some ideas and I’m working on it.

      Stay tuned.

      Can Artificial Intelligence Read Russian Handwriting? - Wed, 7 Feb 2024

      There’s been a lot of talk over the last year or so about the use of Artificial Intelligence for Genealogy. I’ve basically taken a laissez-faire, wait-and-see attitude towards it. Most of the applications of AI for genealogy are designed to save you time, maybe by drafting a biography for you or doing image creation, repair or animation.

      But I’m looking for something that can help me, and help me specifically with one particular task. The task I’m interested in is reading handwriting – not just any handwriting, but the handwriting in Birth, Marriage, Death and Census records from the Russian Empire.


      Transkribus - AI to Read Handwriting

      A week ago, a post on Twitter by Jarrett Ross made me aware of an online program called Transkribus.

      Transkribus describes itself as:

      “an AI-powered platform for text recognition, transcription and searching of historical documents – from any place, any time, and in any language.”

      You upload your handwritten document. You select one of their public models for different languages and time periods. If the public models don’t serve your needs, you can train a custom model. They supply an introductory video on Getting Started with Transkribus.


      Artificial Intelligence Model Types

      Reading handwriting is a difficult problem, but is something that Artificial Intelligence should one day be able to handle.

      I classify AI into two types:

      1. Expert systems
      2. Self-training models

      An expert system is one where you as a human tell a program exactly how to do every step of a process. The program does not learn anything on its own, but the result can seem to be very intelligent and be completed faster and more accurately than any human can.

      A self-training model is one where you give a general AI program lots of different problems to be solved along with the answers to each problem. You let the program itself work out how best to generalize the problem and produce a solution for it.

      AI can be expert systems, self-training models, or a combination of the two.

      An example is a chess program. The first programs were all expert systems, with all the rules written by the programmer. In 1997, Deep Blue became the first chess program to beat the reigning world champion, Garry Kasparov, in a match. Deep Blue was an expert system, but with hardware that made its calculations very fast.

      However, self-training systems can do better. In 2017, a chess program called AlphaZero was trained solely via self-play for just 9 hours. It then defeated Stockfish, which was at the time the strongest chess program, in a 100-game match with an amazing score of 28 wins, 72 draws, and zero losses.
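Using standard chess scoring (1 point for a win, ½ for a draw, 0 for a loss), that 100-game result works out as follows:

```latex
\text{AlphaZero's score} = 28 \times 1 + 72 \times \tfrac{1}{2} + 0 \times 0 = 64 \text{ out of } 100
```

That is 64 points to Stockfish’s 36, without a single loss.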


      What is Involved in Reading Handwriting?

      The goal with handwriting recognition is simply to transcribe the handwriting into text. No translation is required. We are just looking to have each handwritten letter, number or symbol converted to the correct text. And with as few mistakes as possible.

      There are already many good translation tools available (e.g. Google Translate), so if the handwriting is in a foreign language, an accurate transcription can then be translated with them. The program that reads the handwriting and creates a transcript need not translate it or understand what the words mean, but it will do better if it knows the language well enough to recognize that an “i” must be an “e” because otherwise there is no such word.
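The point that knowing the language helps can be illustrated with a toy lexicon check: when the recognizer offers several candidate letters for a position, prefer the most confident combination that forms a real word. This is a hypothetical illustration with made-up confidence scores; it is not how Transkribus or any particular handwriting model works internally, and brute-forcing every combination only makes sense for short words.

```python
from itertools import product
from math import prod
from typing import List, Set, Tuple

# Hypothetical recognizer output: for each character position, a list of
# (candidate letter, confidence) pairs. Real HTR models do not expose this
# exact structure; it is made up for illustration.
Candidates = List[List[Tuple[str, float]]]

def best_dictionary_reading(candidates: Candidates, lexicon: Set[str]) -> str:
    """Pick the most confident reading that is a real word, if any."""
    best_word, best_score = None, -1.0
    # Brute force over every combination: only sensible for short words.
    for combo in product(*candidates):
        word = "".join(ch for ch, _ in combo)
        score = prod(conf for _, conf in combo)
        if word in lexicon and score > best_score:
            best_word, best_score = word, score
    if best_word is None:
        # No dictionary word is possible: fall back to the top letter at each position.
        best_word = "".join(max(pos, key=lambda c: c[1])[0] for pos in candidates)
    return best_word

# The recognizer slightly prefers "i" for the third letter, but "thire" is not a word.
reading = [[("t", 1.0)], [("h", 1.0)], [("i", 0.6), ("e", 0.4)], [("r", 1.0)], [("e", 1.0)]]
print(best_dictionary_reading(reading, {"there", "three"}))  # there
```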

      Obviously, this is not a job for an expert system. Nobody can effectively describe the rules they use in their head to read handwriting. So we must use a self-training model.

      Generally, if you have 100 pages of handwritten English text all written by one person, along with their typewritten equivalents, a good self-training AI should be able to train itself to read that particular person’s handwriting.

      And if you get 20 different people to write the same text, then the AI should be able to do a good job of generalizing its model to read not just those 20 different people’s writing, but almost anybody’s, except your doctor’s. (For your doctor, you’ll still need to get your pharmacist to read it.)


      Trying an English Document

      Well, let’s see how well Transkribus does. I took part of a letter from my great-grandfather’s homestead application in 1906:

      [Image: part of a letter from my great-grandfather’s 1906 homestead application]

      I selected “English Handwritten” and used its default AI model, “The English Eagle”, and in only about 30 seconds, it gave me this:

      [Image: Transkribus transcription of the letter]

      I included the red underlines that Microsoft’s spell checker added for suspected misspellings. I’d say Transkribus did an excellent job; even comparing the red-underlined words, you’d have to say Transkribus usually produced what seems to be handwritten.

      It had trouble with the edits on the page, and interpreted the “it is not” inserted at an angle at the left as “Goedener”. It missed the inserted “the” in “spirit of the law” in the line before. And the most important word (my great-grandfather’s surname “Focshaner”) was inserted with a caret in “as if he ^ were in any way”, but the surname was missed and the “he were” became “herwere”.

      So that’s how a page of handwritten English text can get transcribed. It did a good job on a good quality document with relatively neat handwriting. You could do as good a job yourself if you are able to read handwriting, and you could then use Transkribus to help you decide on the words that are more difficult. I am somewhat impressed.
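One common way to put a number on how close a transcription is to the truth is the character error rate (CER): the edit distance between the correct text and the machine’s text, divided by the length of the correct text. Here is a minimal Python sketch using plain Levenshtein distance; the function names are hypothetical. For example, the “he were” that became “herwere” above is one substitution in seven characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between the two texts, per character of the correct text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)

print(round(character_error_rate("he were", "herwere"), 3))  # 0.143
```

A CER computed over a whole page gives a single figure for comparing models, though it treats a wrong letter in a surname the same as a wrong letter in “the”.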


      Trying an English Genealogy Document

      But we’re genealogists. Our documents to interpret are not as simple as a well-written page of text. Our documents are mostly forms and we need help getting names, places, dates and notes from them.

      Let’s try this Homestead Inspector’s Report, also for my great-grandfather. This is more typical of one of the “good quality” documents a genealogist deals with:

      [Image: Homestead Inspector’s Report for my great-grandfather]

      The option selected again was “English Handwritten”. Supposedly only the handwriting was to be interpreted. But it gave me this:

      [Image: Transkribus transcription of the Homestead Inspector’s Report]

      I’ll let you compare for yourself, but I was quite disappointed with these results. They are just a bit too far away from correct to be useful.

      Transkribus may have other English models that would do a better job, or you can train one yourself. This result matches my current impression of how much further AI has to go with regard to reading handwriting. But it’s a start.

      I don’t need a program to read English handwriting for me. For the few documents I have, I am able to do it quite well myself because I understand English and know how to handwrite in English and read English handwriting.


      Any Chance At All for Russian?

      All 9 of my and my wife’s grandparents (the extra is my father’s stepfather) came to Canada in the early 1900s, two from Romania, and seven from what was then the Russian Empire and is now Ukraine. All of their birth documents, and those of their ancestors and families, are written in Romanian or Russian.

      Let me concentrate on the Russian documents. These are all from 1910 or earlier and mostly include Birth, Marriage, Death and Revision List (i.e. Census) records, and all the text is handwritten onto forms. Just over 2 years ago, I took a wonderful Salt Lake Institute of Genealogy (SLIG) course on Researching Russian Genealogy Records, which made me do the valuable task of learning the Russian alphabet as a prerequisite.

      Theoretically, even though the alphabet is Cyrillic rather than Latin, an AI program trained on these documents should convert handwriting to text just as well in Russian as in English. The quality of the handwriting would be the biggest consideration in any language.

      Melanie McComb pointed me to an article that lists 3 public AI models for Russian Handwriting that could be used with Transkribus. The one called “Russian Handwriting Early 20th Century” seems most appropriate for my documents since the Russian alphabet had extra letters and the language was somewhat more complex before the Russian Revolution.

      Well, let’s go all in and try it.

      Here, for example, from JewishGen is the marriage record of my wife’s great-grandparents Moshko Furman and Charna Rushaylo in Zhitomir in 1886.

      [Image: 1886 Zhitomir marriage record of Moshko Furman and Charna Rushaylo]

      To be honest, I don’t give the AI much hope.

      Even so, I go over to the Russian Handwriting Early 20th Century model page, and I upload my document.

      Well it did give me something, actually more than I expected. And when I throw this into Google Translate, I get:

      [Image: Google Translate output of the Transkribus transcription]

      Unfortunately, there isn’t much in the translation that’s recognizable.

      The names of the bride and groom at the left of the record weren’t even interpreted, probably because they were heavily underlined in ink, something done a lot in Russian records. The underlining may have obscured the names from Transkribus.

      Also, old Russian handwriting tended to split words in two at the end of a line without a hyphen or any indication that the word is split. That really does a number on Google Translate’s results.
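One way to repair words split across lines without a hyphen, before sending the text to a translator, is to try joining the last word of each line to the first word of the next, keeping the join only when it forms a dictionary word and the first piece is not a word on its own. The Python sketch below assumes a hypothetical lowercase word list; punctuation attached to words is not handled, and a real Russian lexicon would be far larger.

```python
from typing import List, Set

def rejoin_split_words(lines: List[str], lexicon: Set[str]) -> str:
    """Join words broken across line ends without a hyphen, using a word list.

    `lexicon` is a hypothetical set of lowercase words. A join is made only when
    the tail of one line is not itself a word but tail + head of the next line is.
    Punctuation stuck to a word is not handled in this sketch.
    """
    words: List[str] = []
    for line in lines:
        tokens = line.split()
        if words and tokens:
            joined = words[-1] + tokens[0]
            if joined.lower() in lexicon and words[-1].lower() not in lexicon:
                words[-1] = joined
                tokens = tokens[1:]
        words.extend(tokens)
    return " ".join(words)

# "горожанин" (townsman) split as "горо" / "жанин" at a line end.
print(rejoin_split_words(["Жених горо", "жанин из Липкан"], {"жених", "горожанин", "из"}))
# Жених горожанин из Липкан
```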

      Here’s how JewishGen indexes the record:

      [Image: JewishGen index entry for the record]

      If I take the text of the comments:

      Groom - townsman from Lipkany, Khotinskij uezd; 1st marriage. Bride - townswoman from Chudnov, Zhitomirskij uezd; maiden (1st marriage).  

      and I use Google Translate to convert it to Russian:

      Жених – горожанин из Липкан Хотинского уезда; 1-й
      брак. Невеста — горожанка из Чуднова Житомирского уезда; девица (1-й брак).

      And then I change the Russian type font to a Russian handwriting font:

      [Image: the Russian text rendered in a Russian handwriting font]

      And then when I throw that text back into the Russian model of Transkribus, I get … unfortunately this:

      [Image: Transkribus output for the handwriting-font text]

      I’m very surprised. That’s just about the best-written Russian handwriting you’ll ever find. I tried the other Russian models on it as well, with no success.


      Conclusion

      It’s going to be a while yet before any AI tools will be able to interpret handwritten genealogy documents for us, especially those from the 19th and early 20th centuries in the Cyrillic alphabet.

      For now, we’ll have to continue to rely on our foreign-language researchers who have spent years reading those documents, and can use their experience to understand them and to even find them for us in the first place.

      Eventually, an AI model might be able to be trained for a particular type of document, such as the Russian marriage document I tried above. But it will take someone with the expertise, time and patience to do it.


      Followup March 3:

      I had two suggestions on Twitter with regard to this article:

      1. Try Ocelus by Teklia.
        It can accept Russian handwriting and output the corresponding Russian letters. But for my test documents, when the output is copied to Google translate, not enough words are correct to be of use.
      2. Try Yandex by Iron Hive, a Serbian company. This is a set of tools designed to help with Russian documents. There is a Yandex Vision OCR tool that includes support for Russian and English handwriting recognition. But this seems to be a paid service for programmers and I don’t see a simple way to try it with an uploaded document.