Louis Kessler’s Behold Blog

Can Artificial Intelligence Read Russian Handwriting? - Wed, 7 Feb 2024

There’s been a lot of talk the last year or so about the use of Artificial Intelligence for Genealogy. I’ve basically taken a laissez faire wait-and-see attitude towards it. Most of the applications of AI for genealogy are designed to save you time, maybe by drafting out a biography for you or doing image creation, repair or animation.

But I’m looking for something that can help me, and help me specifically with regards to one particular task. The task I’m interested in is reading handwriting – not just any handwriting, but the handwriting in Birth, Marriage, Death and Census records from the Russian Empire.


Transkribus - AI to Read Handwriting

I was made aware by Jarrett Ross’s post on Twitter a week ago of an online program called Transkribus.

Transkribus describes itself as:

“an AI-powered platform for text recognition, transcription and searching of historical documents – from any place, any time, and in any language.”

You upload your handwritten document. You select one of their public models for different languages and time periods. If the public models don’t serve your needs, you can train a custom model. They supply an introductory video on Getting Started with Transkribus.


Artificial Intelligence Model Types

Reading handwriting is a difficult problem, but is something that Artificial Intelligence should one day be able to handle.

I classify AI as one of two types:

  1. Expert systems
  2. Self-training models

An expert system is one where you as a human tell a program exactly how to do every step of a process. The program does not learn anything on its own, but the result can seem to be very intelligent and be completed faster and more accurately than any human can.

A self-training model is one where you give a general AI program lots of different problems to be solved along with the answers to each problem. You let the program itself work out how best to generalize the problem and produce a solution for it.

AI can be expert systems, self-training models, or a combination of the two.
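To make the distinction concrete, here is a toy Python sketch. It is an illustration only: the 3x3 "glyphs", the function names and the nearest-average approach are all made up for this example, and nothing here reflects how Transkribus or any real handwriting model works. The expert system is a rule a human wrote by hand; the self-training model is given labelled examples and works out its own notion of each letter.

```python
# Toy illustration only: telling an "I" from an "L" on a 3x3 pixel grid
# (1 = ink, 0 = blank). All glyphs and function names are hypothetical.

def flatten(glyph):
    """Turn a 3x3 grid into a flat list of 9 pixels."""
    return [pixel for row in glyph for pixel in row]

def expert_classify(glyph):
    """Expert system: a hand-written rule, 'an L has ink in the bottom-right corner'."""
    return "L" if glyph[2][2] == 1 else "I"

def train(examples):
    """Self-training model: average the pixels of the labelled examples per letter."""
    grouped = {}
    for glyph, label in examples:
        grouped.setdefault(label, []).append(flatten(glyph))
    return {
        label: [sum(column) / len(glyphs) for column in zip(*glyphs)]
        for label, glyphs in grouped.items()
    }

def learned_classify(model, glyph):
    """Pick the letter whose averaged example is closest to this glyph."""
    pixels = flatten(glyph)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(pixels, model[label]))

    return min(model, key=distance)

# The "problems along with the answers" that the model learns from.
training_examples = [
    ([[1, 0, 0], [1, 0, 0], [1, 0, 0]], "I"),
    ([[1, 0, 0], [1, 0, 0], [0, 0, 0]], "I"),
    ([[1, 0, 0], [1, 0, 0], [1, 1, 1]], "L"),
    ([[1, 0, 0], [1, 0, 0], [1, 1, 0]], "L"),
]
model = train(training_examples)

# A short "L" with a stubby foot that the rule writer did not anticipate.
stubby_l = [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
print(expert_classify(stubby_l))          # prints "I" (the fixed rule misses it)
print(learned_classify(model, stubby_l))  # prints "L"
```

The hand-written rule fails on a shape its author never thought of, while the trained model copes because it generalized from examples. With so few examples this proves nothing about real handwriting; it only shows the difference between the two approaches.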

An example is a chess program. The first programs were all expert systems, with all the rules written by the programmer. In 1997, Deep Blue became the first chess program to beat the reigning world champion, Garry Kasparov, in a match. This program was an expert system, but with hardware that made calculations very fast.

However, self-training systems can do better. In 2017, a chess program called AlphaZero was developed that was trained solely via self-play for just 9 hours. It then defeated Stockfish, which was at the time the strongest chess program, and it won with an amazing score of 28 wins, 72 draws, and zero losses.


What is Involved in Reading Handwriting?

The goal with handwriting recognition is simply to transcribe the handwriting into text. No translation is required. We are just looking to have each handwritten letter, number or symbol converted to the correct text. And with as few mistakes as possible.

There are already many good translation tools available (e.g. Google Translate), so if the handwriting is in a foreign language, a correct transcription can be translated afterwards. The program that reads the handwriting and creates the transcript need not translate it or understand what the words mean, but it will do better if it knows the language well enough to realize that this “i” must be an “e” because otherwise there is no such word.
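As a rough sketch of that last idea, the Python below checks each transcribed word against a word list and swaps in the closest known word when the transcription is not a word at all. The tiny word list, the function name and the 0.8 cutoff are assumptions made for this example; a real system would use a full dictionary or a language model, and I don't know what Transkribus does internally.

```python
import difflib

# Hypothetical, deliberately tiny word list for the example.
KNOWN_WORDS = ["letter", "latter", "litter", "homestead", "spirit", "law"]

def correct_word(word, vocabulary=KNOWN_WORDS, cutoff=0.8):
    """Return the closest known word, or the word unchanged if nothing is close."""
    lowered = word.lower()
    if lowered in vocabulary:
        return word
    # get_close_matches ranks candidates by similarity and drops those below cutoff.
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(correct_word("lettir"))     # prints "letter" (the "i" must be an "e")
print(correct_word("Focshaner"))  # prints "Focshaner" (no close match, left alone)
```

Leaving unknown words alone matters for genealogy: surnames and place names are rarely in any dictionary, and "correcting" them would do more harm than good.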

Obviously, this is not a job for an expert system. Nobody can effectively describe the rules they use in their head to read handwriting. So we must use a self-training model.

Generally, if you have 100 pages of handwritten English text all written by one person, along with the typewritten equivalents, a good self-training AI should be able to train itself to read that particular person’s handwriting.

And if you get 20 different people to write the same text, then the AI should be able to do a good job of generalizing its model to read not just those 20 different people’s writing, but almost anybody’s, except your doctor’s. (For your doctor, you’ll still need to get your pharmacist to read it.)


Trying an English Document

Well, let’s see how well Transkribus does. I took part of a letter from my great-grandfather’s homestead application in 1906:

image

I selected “English Handwritten” and used its default AI model, “The English Eagle”, and in only about 30 seconds it gave me this:

image

I included the red underlines that Microsoft’s spell checker put under words it considers misspelled. I’d say Transkribus did an excellent job, and even for the red-underlined words, you’d have to say Transkribus usually produced what appears to be written on the page.

It had trouble with the edits on the page, and interpreted the “it is not” inserted at an angle at the left as “Goedener”. It missed the inserted “the” in “spirit of the law” in the line before. And the most important word (my great-grandfather’s surname “Focshaner”) was inserted with a caret in “as if he ^ were in any way”, but the surname was missed and the “he were” became “herwere”.
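One common way to put a number on mistakes like these is the character error rate: the count of single-character insertions, deletions and substitutions needed to turn the transcript into the correct text, divided by the length of the correct text. The sketch below is an illustration only. I did not measure this for my documents, and the short phrases are reconstructed from the description above, not copied from the actual Transkribus output.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

def character_error_rate(reference, transcript):
    """Edit distance divided by the length of the correct (reference) text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, transcript) / len(reference)

# "he were" read as "herwere": the space became an "r", one substitution.
print(edit_distance("he were", "herwere"))                  # prints 1
# The inserted "the" was missed: four characters ("the ") are gone.
print(edit_distance("spirit of the law", "spirit of law"))  # prints 4
```

A low error rate on ordinary words can still hide the errors that matter most to a genealogist, such as a missing surname.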

So that’s how a page of handwritten English text can get transcribed. It did a good job on a good quality document with relatively neat handwriting. You could do as good a job yourself if you are able to read handwriting, and you could then use Transkribus to help you decide on the words that are more difficult. I am somewhat impressed.


Trying an English Genealogy Document

But we’re genealogists. Our documents to interpret are not as simple as a well-written page of text. Our documents are mostly forms and we need help getting names, places, dates and notes from them.

Let’s try this Homestead Inspector’s Report, also for my great-grandfather. This is more typical of one of the “good quality” documents a genealogist deals with:

image

The option selected again was “English Handwritten”. Supposedly only the handwriting was to be interpreted. But it gave me this:

image

I’ll let you compare for yourself, but I was quite disappointed with these results. They are just a bit too far away from correct to be useful.

Transkribus may have other English models that might do a better job, or you can train one yourself. I think this result reflects my current impression of how much further AI has to go with regards to reading handwriting. But it’s a start.

I don’t need a program to read English handwriting for me. For the few documents I have, I am able to do it quite well myself because I understand English and know how to handwrite in English and read English handwriting.


Any Chance At All for Russian?

All 9 of my and my wife’s grandparents (the extra is my father’s stepfather) came to Canada in the early 1900s, two from Romania, and seven from what was the Russian Empire and is now Ukraine. All of their birth documents, and the documents of their ancestors and families, are written in Romanian or Russian.

Let me concentrate on the Russian documents. These are all from 1910 or earlier and mostly include Birth, Marriage, Death and Revision List (i.e. Census) records, and all the text is handwritten onto forms. Just over 2 years ago, I took a wonderful Salt Lake Institute of Genealogy (SLIG) Course on Researching Russian Genealogy Records, which made me do the valuable task of learning the Russian alphabet as a prerequisite.

Theoretically, even though the alphabet is Cyrillic rather than Latin, an AI program trained on these documents should do just as well converting handwriting to text whether in Russian or in English. The quality of the handwriting would be the biggest consideration in any language.

Melanie McComb pointed me to an article that lists 3 public AI models for Russian Handwriting that could be used with Transkribus. The one called “Russian Handwriting Early 20th Century” seems most appropriate for my documents since the Russian alphabet had extra letters and the language was somewhat more complex before the Russian Revolution.

Well, let’s go all in and try it.

Here, for example, from JewishGen is the marriage record of my wife’s great-grandparents Moshko Furman and Charna Rushaylo in Zhitomir in 1886.

image

To be honest, I don’t give the AI much hope.

Even so, I go over to the “Russian Handwriting Early 20th Century” model page, and I upload my document.

Well, it did give me something, actually more than I expected. And when I throw this into Google Translate, I get:

image

Unfortunately, there isn’t much in the translation that’s recognizable.

The names of the bride and groom that are at the left of the record weren’t even interpreted, probably because they were heavily underlined in ink, something done a lot in Russian records. The underlines may have obscured the names from Transkribus.

Also, old Russian handwriting tended to split words in two at the end of a line without a hyphen or any indication that the word is split. That really does a number on Google Translate’s results.
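One way a tool could repair those splits before translation is to glue the last fragment of one line to the first fragment of the next whenever the result is a known word and at least one fragment is not a word on its own. The Python below is a minimal sketch of that idea under my own assumptions: the function name is made up, the word list is a tiny hand-picked one, punctuation is ignored, and the sample lines are built from the Google Translate text above, not from the original record.

```python
def rejoin_split_words(lines, vocabulary):
    """Join words that were broken across line ends without a hyphen.

    Two fragments are joined only when the combined word is in the
    vocabulary and at least one fragment is not a word by itself.
    """
    tokens = []
    for line in lines:
        parts = line.split()
        if tokens and parts:
            joined = tokens[-1] + parts[0]
            fragment_unknown = (
                tokens[-1].lower() not in vocabulary
                or parts[0].lower() not in vocabulary
            )
            if joined.lower() in vocabulary and fragment_unknown:
                tokens[-1] = joined  # glue the two fragments together
                parts = parts[1:]
        tokens.extend(parts)
    return " ".join(tokens)

# Hypothetical word list, stored in lower case.
vocabulary = {"жених", "горожанин", "из", "липкан", "хотинского", "уезда"}

# "Хотинского" broken across two lines with no hyphen.
lines = ["Жених горожанин из Липкан Хотин", "ского уезда"]
print(rejoin_split_words(lines, vocabulary))
# prints "Жених горожанин из Липкан Хотинского уезда"
```

This only works as well as the word list behind it, and names and places from old records are exactly the words least likely to be in one.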

Here’s how JewishGen indexes the record:

image

If I take the text of the comments:

Groom - townsman from Lipkany, Khotinskij uezd; 1st marriage. Bride - townswoman from Chudnov, Zhitomirskij uezd; maiden (1st marriage).  

and I use Google Translate to convert it to Russian:

Жених – горожанин из Липкан Хотинского уезда; 1-й
брак. Невеста — горожанка из Чуднова Житомирского уезда; девица (1-й брак).

And then I change the Russian type font to a Russian handwriting font:

image

And then I throw that text back into the Russian model of Transkribus, I get … unfortunately this:

image

I’m very surprised. That’s just about the best-written Russian handwriting you’ll ever find. I tried it with the other Russian models as well, and it was a no-go.


Conclusion

It’s going to be a while yet before any AI tools will be able to interpret handwritten genealogy documents for us, especially those from before the 20th century in the Cyrillic alphabet.

For now, we’ll have to continue to rely on our foreign-language researchers who have spent years reading those documents, and can use their experience to understand them and to even find them for us in the first place.

Eventually, an AI model might be able to be trained for a particular type of document, such as the Russian marriage document I tried above. But it will take someone with the expertise, time and patience to do it.


Followup March 3:

I had two suggestions on Twitter with regards to this article:

  1. Try Ocelus by Teklia. It can accept Russian handwriting and output the corresponding Russian letters. But for my test documents, when the output is copied to Google Translate, not enough words are correct to be of use.
  2. Try Yandex by Iron Hive, a Serbian company. This is a set of tools designed to help with Russian documents. There is a Yandex Vision OCR tool that includes support for Russian and English handwriting recognition. But this seems to be a paid service for programmers and I don’t see a simple way to try it with an uploaded document.

Continuing Education 2023 - Thu, 28 Dec 2023

Last January, the Association of Professional Genealogists @APGgenealogy started requiring that members report at least 12 hours of Continuing Education each year. I found the task of listing my CE time for 2022 quite interesting, and I posted what I had done at the time.

Below is my Continuing Education activity list for 2023. Each event was 1 hour unless otherwise noted.

Webinars – Total 25.5 hours

  • Jan 4 – The 5 steps to organizing your DNA in 2023 – Diahan Southard
  • Jan 19 – The Basics of Jewish American Genealogy, Rhonda McClure
  • Feb 11 – 10 Tips of Successful Online/Onsite Research in Ukraine, Russia and Belarus – Alina Khuda, Virtual Genealogical Association
  • Mar 14 – FamilySearch GEDCOM Technical Q&A – Gordon Clarke
  • Mar 14 – RootsTech Recap – Daniel Horowitz, MyHeritage
  • Mar 28 – New Developments of MyHeritage DNA by Gal Zrihen, MyHeritage
  • Mar 29 – Predicting Unknown Close DNA Relationships Just Got Better! Segcm Tool – Andy Lee, Family History Fanatics
  • Mar 29 – The Alex Krakovsky Project – Navigating the Wiki to Locate Town Records, JewishGen
  • Apr 11 – First Steps First: Rootstech Recap – Daniel Horowitz
  • Apr 21 – DNA Roundtable: Relationship Predictors – Leah Larkin (90 min)
  • May 25 – Test. Analyze. Repeat: Long-term DNA Strategies for Success – Diahan Southard.
  • Jun 16 – Finding Your Ancestors in Canadian Land Records – Tara Shymanski, Legacy Family Tree Webinars
  • Jul 14 – Celebrating 2,000 Webinars! plus 10 tips you can use today – Geoff Rasmussen, Legacy Family Tree Webinars
  • Aug 8 – Ten MORE Secrets to Using MyHeritage – Daniel Horowitz, Legacy Family Tree Webinars
  • Aug 23 – DNA Painter Basics: Strategies to Enhance Your Genealogical Research – Adina Newman, Virtual Genealogical Association
  • Oct 2 – Ask the Experts: Katy Rowe-Schurwanz from FamilyTreeDNA – Diahan Southard (30 min)
  • Oct 9 – Ask the Experts: Blaine Bettinger – Diahan Southard (30 min)
  • Oct 23 – Ask the Experts: DNA Painter – Diahan Southard (30 min)
  • Nov 14 – New Updates on Your MyHeritage Family Tree – Uri Gonen
  • Nov 16 – Ask the Wife: A Powerful DNA Strategy – Diahan Southard
  • Nov 20 – Ask the Experts: Michelle Leonard – Diahan Southard (30 min)
  • Nov 28 – The Good News About Historical Newspapers – Daniel Horowitz
  • Nov 30 – Organize Your DNA Matches – Kelli Bergheimer
  • Dec 9 – Ten Awesome Things You Can Do on WikiTree – Connie Davis, Virtual Genealogical Association
  • Dec 12 – The Latest Developments in Searching Historical Records on MyHeritage – Maya Geier, MyHeritage
  • Dec 15 – Landscape of Dreams: Jewish Genealogy in Canada – Kaye Prince-Hollenberg, Legacy Family Tree Webinars
  • Dec 20 – Got Old Negatives? Scan Them With Your Phone and These 5 (Mostly) Free Apps! – Elizabeth Swanay O’Neal, Legacy Family Tree Webinars

Conferences (Online) – Total 14 hours

  1. Mar 2 to 4 – RootsTech 2023
    • Getting Started in Jewish Genealogy – Ellen Kowitt
    • What’s New at FamilySearch in 2023 – Craig Miller
    • Using DNA to Determine Relationships in 2023 – Beth Taylor
    • How third-party DNA tools can help with your family history research – Jonny Perl
    • Different Ways to Work with Your Family Trees – Uri Gonen
    • Tracing Your Jewish Roots in Ukraine – Ellie Vance (30 min)
    • Using Maps and Gazetteers to Locate the Hometown – Ellie Vance (30 min)
    • What’s New in RootsMagic 9 – RootsMagic
  2. Nov 2 to 5 – WikiTree Symposium and WikiTree Day
    • Mastering the Updated Library and Archives Canada Website – Kathryn Lake Hogan
    • DNA Consultations at AmericanAncestors.org – Melanie McComb
    • DNA Group Projects and WikiTree – Mags Gaulden
    • Tech Troubleshooting – What Would You Do? – Thomas MacEntee
    • Keep Your Family’s History Safe for the Future – Marian Burk Wood
    • Reverse Phasing – What and Why? – Kevin Borland
    • Artificial Intelligence (AI) & Genealogy Panel Discussion, Drew Smith, Dana Leeds, Steve Little, Thomas MacEntee, Rob Warthen, Willie

In total, my time for 2023 was 39.5 hours, which is very similar to my 2022 total of 38 hours.
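For anyone keeping a similar log, the totals can be double-checked with a few lines of Python. This is just a sketch of the arithmetic; the variable names are my own, and the durations come straight from the lists above (each event is 60 minutes unless noted otherwise).

```python
# Webinars: 27 events, of which one ran 90 minutes and four ran 30 minutes.
webinar_minutes = [60] * 22 + [90] + [30] * 4

# RootsTech 2023: 8 sessions, two of them 30 minutes.
rootstech_minutes = [60] * 6 + [30] * 2

# WikiTree Symposium and WikiTree Day: 7 sessions of one hour each.
wikitree_minutes = [60] * 7

webinar_hours = sum(webinar_minutes) / 60
conference_hours = (sum(rootstech_minutes) + sum(wikitree_minutes)) / 60
total_hours = webinar_hours + conference_hours

print(webinar_hours)     # prints 25.5
print(conference_hours)  # prints 14.0
print(total_hours)       # prints 39.5
```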


Planning for 2024

Now is a good time to plan in advance your 2024 activities. I like to add them to my calendar as soon as I find something of interest to me that might contain new or updated information.

I plan again to attend RootsTech online from Feb 29 to Mar 2.

image

You can go to their Search the On-Demand Library page and look through their catalog of 4,423 results for over 1,500 sessions from 2019 to 2023 that are still available online. I’m sure you’ll find something of interest there.

They have more than 200 new online sessions planned for 2024. The new sessions are not yet listed on their site, but they will be soon. When they are ready, you’ll be able to filter your search by year, and 2024 will be an option. Then you can plan the sessions that you’ll want to watch.

Another planning activity to do right now is to check out which Legacy Family Tree Webinars you’ll want to see in 2024. They just came out with their planned classes and they will feature 112 speakers who will be giving 168 talks. That’s almost one every second day. You can find their list and register for the sessions you want here: Upcoming Webinars - Legacy Family Tree Webinars

image

I found 17 sessions that I’m already interested in and have now registered for.

Most of the Legacy Family Tree Webinars are free to watch live. Usually, if you miss the live session, you can still watch it for free for about a week.

Of course, another way to get some Continuing Education is to attend a genealogy conference. I have not attended an in-person conference since before Covid. I was planning to finally go on a Genealogy European River Cruise in October 2024 which was to have featured Judy Russell and Blaine Bettinger as the speakers. I was really looking forward to this, but unfortunately it had to be cancelled. It seems I’ll have to wait a while longer until I find another in-person genealogy conference of interest to me.

Now it’s up to you to get to it. There’s no time like the present to plan some of your genealogical activities for 2024.

    Testing Out MyHeritage’s New AI Biographer - Tue, 26 Dec 2023

    My previous post ran through MyHeritage’s new AI Record Finder tool. Now I’m going to try out the other tool MyHeritage released today. You can find their blog post describing the AI Biographer here: Introducing AI Biographer™: Create a Wikipedia-like biography for any ancestor using AI, enriched with historical context - MyHeritage Blog


    A Biography by MyHeritage’s DeepStory Tool

    Before I go into their new tool, I should mention that MyHeritage already had a very innovative AI biography tool. It’s available from the Photos menu and is called DeepStory.

    image

    I have tried this tool and it produces very interesting, if a little bit creepy, results by having your (now deceased) relative be the one telling you his/her own story in a one to two minute video.

    The main subject of my previous post was my maternal grandfather Joseph German. To make a DeepStory, all you have to do is click on the person’s picture. The tool will first enhance my grandfather’s picture, and then use the facts and records I have entered for my grandfather to create his story.

    Unfortunately, the DeepStory tool seems to have a problem with my grandfather’s photo or maybe his biography and was giving me: “An error occurred. Please try again later”, so for now I’ll use his wife, my grandmother’s story for comparison in this article.

    You can see the DeepStory MyHeritage generated for my grandmother Esther-Beyla (Goretsky) German here: MyHeritage DeepStory, give voice to your family stories with AI technology – MyHeritage. If you don’t like your relative saying things like “I married the love of my life” or “love and kisses”, you can edit the script to say anything you want and then recreate the story.


    A Biography by MyHeritage’s AI Biographer

    So now let’s try out AI Biographer on my grandmother and see what this new tool provides. With this tool, you select a deceased person in your tree and there is now an option to “Create AI Biography”. Clicking on that brings up this window:

    image

    DeepStory only uses information from your tree, whereas AI Biographer’s default is “Enhanced”, which uses external material as well, so I’ll go with that.

    image

    It really doesn’t take that long. I got my email notification in less than 5 minutes.

    The biography is sent as a PDF attachment. Here is what AI Biographer sent me:

    image

    The “Consistency Issues” section which compares the facts from the different sources is a really nice touch. I don’t know of too many programs that produce those.


    A Human Written Biography at WikiTree

    Now how does this compare to a human-written biography? Below is a biography written about my grandmother on WikiTree by the volunteers during My Week as the WikiTree Challenge Guest.

    image

    You can see the above profile for my grandmother on WikiTree here: https://www.wikitree.com/wiki/Goretsky-5


    Conclusion

    All three biographies are very good, and although they all use basically the same information, they each tell the story in their own unique way. Each purposely includes and highlights the information it prefers.

    Of course, the human written biography takes the most work, especially if you are the human doing the writing, whereas the two AI tools can produce a result with little effort on your part.

    The DeepStory tool, being a video, is definitely the most eye-catching, and it allows you to modify the script any way you want.

    The new AI Biographer tool is just in early beta and MyHeritage will likely add enhancements to it over time.

    All in all, there is no longer any reason why you shouldn’t be able to have a biography created for each of your ancestors.