
Louis Kessler's Behold Blog

Back From Vacation - 4 days, 22 hrs ago

I had a wonderful two week vacation with my family at Disney World in Florida. I don’t know if I’d call it relaxing, because it was non-stop go-go-go. Here’s one of the many amusing items from the Haunted Mansion for my genealogy friends:


So it’s back to try again to finish version 1.3 of Behold and to try to do it before I head back to Florida in a month’s time for the IAJGS conference at the Walt Disney World Swan Resort. If you’re planning to go, be sure to sign up for the 90 minute Computer Workshop I’ll be giving on Double Match Triangulator:

Monday, July 24, 8:15 AM - 9:45 AM
Session Code: 2314
Session Title: Computer Workshop: Using the Double Match Triangulator for Autosomal DNA Analysis
Speaker: Louis Kessler
Room: Swan 8
Type of Session: Computer Workshop

Until then, I’ll also be a bit distracted. While away on vacation, Sorin Goldenberg on the Jewish Genealogy in Romanian Moldavia Facebook group located over 40 original Romanian birth/marriage/death records that extend my father’s mother’s family back another 2 generations. As genealogists, you can understand my excitement. It’s like a rainbow appearing with a new discovery at its end, just as we saw one evening in Animal Kingdom:


I will be analyzing this material in detail prior to the Conference. I’m sure a couple of blog posts will result.

Getting Carried Away - Sun, 21 May 2017

I’ve noticed it’s been almost 2 months since my last blog post, and that’s too long. I kept delaying my posts with the hope and expectation that my next one would be announcing the release of Behold 1.3. 

However, the changes to Behold have been taking longer than I hoped. With Spring bringing beautiful weather and other Spring duties, there is less time during the day for programming than in the Winter. Programmers can sometimes turn into depressing people who hope for miserable weather and rainy days so they can get more work done.

Also, I might have been getting a little “carried away” with what I’m trying to get into this version of Behold. I do want Behold 1.3 to finish off everything I need/want in the Everything Report prior to adding GEDCOM export and then Behold’s own database and editing.

Below are the things I’m trying to sneak in:

Highlighted Birth/Maiden Names

I wanted birth/maiden names to be highlighted somehow.  And I wanted that highlighting everywhere. I decided on bolding the birth/maiden name.

This was trickier than it sounds because the person’s name is a hyperlink to that person in the report. Breaking up the styling of the name breaks the hyperlink into three parts. I had to figure a way to break the styling but leave a single hyperlink. It’s different in the Everything Report, in the Treeview, in the HTML export and in the RTF export.



This required a change to the Index of Names. Previously I was using bold text to show the earliest people in each line (those without parents attached). I needed another representation for this and decided on an asterisk (*) before the name. And then, while doing that, why not add the person’s birthplace to make it easier to identify people:



Section Header Information

I want the section headers to give some information about the numbers of people included as well as information about the amount of pedigree collapse.



Fact/Event Selection and Filtering

Behold has always allowed selection of which Tags you want displayed. On the Tags page of the Organize window, there was a box you could select or deselect if you wanted a certain tag included or excluded from the Report. Unfortunately, this never worked perfectly because tags could occur at different levels, i.e. within other tags, and this mechanism did not work for the Place Details or Source Details section. In other words, you couldn’t just get a listing of all your sources for, say, Census facts.

To fix this situation, there will now be checkboxes only beside the tags which are at Level 1 in INDI (individual) or FAM (family) records. Those will now be counted on the Tags page in their own “Facts” column.


This now allows you to display only the facts you want. For example, you can select just BIRT, MARR and DEAT tags if you want to just show the vital statistics for everyone and see only your vital statistics sources in the Source Details section. You could select just CENS for just the Census facts. You can select BURI to effectively give you a burial list in the Place Details section.
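To illustrate what counting Level 1 tags means, here is a minimal Python sketch. It is not Behold's actual code: the function name `count_level1_facts` and the sample GEDCOM lines are made up for illustration. It counts tags that sit at Level 1 directly under INDI or FAM records and ignores the same tags when they appear at deeper levels or in other record types.

```python
from collections import Counter

def count_level1_facts(gedcom_lines):
    """Count tags found at Level 1 inside INDI and FAM records."""
    counts = Counter()
    in_indi_or_fam = False
    for line in gedcom_lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(parts[0])
        if level == 0:
            # Level 0 lines look like "0 @I1@ INDI" (with an ID) or "0 HEAD" (without)
            if len(parts) == 3 and parts[1].startswith("@"):
                record_type = parts[2].split(" ")[0]
            else:
                record_type = parts[1]
            in_indi_or_fam = record_type in ("INDI", "FAM")
        elif level == 1 and in_indi_or_fam:
            counts[parts[1]] += 1
    return counts

# Made-up sample data
sample = [
    "0 HEAD",
    "1 SOUR Example",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "1 CENS",
    "2 DATE 1911",
    "0 @F1@ FAM",
    "1 MARR",
    "2 DATE 1925",
    "0 @S1@ SOUR",
    "1 TITL Census of 1911",
    "0 TRLR",
]
facts = count_level1_facts(sample)
print(facts)
```

In this sample, BIRT, CENS and MARR are each counted once, while the TITL tag under the SOUR record and the SOUR tag under HEAD are not counted because they are not inside an INDI or FAM record.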

To make selection easier, at the right of the Tags page, I’ve added Def (Default) and None checkboxes. By checking “None”, you can uncheck everything and just add the few facts you want to show. By checking “Def”, you can show all the most important facts again and check or uncheck any others as desired.

These selections can then be saved into a Behold file with the “Merge into” button and retrieved again with the “Merge from” button. So you can set up Behold files for “Vital Stats”, “Census Only”, and “Burials” and quickly switch between them.



DNA Features

I want/need some DNA features that I don’t see readily available in other programs. Behold is going to tell you all the ways each person is related to your starting people, their probability of sharing autosomal DNA, their expected shared autosomal DNA if they share, the same for the X chromosome and whether they share Y-DNA or mt-DNA. For all furthest-back ancestors of the starting people, their Y-candidates or mt-candidates would be listed. Those are the people alive today you can test to get that ancestor’s line. And inversely, for every living person, all the furthest-back ancestors for whom they would be Y or mt-candidates would be listed. I don’t have a final mockup of this yet, but I’m thinking of something like this for every person:



Cheat Sheet

Well, that’s what I call it. It’s something I use in my research all the time. My first scan of any family information (e.g., an archive, book index, online site) would be to look for matches from these two alphabetically ordered lists:

  1. All ancestral surnames and the furthest-back ancestor of each one.
  2. All ancestral birth places and the furthest-back ancestors of each one.

They will be optionally shown just after the Table of Contents. I’m still finalizing how they’ll look and what they’ll contain.


All of this is almost ready. There are lots of little details to finish, but I thought it important to post my progress here and now and not let you think I’ve vanished from the face of Behold development.

Raw Data Comparison: FamilyTreeDNA vs MyHeritage DNA - Tue, 28 Mar 2017

Before I leave DNA and get back to Behold for a few weeks, I had one more set of results I wanted to report on.

A couple of weeks ago, I compared my MyHeritage DNA ethnicity results to my FamilyTreeDNA results, and also compared my match results.

There was one other comparison I had wanted to do. It’s to compare the Raw Data files of the two companies. My questions are:

  1. How similar are the raw data downloads?
  2. Do the differences significantly affect match results?
  3. Do the crossover points of segment matches significantly change?


Downloading Your Raw DNA Data

To download your raw data from FamilyTreeDNA, go to your Dashboard and click on “Download Raw Data”


On the next screen, select “Build 37 Raw Data Concatenated”

At MyHeritage DNA, it is not quite as obvious. Originally, I couldn’t find it and assumed you couldn’t download your data there, until I was shown how. What you do is go to your Manage DNA kits page, click on those 3 dots on the right, and select Download.



Comparing the Raw Data Files

The two companies, FamilyTreeDNA and MyHeritage, both use the same DNA testing company, Gene by Gene, Ltd. in Houston, Texas. In fact, Gene by Gene is the parent company of FamilyTreeDNA. MyHeritage chose Gene by Gene to be their lab, and Gene by Gene accepted the offer even though you could imagine MyHeritage DNA to be a competitor to Gene by Gene’s FamilyTreeDNA. I’m sure Gene by Gene must have thought it better to get MyHeritage’s lab business than to let them go off to some other lab. Even if this was a financially-based arrangement, it’s still nice to see a little bit of cooperation here between genealogy companies, just like it is to see FamilySearch’s partnership with MyHeritage and Ancestry and FindMyPast to share resources.

Given that it is the same lab doing the test, one would naturally expect the lab results to be quite similar. I downloaded my two datasets and put them in one spreadsheet to compare them. They had exactly the same format. Here are the first few lines of the two files side by side:


Think of RSID as the name of a particular position on a chromosome. The Position is in base-pair (bp) units from the beginning of the chromosome and is the information that Double Match Triangulator shows in its output. The Result is a pair of alleles (each A, C, G or T), one from each parent at that location.

The data from the two companies both had 702,442 lines for chromosomes 1 through 22 with identical RSID, Chromosome and Position, and the entries of those were in the same order in each file, ordered not by RSID, but by Position. Having the first three fields matching exactly is a very good thing. They indicate that these download files of MyHeritage and FamilyTreeDNA are both using the same RSID definitions which are defined in what’s called a “Build”.  FamilyTreeDNA allows you to download Build 36 or Build 37. MyHeritage only allows the download of Build 37, so I’m comparing Build 37 here.
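A check like this can be sketched in a few lines of Python. This is only an illustration under assumptions: it assumes the downloads are comma-separated text with quoted fields and a header row naming the four fields, and the sample rows and the function names `read_raw_data` and `same_markers` are made up, not taken from either company or from any real kit.

```python
import csv
import io

def read_raw_data(text):
    """Parse raw-data text into a list of (rsid, chromosome, position, result)."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4 or row[0].startswith("#") or row[0].upper() == "RSID":
            continue  # skip comments, the header row and malformed lines
        rsid, chromosome, position, result = row
        rows.append((rsid, chromosome, int(position), result))
    return rows

def same_markers(rows_a, rows_b):
    """True if both files list the same RSID, chromosome and position in the same order."""
    return [r[:3] for r in rows_a] == [r[:3] for r in rows_b]

# Made-up sample rows (assumed layout)
ftdna_text = (
    'RSID,CHROMOSOME,POSITION,RESULT\n'
    '"rs0000001","1","1000","AA"\n'
    '"rs0000002","1","2500","CT"\n'
)
myheritage_text = (
    'RSID,CHROMOSOME,POSITION,RESULT\n'
    '"rs0000001","1","1000","AA"\n'
    '"rs0000002","1","2500","TC"\n'
)

ftdna_rows = read_raw_data(ftdna_text)
myheritage_rows = read_raw_data(myheritage_text)
print(same_markers(ftdna_rows, myheritage_rows))  # True
```

Only the first three fields are compared here, so the two sample files count as aligned even though the Result of the second row is written in a different order.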

FamilyTreeDNA has a FAQ page: How do I read my Family Finder raw data file? In that FAQ they give the following useful table for interpreting the results:


I’m not sure why the table only lists two of the heterozygous values. There are 4 more:  AC or CA, AT or TA, CG or GC, and GT or TG as you’ll see in the tables I created below. There were no insertion or deletion values in either of the downloads.


Comparing Autosomal Chromosomes 1 to 22

Comparing the Results field for those 702,442 values on chromosomes 1 to 22 gives for me the following counts:


578,890 (82.41%) of the entries (light green) match exactly.

FamilyTreeDNA does a nice thing and in their download shows the allele values of each pair in order alphabetically. So it only lists CT and not TC, only AG and not GA.

MyHeritage is not so nice. They show some of the pairs in the other order, with the higher alphabetical allele listed first. They do this for GC, TA, TC and TG (counts shown in dark green). And they show GC both ways, also as CG, and TA both ways, also as AT. Doing this makes me worry that there may be some third party tools that assume the order of alleles is one way or the other. If they do, they could present erroneous results from MyHeritage’s raw data. 100,898 (14.36%) of MyHeritage’s allele pairs match FamilyTreeDNA’s but are shown in the opposite order.
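A tool that wants to be safe against either ordering could normalize each pair before comparing. Here is a minimal Python sketch of that idea; the function name `normalize_pair` is hypothetical and this is not how any particular third party tool is known to work.

```python
def normalize_pair(result):
    """Return the allele pair with its letters in alphabetical order."""
    return "".join(sorted(result))

print(normalize_pair("TC"))  # CT
print(normalize_pair("GC"))  # CG
print(normalize_pair("AG"))  # AG (already in order, unchanged)
```

After normalizing, a MyHeritage TC and a FamilyTreeDNA CT compare as equal.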

The FamilyTreeDNA table from their FAQ says that the double dash “--” represents results that were not clear. They say this happens for a small percentage of the microchips. Well, 17,661 (2.5%) of the MyHeritage results are “unclear”, and 19,850 (2.8%) of the FamilyTreeDNA results are “unclear”. Of these, both companies agree that 14,899 (2.12%) of the pairs are “unclear”. At least they agree on most of them.

So up to now, we have 82.41% + 14.36% + 2.12% = 98.89% of the allele pairs matching between the two sets of raw data. That means we have a little over 1% that do not match. We are seeing what is the error rate between two different samples from the same person that are analyzed by the same lab. I don’t know the technical details as to how the companies determine the raw data from the samples, so I can’t speculate as to the reasons for the differences.
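The percentages can be re-derived from the counts given above. This small Python check uses only the numbers already quoted in this section; the variable names are just for illustration.

```python
total = 702_442        # RSIDs on chromosomes 1 to 22
exact = 578_890        # pairs that match exactly
swapped = 100_898      # same pair, opposite order
both_unclear = 14_899  # unclear in both files

def pct(count):
    """Percentage of all RSIDs, rounded to two decimals."""
    return round(100 * count / total, 2)

agreeing = exact + swapped + both_unclear
remaining = total - agreeing

print(pct(exact), pct(swapped), pct(both_unclear))
print(remaining, pct(remaining))
```

The three agreeing groups come to 694,687 pairs, which leaves 7,755 pairs, or about 1.1%, that do not match.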

Breaking down the differences:
For 2,762 (0.39%), FamilyTreeDNA found a pair, but MyHeritage was unclear.
For 4,951 (0.70%), MyHeritage found a pair, but FamilyTreeDNA was unclear. 
For 42 (0.01%), both companies found a pair, but the pair differed.
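
For anyone who wants to repeat this kind of tally on their own files, the six categories used in this section can be expressed as a short Python function. This is a sketch, not the spreadsheet method used for the counts above. It assumes an unclear result is written as two dashes, and the sample pairs are made up.

```python
from collections import Counter

NO_CALL = "--"  # assumed marker for an unclear result

def classify(ftdna, myheritage):
    """Place one pair of results into one of six categories."""
    if ftdna == NO_CALL and myheritage == NO_CALL:
        return "both unclear"
    if myheritage == NO_CALL:
        return "MyHeritage unclear"
    if ftdna == NO_CALL:
        return "FamilyTreeDNA unclear"
    if ftdna == myheritage:
        return "exact match"
    if sorted(ftdna) == sorted(myheritage):
        return "same pair, opposite order"
    return "pairs differ"

# Made-up (FamilyTreeDNA, MyHeritage) results, one per category
pairs = [
    ("AA", "AA"),
    ("CT", "TC"),
    ("--", "--"),
    ("AG", "--"),
    ("--", "GG"),
    ("AA", "AG"),
]
tally = Counter(classify(f, m) for f, m in pairs)
for category, count in tally.items():
    print(category, count)
```

Each of the six sample pairs lands in a different category, so every count printed is 1.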


Build 36 versus Build 37

FamilyTreeDNA currently uses Build 36, not Build 37, when matching segments between people. As Gerrit van der Ende wrote: “A Build is a Genome assembly. As more is learned about the human genome, new Genome assemblies are released.”

The Chromosome Browser at FamilyTreeDNA and the Chromosome Browser Results file you download from FamilyTreeDNA have positions based on Build 36. Build 36 had a few more RSIDs (702,457 for chromosomes 1 to 22 versus 702,442 for Build 37). There were 15 RSIDs deleted. Here is the beginning of my Build 36 download from FamilyTreeDNA:


Compare this to the Build 37 at the beginning of this article. The RSIDs are the same and the Results are the same, but all the Positions are different. The positions are not important for matching. Only the order of the RSIDs and the Results are important for matching. There were only 100 or so RSIDs that had a slight order difference, so different builds can be relatively easily translated into each other and matched against each other. What will be different between Builds are the Positions of the matching segments and the size of the segments.

GEDmatch, like FamilyTreeDNA, uses Build 36 for its comparisons. But 23andMe uses Build 37. So you can’t compare exact positions in Double Match Triangulator that were computed for FamilyTreeDNA or GEDmatch files with those computed at 23andMe.

MyHeritage’s positions in its raw data are all matching FamilyTreeDNA’s positions from the latter’s Build 37 download, so MyHeritage’s raw data is Build 37. I will not be able to tell whether their matches are Build 37 until MyHeritage provides a segment match download or a utility like a chromosome browser that shows segment match results. However I would guess, since they are a new company, they would use Build 37 matches, making their Positions compatible with 23andMe.

FamilyTreeDNA and GEDmatch are sort of stuck. They put together a matching system based on Build 36 and they’d have to remap all the results if they went to Build 37 for their matching. It would change the positions, but likely not change the match results significantly. That’s a lot of work for little gain, so I can see their reluctance to make the change.

Comparing Build 36 to Build 37 gives almost all the mapping that is needed. If it becomes important in the future for Double Match Triangulator, I see that I’d be able to do the mapping and present FamilyTreeDNA, GEDmatch, MyHeritage and 23andMe results all with comparable Positions, either Build 36 or Build 37.
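The idea behind such a mapping can be sketched in Python: since the RSIDs are shared between the two builds, each RSID links a Build 36 position to a Build 37 position. This is only an illustration and not Double Match Triangulator code. The function name `build_position_map` is hypothetical and the rows are made-up numbers.

```python
def build_position_map(build36_rows, build37_rows):
    """Map (chromosome, Build 36 position) to Build 37 position via shared RSIDs."""
    build37_by_rsid = {rsid: (chrom, pos) for rsid, chrom, pos in build37_rows}
    mapping = {}
    for rsid, chrom, pos36 in build36_rows:
        if rsid in build37_by_rsid:  # RSIDs absent from Build 37 are skipped
            chrom37, pos37 = build37_by_rsid[rsid]
            if chrom37 == chrom:
                mapping[(chrom, pos36)] = pos37
    return mapping

# Made-up (rsid, chromosome, position) rows
build36 = [("rs0000001", "1", 1000), ("rs0000002", "1", 2500), ("rs0000003", "1", 4000)]
build37 = [("rs0000001", "1", 1100), ("rs0000002", "1", 2650)]

position_map = build_position_map(build36, build37)
print(position_map[("1", 2500)])  # 2650
```

A lookup like this only covers positions that fall exactly on a tested RSID, which is why the comparison gives almost all, but not necessarily all, of the mapping needed.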


Comparing the X Chromosomes

Doing the same comparison for the X chromosomes shows more differences between FamilyTreeDNA and MyHeritage DNA than chromosomes 1 to 22 did:


First of all, MyHeritage is missing 16 of the RSIDs that FamilyTreeDNA has. This wasn’t a problem for chromosomes 1 to 22 which matched exactly.

Then, if you look again at the FAQ above, you’ll see it says that for men, who only have a single X chromosome, the one allele will be doubled, allowing only AA, CC, GG and TT. This is my raw data file, and I’m male. But the results show 46 combinations that include AC, AG, CT/TC and GT/TG. Those all have to be incorrect and I’ve marked them as such.
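
The doubled-allele rule is easy to test for mechanically. Here is a minimal Python sketch that flags results breaking the rule; the function name `invalid_haploid_calls`, the two-dash marker for an unclear result and the sample rows are all assumptions made for illustration.

```python
def invalid_haploid_calls(rows, no_call="--"):
    """Return (rsid, result) rows that are neither a doubled allele nor unclear."""
    invalid = []
    for rsid, result in rows:
        if result == no_call:
            continue  # unclear results are not counted as invalid here
        if len(result) != 2 or result[0] != result[1]:
            invalid.append((rsid, result))
    return invalid

# Made-up X chromosome results for a male
x_rows = [
    ("rs0000101", "AA"),
    ("rs0000102", "CT"),
    ("rs0000103", "--"),
    ("rs0000104", "GG"),
    ("rs0000105", "TG"),
]
print(invalid_haploid_calls(x_rows))
```

In this sample, only the CT and TG rows are flagged.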

And instead of only about 1% of the results where one company found a pair and the other was unclear, we are now up to over 5% of the X results being “unclear” for one of the companies, and another 641 or 4% being “unclear” for both. That means that about 9% of the X chromosome results are unknown or not agreed upon in the test results that Gene by Gene produces from two DNA samples of the same person.

If 9% of the X chromosome results are missing or wrong, then for two people, up to 18% of the locations may be wrong between them. What effect might this have on X chromosome matching?
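
As a rough sanity check on the 18% figure: simply adding 9% and 9% gives an upper bound. Under the added assumption that bad locations in the two people's files are independent of each other (something the data here does not establish), the chance that at least one of the two is bad at a given location works out slightly lower.

```python
p_bad = 0.09  # share of X results missing or wrong for one person

# Chance that at least one of two independent results is bad
p_either_bad = 1 - (1 - p_bad) ** 2

print(round(100 * p_either_bad, 2))  # 17.19
```

So the estimate is somewhere around 17% to 18% either way.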


The Y chromosome

I was very surprised to see that the MyHeritage DNA raw data includes the Y chromosome. FamilyTreeDNA does not. So I can’t compare the two. All I can do is report on the Y results of MyHeritage DNA:


Again, there is only one Y chromosome, so according to the convention, the allele should be doubled. We see that only 60% of the 481 RSIDs have valid values of AA, CC, GG or TT.

Even without the FamilyTreeDNA raw data for the Y to compare with, the MyHeritage DNA raw data does not give much confidence regarding the accuracy of the Y chromosome interpretation as far as single allele processing goes. MyHeritage does not yet report any results based on the Y chromosome, but they should double check this before they do.


Comparing Match Results at GEDmatch

The question now is whether these differences affect match results. One way to check this is to upload both files to GEDmatch.

Doing a One-to-one compare of the two files shows just 22 matches – one match for the full length of each pair of chromosomes. GEDmatch uses 3587.0 cM as the size of the 22 pairs, and that’s exactly what the One-to-one compare gives. GEDmatch must somehow filter out the 1% mismatches in its comparisons, which is good.

Comparing the 2 me’s to my uncle gives very close results. Out of 61 matching segments, one start location and one end location are a bit different. The total match using the FamilyTreeDNA raw data is 2,006.4 cM and using the MyHeritage DNA data is 2,005.9 cM. Both give a largest segment of 88.3 cM.

For a more distant relationship, such as my 3rd cousin, the results are almost the same with only a few small differences:


It does appear that even though there might be what appears to be a significant number of differences in the Raw Data files, they do not have a significant effect on the matches and only affect a few of the starting and ending locations, but not by much.

Checking out the X Chromosome and spot checking a few of my closest X matches, the results are similarly close, and X matching is not significantly affected.


Comparing Match Results at FamilyTreeDNA

As a double check, I uploaded the MyHeritage DNA raw data into an account at FamilyTreeDNA. My original FamilyTreeDNA test gives me 9860 matches. The MyHeritage raw data gives me 9724 matches.

Of those, the cM total matches changed for 3717 of them, but the largest change was only 7.9 cM with the FamilyTreeDNA raw data giving a match of 107.1 cM and the MyHeritage DNA raw data giving a match of 99.2 cM. For this extreme case person, here is the comparison:


FamilyTreeDNA includes 2 segments of 2.37 cM and 3.21 cM that MyHeritage doesn’t, and one segment has a different start location. So even in this extreme case, the differences are not major.

Only 114 of the longest segments of the matches were different, with the largest difference being 3.6 cM that reduced a 16.4 cM longest segment down to 12.8 cM.

Again, this confirms that the differences in the Raw Data files do not have much of an effect on the match results.
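
A comparison of match totals like the one in this section can be sketched in Python. This is an illustration under assumptions, not the method actually used here: it assumes each kit's match list has been reduced to a dictionary of match name to total cM. The names are made up, and only the 107.1 cM and 99.2 cM values come from the extreme case above.

```python
def compare_totals(totals_a, totals_b):
    """Compare total cM per match between two kits.

    Returns the number of shared matches whose total changed and
    the (name, difference) of the largest change, or None if nothing changed.
    """
    changed = {}
    for name in totals_a.keys() & totals_b.keys():
        diff = round(abs(totals_a[name] - totals_b[name]), 1)
        if diff > 0:
            changed[name] = diff
    largest = max(changed.items(), key=lambda item: item[1], default=None)
    return len(changed), largest

ftdna_totals = {"Match A": 107.1, "Match B": 54.0, "Match C": 31.2}
myheritage_totals = {"Match A": 99.2, "Match B": 54.0, "Match D": 28.5}

changed_count, largest_change = compare_totals(ftdna_totals, myheritage_totals)
print(changed_count, largest_change)
```

Matches that appear in only one of the two lists (Match C and Match D in this sample) are left out of the comparison, which mirrors the fact that the two uploads did not return exactly the same number of matches.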



  1. Comparing the raw data from FamilyTreeDNA and MyHeritage shows that for Chromosomes 1 to 22, there is disagreement or the result is unclear for 1.5% of the RSIDs. On the X chromosome, that percentage rises to 9%. On the Y chromosome, the percentage rises to 40%.
  2. These differences do not seem to have a significant effect on match results.
  3. A small number of start and end locations of segment matches may be different. This is worth noting when I start getting Double Match Triangulator to analyze crossover points, but likely won’t cause problems.

The raw data is more different than I expected it to be, but I’m very happy that it will make little difference to the match results.