
Louis Kessler's Behold Blog

Probability of No Autosomal Segments Matching - Mon, 19 Dec 2016

Back to Behold, but still DNA.

I am adding some DNA features to Behold that I know I need and are not in any genealogy programs currently out there.

Basically, I want to know the expected (i.e. mean) amount of autosomal, X, Y and mt DNA that each person will share with the main person (or people) selected for the family. This is a centimorgan (cM) amount. It is straightforward to figure out, since the expected autosomal amount gets halved every generation, Y and mt only get passed through the male and female lines respectively, and X, although slightly more complicated, is manageable with females getting all their father’s and half their mother’s and males getting half their mother’s.

In addition to that, I want to know the probability of no segments matching. This is important, because if you have a 5th cousin, and you know that there’s, say, a 50% chance that they will not match at all, then you should only expect that half of the 5th cousins who have DNA tested will match you somewhere. And fewer than half of them will show up as matches with your DNA testing company, because the companies have minimum match criteria that must be met before they claim two people match. They need those criteria to prevent too many false positive random matches.

I took a look to see if I could find the theoretical probabilities that I needed. On the ISOGG page on Cousin Statistics, I found two tables:

I found it very interesting that these two tables give the same information but with slightly different numbers. For instance, 4th cousins are 9 generations (DNA-wise) apart, sharing on average (1/2)^9 = 1/512 of their DNA. And a person with their 7xgreat grandparent also shares 1/512 of their DNA. But the 1st table gives 30.70% for 4th cousins, and the second gives 37.43% for 7xgreat grandparents. I would have thought these two numbers should be the same, and I can’t check the original article these were derived from because I’m not a PubMed author and don’t know any PubMed authors who can invite me.
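To see why both relationships land on 1/512, here is a minimal Python sketch. It assumes full cousins descend from an ancestral couple, so there are two paths of 2 × (degree + 1) meioses each. The function names are made up for this illustration.

```python
from fractions import Fraction

def share_with_direct_ancestor(generations):
    """Expected autosomal fraction shared with an ancestor `generations` up.

    A parent is 1 generation, a grandparent 2, a 7xgreat grandparent 9.
    """
    return Fraction(1, 2) ** generations

def share_between_full_cousins(degree):
    """Expected autosomal fraction shared by full cousins of a given degree.

    Assumption: the cousins share an ancestral couple, so two common
    ancestors each contribute a path of 2 * (degree + 1) meioses.
    """
    meioses_per_path = 2 * (degree + 1)
    return 2 * Fraction(1, 2) ** meioses_per_path

print(share_with_direct_ancestor(9))   # 7xgreat grandparent
print(share_between_full_cousins(4))   # 4th cousins
```

Both calls give the same fraction, which is why the two tables might be expected to agree.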

Nonetheless, the numbers in these tables are reasonably close to each other. So now I just need a method to calculate them for any degree of generational distance. I love when I get to do something statistical which was part of my education and my work. Not too often have I had to use my statistics education for genealogy, so here’s my chance.

Let’s go to Jim Bartlett’s blog post: Crossovers by Generation. Take some time to read it and learn something like I did. I’m going to reproduce Jim’s Table 3:

05D Figure 3

The important columns are the ones marked “Segments” and “Number of Ancestors”. Because there are on average 34 crossovers per generation, the number of segments grows linearly, 34 per generation. But the number of ancestors is growing exponentially, doubling every generation. From generation 9 onward, there are more ancestors than segments. By generation 13, there are 8192 ancestors, but only 465 segments. That means at most only 465 out of those 8192 ancestors will match, and that’s if none of them match on more than one segment. That already tells you that at least (8192 – 465) / 8192 = 94.32% of your 13th generational relatives will not match you.
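The linear-versus-exponential growth can be tabulated with a short Python sketch. The 34 crossovers per generation come from the text. The starting count of 23 segments is an assumption, chosen only because it reproduces the 465 segments quoted for generation 13.

```python
CROSSOVERS_PER_GENERATION = 34  # from the text
BASE_SEGMENTS = 23              # assumption: makes generation 13 come to 465

def segments(generation):
    """Segments grow linearly: 34 more per generation."""
    return BASE_SEGMENTS + CROSSOVERS_PER_GENERATION * generation

def ancestors(generation):
    """Ancestors double every generation."""
    return 2 ** generation

def min_nonmatching_fraction(generation):
    """Lower bound on the share of ancestors that cannot match.

    Each segment comes from one ancestor, so at most `segments` of the
    `ancestors` can match at all.
    """
    n, s = ancestors(generation), segments(generation)
    return max(0.0, (n - s) / n)

for g in range(1, 14):
    print(g, segments(g), ancestors(g), f"{min_nonmatching_fraction(g):.2%}")
```

Generation 9 is the first row where the ancestor count passes the segment count.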

Now let’s use some statistics. The statistical probability of no segments matching given that there are N ancestors and S segments is: 

(1 – 1 / N) ** S

What that says is that for generation 13, there is an 8191/8192 chance of a person not matching in one segment, and the non-match has to be in all 465 segments. Calculate this out and it comes to 94.48%.
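The formula is a one-liner in Python. This sketch just evaluates it for the generation 13 figures quoted above. The function name is made up for the illustration, and the formula treats every segment as an independent draw from the N ancestors.

```python
def prob_no_match(n_ancestors, n_segments):
    """Probability that one given ancestor contributed none of the segments.

    Assumes each segment comes independently and with equal chance
    from any of the n_ancestors.
    """
    return (1 - 1 / n_ancestors) ** n_segments

p = prob_no_match(8192, 465)
print(f"{p:.2%}")
```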

Let’s do that for a bunch of generational levels and compare that to Table 1 and Table 2:

image

Hmmm. Not too bad. In fact the Statistical calculation comes very close to the Table 1 numbers. So close, that when I plot the three sets of values, you see a small difference only with the Table 2 numbers but the other two are right on top of each other.

image

Excellent. So now I have validated that these numbers are close enough and that I can therefore use them.

One last thing left to do. The mean amount of autosomal DNA passed down is always halved each generation. On average, that means with a 13 generational difference, the expected DNA shared is 1 / 8192 = 0.01% which would work out to just 1 cM. That’s an awfully small match to be detected.

But that average includes all the ancestors who don’t match at all. We know that 94.48% or 7740 of the 8192 do not match. It is better to show the expected DNA matching when the two people do match. This would then be just 1 / 450 = 0.22% which would work out to an expected average match of 15 cM for the roughly 450 people 13 generations apart that do match.
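Here is a rough Python sketch of that adjustment: divide the overall expected cM by the probability of matching at all. The total of 6800 cM is an assumption, inferred from 0.22% working out to about 15 cM. It is not a figure stated in the text, and the function names are made up.

```python
TOTAL_CM = 6800  # assumption: inferred from 0.22% being about 15 cM

def prob_no_match(n_ancestors, n_segments):
    """Probability that a given ancestor contributed no segment."""
    return (1 - 1 / n_ancestors) ** n_segments

def expected_cm_overall(generation):
    """Mean shared cM over everyone, matching or not."""
    return TOTAL_CM / 2 ** generation

def expected_cm_if_matching(generation, n_segments):
    """Mean shared cM counting only the people who do match."""
    p_match = 1 - prob_no_match(2 ** generation, n_segments)
    return expected_cm_overall(generation) / p_match

print(round(expected_cm_overall(13), 2))
print(round(expected_cm_if_matching(13, 465), 1))
```

For generation 13 this comes out at a little under 1 cM overall and about 15 cM for those who match.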

Let’s try this for the whole range of generations:

image

So now I have what I need. Behold is going to show:

  1. The probability of having a DNA match (e.g. 5.52%)
  2. The average match length if they do match (e.g. 15.0 cM)

Let me of course add 100 caveats. These are approximate values. The actual percentages may vary. Matching cM may vary greatly, etc., etc., blah, blah.

Running DMT Against Non-Matches - Tue, 13 Dec 2016

I happened to come across a post by Robert Davis on the Ulster Co NY Y-DNA group at FamilyTree DNA.

Robert said:

Double Match Triangulator …would be of use in finding links (common matches) between two individuals that are not themselves matches. and hence the ICW tool of FF is of no use.

That was an excellent observation. And I don’t talk much about non-matching people in my writeup on my DMT page or in its help file (although I do include one non-matching person in the sample files). Using DMT on non-matching people that are possible relatives is something that you will want to do.

Why is this so? Well, it’s simply a matter of probabilities. Once you get down into 4th, 5th and 6th cousins, there is a good chance that your cousin will not reach the threshold where they will make it into your match list. Either their longest match in common does not meet FamilyTreeDNA’s threshold, or the total cM length of the common matches does not meet the threshold.

However, you may find that this person’s sibling or parent does match. Therefore you know they are related, but FamilyTreeDNA gives you no tools to check that.

So DMT to the rescue.

When you compare your Chromosome Browser Results file to someone who does not match you, you will not get any Full Triangulations. You will only get Double Matches with a Missing a-b segment. That’s okay. Go ahead and analyze those. They won’t be the same segment passed down from a common ancestor, but they could very well be two different segments from a common ancestral line. See Triangulation and Missing a-b Segments.
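To make the idea concrete, here is a small Python sketch of finding overlapping regions. It is not DMT's actual algorithm. It assumes a Double Match means a region where Person a and Person b each match the same Person c, and the `Segment` type, function names and positions are all invented for the illustration.

```python
from typing import NamedTuple, Optional

class Segment(NamedTuple):
    chromosome: str
    start: int  # start position on the chromosome
    end: int    # end position on the chromosome

def overlap(x: Segment, y: Segment) -> Optional[Segment]:
    """Return the region two segments have in common, or None."""
    if x.chromosome != y.chromosome:
        return None
    start, end = max(x.start, y.start), min(x.end, y.end)
    return Segment(x.chromosome, start, end) if start < end else None

def double_matches(a_with_c, b_with_c):
    """Regions where a's matches with c overlap b's matches with c."""
    found = []
    for x in a_with_c:
        for y in b_with_c:
            shared = overlap(x, y)
            if shared is not None:
                found.append(shared)
    return found

# Invented positions, for illustration only.
a_c = [Segment("5", 10_000_000, 30_000_000)]
b_c = [Segment("5", 20_000_000, 45_000_000), Segment("X", 1, 5_000_000)]
print(double_matches(a_c, b_c))
```

When a and b do not match each other, every region found this way would be a Double Match with a Missing a-b segment.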

In fact, you may have hundreds or thousands of people who match you. Every one of those is a candidate to be Person b in your DMT runs, and you should see if they are willing to download and let you use their CBR file. But their siblings, parents, and cousins on the related side are also candidates as Person b. If you can, ask for any of the CBR files that they administer. Of course, tell them that you’ll keep their information private and not give it or disclose it to anyone, and tell them that you’ll let them know what you find.

I have received 63 Chromosome Browser Download files from possible DNA-relatives of my uncle. 37 of them show up in my uncle’s match list. Of the other 26, all but 3 have significant Double Matches with my uncle.

image

Take a look at the above table, which is my People file produced when I use DMT to run all 63 people against my uncle. Column A has my uncle. The yellow names at the top are some of the 26 people who don’t match my uncle but whose CBR files I have. The people in Column B are the 37 people who match my uncle.

The values shown do not have any Triangulations. Those would show up in green and the numbers would be preceded by “T” instead of “D”.

But you can see some very significant Double Matches, such as the 24.95 cM matches at the top left between Harry and Erika, Harry and Steve, and Harry and Mark. You’ll also see some very useful X-Chromosome matches that are shown in red. When the numbers are the same, it is very likely they’re referring to the same segment, but you’ll have to check the Map page to be sure.

Notice Andrew’s column. He is one of the 3 that don’t Double Match anyone and can be presumed to be a non-relative. Negative information like that can also be valuable.

So I wanted to point this out and write my thoughts down before I forget. Using DMT for non-matches is yet another way that DMT can prove to be useful.

Double Match Triangulator Version 1.3 - Mon, 12 Dec 2016

With the #RootsTech #InnovatorShowdown coming, and with DMT entered, I wanted to get one last update to DMT in while I could. You can get the new Version 1.3 here.

My realization of DM Theorem 1 (and Corollary 1) made me want to change the DMT overlap detection algorithm somewhat. Now each Triangulation Group would only be made up of Triangulated segments. Overlapping Double Matches would be put into their own Double Match Group.

So none of this (where the column in the middle mixes the green Triangulations with the white Double Matches):
image

But this instead:
image

which you can easily see is a significant change combining what previously were two separate Triangulation Groups into one as they should have been.

The reason they were previously separated was that a Missing a-b (non-triangulating match) did not overlap with the previous Triangulation endpoint, causing a break. Now that these are known to be on separate halves of the Chromosome, one should not cause a break in the other. This gives an immediate improvement to the determination of the Double Match Groups.

Missing a-b segments still might occur on both halves of the Chromosome. This algorithm can’t solve for that situation. But I’m hopeful that analysis of the breakpoint addresses might ultimately sort that all out. That methodology will likely have to be worked out in many small steps. So that can be a project for later next year, after RootsTech.

The other major change in Version 1.3 was to the People page. I hadn’t spent much time working with that page. So I only had it in a very plain text format:

image

But I had a lot of interest in it from various people, and found a few minor problems in it to fix. I added the Status column and now sort the matches so the people who Triangulate are first. And I made it look much nicer as well:

image

DMT now seems to work well and is stable, and I think I can wrap it up for the time being, and see how it’s received at the Innovator Showdown and RootsTech.

I need to take a break from DMT to get back to Behold. In my Triangulation and Missing a-b Segments post from August 30, I said “First to reassure you, I am back working towards finishing Behold Version 1.3”.  After I said that, I really did work on Behold for a few weeks until I decided that I should enter DMT into the Innovator Showdown. That has taken my spare time up to now.

Okay. That’s done. I’ve still likely got a few weeks before my own DNA test results come back from FamilyTreeDNA, so at the moment I’m not distracted by that. Let’s see if I can finish off Behold Version 1.3 prior to RootsTech. I’d really like to because one of the major additions is some DNA information that I don’t believe any other program has. I’ll be talking about this in a future blog post. I’ve already got the framework into my development version, and now that I have more time each day to work on Behold than I had previously, I should be able to make good progress. Once that’s done, the Everything Report will be complete and have everything needed.