Louis Kessler's Behold Blog

Triangulation, Single Matching and Double Matching - Fri, 27 Jan 2017

It seems like my last post was a bit confusing to many people. I expect that the way I drew the boxes (to be segments) and the way I connected them with lines (indicating matching) was not intuitive, and it did not allow people to see that Double Matching with two people actually triangulates.

I’m going to start from scratch here. I’m going to use an illustration that hopefully most people will understand. This will be a representation of FamilyTreeDNA’s Chromosome Browser, which most people reading this should be familiar with.

 

Triangulation

Below is a representation of the Chromosome Browser as seen by three different people when they log in to their account at FamilyTreeDNA. Person a will see the top diagram, Person b will see the middle diagram, and Person c will see the bottom diagram. One person cannot see the diagrams of the others.

image

In the top diagram, Person a’s Chromosome Browser shows a match with Person b and Person c on the same segment.

If you log into Person b’s results, their Chromosome Browser will show a match with Person a and Person c over the same segment.

And Person c will see that their Chromosome Browser shows a match with Person a and Person b on the same segment.

This is called Triangulation, where three people all match each other on the same segment. Person a matches Person c, Person b matches Person c, and Person a matches Person b.
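The three-way condition can be sketched in a few lines of Python. This is only an illustration: the `Segment` type, the function names and the positions are invented here, and a real tool would probably also require the shared region to exceed some minimum size.

```python
from typing import NamedTuple, Optional

class Segment(NamedTuple):
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs

def overlap(x: Segment, y: Segment) -> Optional[Segment]:
    """Return the region shared by two segments, or None if there is none."""
    if x.chromosome != y.chromosome:
        return None
    start, end = max(x.start, y.start), min(x.end, y.end)
    return Segment(x.chromosome, start, end) if start < end else None

def triangulates(ab: Segment, ac: Segment, bc: Segment) -> Optional[Segment]:
    """Return the region common to all three pairwise matches, or None."""
    first = overlap(ab, ac)
    return overlap(first, bc) if first else None

# Made-up positions, for illustration only.
ab = Segment("5", 10_000_000, 30_000_000)  # Person a matches Person b
ac = Segment("5", 12_000_000, 28_000_000)  # Person a matches Person c
bc = Segment("5", 11_000_000, 29_000_000)  # Person b matches Person c

print(triangulates(ab, ac, bc))
```

If any one of the three pairwise matches is missing or lies elsewhere, `triangulates` returns `None`, which is the situation described as "not Triangulated".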

The purpose of Triangulation is to help you identify segments that may be Identical by Descent (IBD) because those that are IBD come from a common ancestor of the people who share the same segment. Then through genealogical research, you trace back each of the people’s trees to see where they connect.

For a segment to be IBD, it must Triangulate.

However, a segment that Triangulates is not necessarily IBD. There are a couple of reasons for this:

  1. Two segments may match by chance. This starts happening when segments are shorter than 15 cM and happens more often as the segments get smaller.
  2. Two segments may be on opposite chromosomes. This situation was identified by Blaine Bettinger a couple of days ago on the International Society of Genetic Genealogy (ISOGG) Facebook page. In other words, one segment is the maternal segment and the other is the paternal segment of a chromosome pair.

 

Single Match Triangulation

This is the method most people use for Triangulation. It uses one person’s matches:

image

What you have shown so far by this is that Person a matches Person b, and Person a matches Person c on the same segment. You have not yet Triangulated because you must also show that Person b matches Person c on the same segment. The above Chromosome Browser image does not tell you that. And Person a does not have access to that Person b match with Person c in their own match information at FamilyTreeDNA or at 23andMe.

What Person a can find out from their own account at FamilyTreeDNA or 23andMe is whether Person b is “In Common With (ICW)” Person c. That means Person b shares enough DNA with Person c to be considered a match. If that is the case, then it increases the likelihood that the segments Triangulate, but it does not guarantee it because those matches between Person b and Person c may not be on the same segment. There are several tools that make use of ICW data to help you locate Triangulated segments, such as Don Worth’s Autosomal DNA Segment Analyzer (ADSA) at DNAGedcom.
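The gap between ICW and a match on the same segment can be shown with a small Python sketch. The function names and positions are invented for illustration, and the real ICW criteria used by the testing companies involve thresholds that are not modelled here.

```python
def in_common_with(bc_segments):
    """ICW-style check: do Person b and Person c share any segment at all?"""
    return len(bc_segments) > 0

def match_on_region(bc_segments, chromosome, start, end):
    """Segment-level check: does any b-c segment overlap the given region?"""
    return any(
        ch == chromosome and s < end and start < e
        for ch, s, e in bc_segments
    )

# Invented example: b and c match each other, but only on chromosome 11.
bc_segments = [("11", 5_000_000, 20_000_000)]
# Region on chromosome 5 where Person a matches both b and c.
region = ("5", 12_000_000, 28_000_000)

print(in_common_with(bc_segments))            # b and c are ICW
print(match_on_region(bc_segments, *region))  # but not on this segment
```

Here the ICW check passes while the segment-level check fails, so the segment does not Triangulate even though all three people are matches of each other.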

However, to truly Triangulate, you need to verify that the Person b and Person c segments match each other. The one obvious way to do this is to contact either Person b or Person c and ask them to look in their Chromosome Browser to see if they match the other person over this specific segment. If they do, you have verified that this segment Triangulates between Persons a, b and c, and the segment therefore might be IBD.

If Person b or Person c tells you that they don’t match the other person over this specific segment, then they have verified that Persons a, b and c do not Triangulate over this segment and have shown that the segment cannot be IBD for the three of them.

However, verifying every segment with every person one by one is a lot of work if you do it manually.

There is just one tool out there that will check the third match for you. It is the GEDmatch Tier 1 Triangulation Tool. It actually looks at the segments of Person b and Person c to ensure that the same segment matches with Person a. GEDmatch finds all the pairs of people who match Person a, so it will display all the Triangulations it finds, and it does not differentiate between paternal and maternal ones.

 

Double Match Triangulation

This method makes use of two people’s match information. When they each log in to FamilyTreeDNA and look at their Chromosome Browser, Person a will see the top diagram, and Person b will see the bottom diagram.

image

Using just two people’s information, you can truly Triangulate. This is why:

Person a knows of their segment match with Person b, and knows of their segment match with Person c, but does not know if Person b matches Person c on the same segment.

Person b knows of their segment match with Person a, and knows of their segment match with Person c, but does not know if Person a matches Person c on the same segment.

If you put that data together, then you know from Person a that Person a matches Person c on the segment, you know from Person b that Person b matches Person c on the segment, and you know from both of them that Person a matches Person b on the segment. You have the three matches on the same segment that you need for true Triangulation.

So only data from two people is required to Triangulate. You do not need the data from the third person.

When Person a downloads a Chromosome Browser Results (CBR) file from FamilyTreeDNA, it contains all of Person a’s segment matches with everyone else. When Person b downloads their CBR file, it contains all Person b’s segment matches with everyone else. Using these two files, you can therefore find in one fell swoop every segment that is a true Triangulation that involves Person a and Person b and someone else.
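A minimal Python sketch of combining two such files follows. It is not how Double Match Triangulator is implemented; it only shows the idea. The column names `CHROMOSOME`, `START` and `END` are simplified stand-ins assumed for this example (check the headers of your own download), and the two tiny "files" are invented data.

```python
import csv
import io
from collections import defaultdict

def read_cbr(text):
    """Group one person's segments by match name: {match: [(chrom, start, end)]}."""
    segments = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        segments[row["MATCHNAME"]].append(
            (row["CHROMOSOME"], int(row["START"]), int(row["END"]))
        )
    return segments

def common_region(*segs):
    """Return the region shared by all the given segments, or None."""
    if len({s[0] for s in segs}) != 1:
        return None  # not all on the same chromosome
    start = max(s[1] for s in segs)
    end = min(s[2] for s in segs)
    return (segs[0][0], start, end) if start < end else None

def double_match_triangulate(a_segs, b_name, b_segs):
    """Yield (person_c, region) wherever a-b, a-c and b-c all overlap."""
    ab_segments = a_segs.get(b_name, [])
    for c in sorted(set(a_segs) & set(b_segs)):
        for ab in ab_segments:
            for ac in a_segs[c]:
                for bc in b_segs[c]:
                    region = common_region(ab, ac, bc)
                    if region:
                        yield c, region

# Invented sample data standing in for two downloaded files.
A_FILE = """NAME,MATCHNAME,CHROMOSOME,START,END
a,b,5,10000000,30000000
a,c,5,12000000,28000000
a,d,7,1000000,9000000
"""
B_FILE = """NAME,MATCHNAME,CHROMOSOME,START,END
b,a,5,10000000,30000000
b,c,5,11000000,29000000
b,d,9,1000000,9000000
"""

results = list(double_match_triangulate(read_cbr(A_FILE), "b", read_cbr(B_FILE)))
print(results)
```

In this made-up data, Person c Triangulates with a and b on chromosome 5, while Person d matches both but on different chromosomes and so is not reported.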

This is the method that Double Match Triangulator uses to Triangulate. The method of Double Matching ensures that Person a matches Person c, Person b matches Person c, and Person a matches Person b all on the same segment, which is exactly what the GEDmatch Tier 1 Triangulation Tool does as far as Triangulation goes.

But DMT takes this one step further because of its Double Matching. Only segments that Double Match both Person a and Person b will be included in the Triangulations for those two people. So the Triangulations are effectively filtered by the relationship of Person a with Person b. For example, if Person b is a 2nd cousin of Person a, then DMT will produce Triangulations only with people who are related to both Person a and the 2nd cousin and who also have segment matches that, together with a segment match between Person a and the 2nd cousin, yield true Triangulations. By comparison, GEDmatch does not differentiate its Triangulations and thus does not give you the ability to filter them.

But always keep in mind that even though DMT and the GEDmatch Triangulation tool both produce true Triangulations, a true Triangulation does not guarantee that the segment is IBD (see above for the two reasons). Determining IBD is a separate issue that neither DMT nor GEDmatch can yet address.

Comparing Single Matching to Double Matching - Wed, 25 Jan 2017

Important Note:  After this article was written, I found many people had trouble understanding the concepts, as the diagrams were confusing them more than helping them.

This article has been completely rewritten (just two days later) and uses a different style of diagram that is akin to looking at the matches in a Chromosome Browser. It should not only be much easier to understand, but it also adds comparisons with the ADSA and GEDmatch Triangulation tools.

I’m leaving this article here as another method of explaining the same thing, but if you haven’t read the other article yet, I’d recommend you read it first:
Triangulation, Single Matching and Double Matching

 

Let’s see if we can define everything in an understandable way.

 

Single Matching

For Person a, find all the Persons c, d, e, … who match or overlap on the same DNA segment.

This is what FamilyTreeDNA and 23andMe give you today. MyHeritage is promising a Chromosome Browser but no word yet on whether you’ll be able to download segment matches. AncestryDNA does not provide you with your segment match data.

image

The goal here is to find the people who get that DNA segment from the same common ancestor, Identical by Descent (IBD), as that will prove a relationship. But this must be checked thoroughly, because one of each pair of chromosomes comes from the mother and one from the father, and the DNA company’s matching process cannot distinguish one from another. So any match will count, even one that criss-crosses between the mother’s and father’s chromosomes, as will random matches by chance. With small segments below 15 cM in size, there is a significant likelihood of a false match that is not IBD. Even above 15 cM, two matching segments may still come from different parents. The main technique to help you identify if the segment is IBD is Triangulation (see below).

 

Double Matching

For Person a and Person b, find all the Persons c, d, e, … who match both Person a and Person b on the same segment.

This is what my Double Match Triangulator Program will give you today. And this is what you want FamilyTreeDNA, 23andMe, AncestryDNA and MyHeritage to be giving you.

image

Double matching does a lot for you. It uses a second person to help confirm that Person c, Person d and Person e all match each other and are not just matches by chance. It eliminates the extra bits of random match that Person a and Person b may have with the third person. If Person a and Person b are not direct-line related (i.e. parent-child, grandparent-grandchild), then it will reduce the threshold of where false matches will occur, down to maybe even 5 cM as Jim Bartlett has concluded. I plan to do a study of this soon and will put my results in an upcoming blog post.

 

Triangulation

Triangulation is a technique to help conclude (I won’t say “prove”) that three people share a segment that comes from a common ancestor and that the segment is Identical by Descent.

It requires that Person a matches Person c on a segment, that Person a matches Person d on the same segment, and also that Person c matches Person d on the same segment.

image

This statistically reduces to almost zero the possibility of the criss-crossing matches between the two parental chromosomes. It is still possible that one of the 3 people matches by chance to the other two people. But should that chance match be disproved, maybe by multiple Triangulations with other people, then it can be concluded that these people obtained that common segment from a common ancestor.

 

Single Match Triangulation

This is the technique commonly in use today because you are only supplied with Single Match information by FamilyTreeDNA and 23andMe.

The matches between Persons c, d and e are not included in the Single Match data you get. You don’t have this information in your matches. What you need to do is contact either Person c, d or e and ask them to look in their Chromosome Browser and see for you if they match the other people on that particular segment. If they do, then you Triangulate on that segment. If they don’t match some of the others, then you’ll have to contact the others to get them to check.

image

You could have tens of thousands of segments that Single Match with others. You may have dozens of people who overlap on a segment. So to be practical, most people just concentrate on their largest segments, or on a segment connected to people they are trying to figure out their relation to. This is manual labour as far as I’m concerned. And it only verifies one segment for a few people. You still have all your other segments, which have so much info to give you.

So what people often do is they get lazy. Maybe they verify with one or two people and then incorrectly conclude that all the other matches on the segment are valid. Then maybe they just look to see if the other people are “In Common With” meaning they match somewhere, but not necessarily on the desired segment, and then conclude the segment Triangulates, which is not a conclusion you can make.

Single Match Triangulation is what Jim Bartlett has done over the past five years. He has done it correctly and meticulously. By mapping his segments to his matches, he has manually Match Filtered (I’ll explain what that is in a future blog post) to his parents and has been able to map most of his segments to his ancestors. But it took him 5 years! It’s not easy.

There is one tool that does true Triangulation for you. It is the GEDmatch Tier 1 Triangulation Tool. It is the only online tool that will properly check the third leg of the Triangulation for you and guarantee that it is a true Triangulation. All the other tools out there use “In Common With” or less. However, with GEDmatch, you are limited to the kits that have been uploaded there, only your closest 500 matches are used, the minimum match is 7 cM and 500 SNPs, and it gets cut off at 10,000 Triangulations.

 

Double Match Triangulation

This is the technique I implemented in my Double Match Triangulator (DMT) program that uses Double Matching.

The basis is simple. Once you’ve Double Matched Person a and Person b with other people on a segment, you have all the matches you need except one: the Person a with Person b match. And you’ve got that right in your own Single Match file.

image

The matches between Person a and Person b can then be compared to all the Double Matches. Those that overlap all Triangulate, and those that don’t are Missing a-b Segments (another term I invented).
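That comparison step might look something like the following Python sketch. The function names and the sample positions are invented for illustration, and a simple overlap test stands in for whatever rule a real program applies.

```python
def overlaps(x, y):
    """True if two (chromosome, start, end) segments share any region."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]

def classify_double_matches(ab_segments, double_matches):
    """Split Double Matches into Triangulated ones and Missing a-b Segments.

    double_matches is a list of (person_c, region) pairs, where region is a
    (chromosome, start, end) tuple on which Person c matches both a and b.
    """
    triangulated, missing_ab = [], []
    for person_c, region in double_matches:
        if any(overlaps(region, ab) for ab in ab_segments):
            triangulated.append((person_c, region))
        else:
            missing_ab.append((person_c, region))
    return triangulated, missing_ab

# Invented data: a and b match each other only on chromosome 5.
ab_segments = [("5", 10_000_000, 30_000_000)]
double_matches = [
    ("c", ("5", 12_000_000, 28_000_000)),  # overlaps the a-b match
    ("d", ("7", 1_000_000, 9_000_000)),    # no a-b match here
]

triangulated, missing_ab = classify_double_matches(ab_segments, double_matches)
print(triangulated)
print(missing_ab)
```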

With the Chromosome Browser Results (CBR) files of Person a and any Person b that is Person a’s match, you can find every segment that Triangulates and all the people that Triangulate with them on every segment in one fell swoop.

If you can get CBR files from more of your DNA matches and put them all together, you will be doing what I call EAST (Extreme Autosomal Segment Triangulation).

 

Hopefully this post makes the concepts all a bit clearer for you.

Double Match Filtering for an Endogamous Population - Sun, 22 Jan 2017

A few days ago, Roberta Estes posted: Concepts – Segment Size, Legitimate and False Matches where she compared a child’s matches against those of her parents. She downloaded the Chromosome Browser Results (CBR) file from FamilyTreeDNA for a set of parents and a child, and then explained how she did the matching in a spreadsheet.

Roberta’s key result was a Parent Child Phased Segment Match Chart, which shows that she passed the 50% mark for false matches at 7 to 7.99 cM segments, rising to 87% false matches once segments are as small as 3 to 3.99 cM.

Roberta refers to this technique as “double parent phasing” (no caps) whereas I’d like to call it “Double Match Filtering” (with caps). My reason for naming it this is because it is exactly the same technique I use for what I call Double Match Analysis.

Here we take a child as Person a and one parent as Person b, and find all the Person c people that match both. Then we do it a second time, with the same child again as Person a and the other parent as Person b, and find the Person c people that match both of them. Using these two sets of Double Matches, we go back to all the child’s Single Matches and see which do not double match to either parent. Those non-matches cannot be Identical by Descent (IBD), since one parent would have had to match in order to pass the segment down from the ancestor, through them, to the child.
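As a rough Python sketch of this filtering step (with invented names and data, and assuming that any overlap with a parent’s segment for the same Person c counts as a double match, which may be looser than the rule used in the spreadsheets):

```python
def overlaps(x, y):
    """True if two (chromosome, start, end) segments share any region."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]

def phase_child_segments(child, father, mother):
    """Label each child segment 'father', 'mother', 'both' or 'neither'.

    Each argument maps a match name (Person c) to a list of
    (chromosome, start, end) segments shared with that person.
    """
    labels = []
    for name, segs in child.items():
        for seg in segs:
            via_father = any(overlaps(seg, p) for p in father.get(name, []))
            via_mother = any(overlaps(seg, p) for p in mother.get(name, []))
            if via_father and via_mother:
                label = "both"
            elif via_father:
                label = "father"
            elif via_mother:
                label = "mother"
            else:
                label = "neither"  # cannot be IBD under this reasoning
            labels.append((name, seg, label))
    return labels

# Invented data for illustration only.
child = {
    "c": [("5", 12_000_000, 28_000_000)],
    "d": [("7", 1_000_000, 4_000_000)],
    "e": [("9", 2_000_000, 6_000_000)],
}
father = {"c": [("5", 10_000_000, 30_000_000)], "e": [("9", 1_000_000, 7_000_000)]}
mother = {"d": [("8", 1_000_000, 4_000_000)], "e": [("9", 3_000_000, 8_000_000)]}

for name, seg, label in phase_child_segments(child, father, mother):
    print(name, seg, label)
```

The "both" label only becomes relevant when the parents are themselves related, so that the same Person c can match the child through either side.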

The high percentage of false matches for small segments under 8 cM in Roberta’s results is what scares genealogists from using small segments. And this is the downfall of Single Match Triangulation. A large number of small single matches are likely false and are not IBD.

Towards the end of her article, Roberta said:

“I hope that other people in non-endogamous populations will do the same type of double parent phasing and report on their results in the same type of format.  This experiment took about 2 days.

Furthermore, I would love to see this same type of experiment for endogamous families as well.”

An Endogamous Family

I’ve had plans to do this anyway. I need to analyze how the matches pass down as part of my investigation into methods to use Double Match Triangulation to map segments onto ancestors.

So I’m taking a number of Chromosome Browser Results files that were sent to me by Arnold, a DNA-cousin of mine, to help me develop my Double Match Triangulator program and see if I can use it to figure out how we’re related.

(By the way, I define a “DNA-cousin” or “DNA-relative” as someone who is a DNA match, but neither of us have the foggiest idea of how we’re actually related.)

Arnold has been doing DNA analysis with FamilyTreeDNA for a long time, and he had about 20 CBR files that he let me use. He, like me, comes from an endogamous Ashkenazi population.

His files include a father, mother, son and daughter, as well as other relatives of those four. An endogamous population gives those involved many more matches than you’d expect. That’s because everybody is related to everybody else, often in multiple ways. Here are the statistics for the four people I’ll use:

The father has 163,249 single match segments with 7,654 people.
The mother has 149,083 single match segments with 7,139 people.
The daughter has 146,767 single match segments with 7,271 people.
The son has 142,066 single match segments with 7,014 people.

To add an interesting complication, the father and mother are related. They share 25 matching segments totalling 98.0 cM, with the longest being 18.9 cM. This would normally make them something like 3rd cousins. But because of endogamy, they are more likely 5th and 6th cousins in several different ways.

The Spreadsheet Analysis

I basically did what Roberta said to do. I did it twice, once for the son with his parents, and once for the daughter and parents. Each file has about 450,000 lines in it. These are big Excel files that ended up (with analysis equations) being about 80 MB in size each.

I didn’t delete the segments under 3 cM like Roberta did. She was visually inspecting each match herself, so wanted a manageable number of matches to work with. Her non-endogamous CBR files had about 25,000 segment matches in each one, and removing the under 3 cM ones left her with about 6,000 matches in each, for a total of 18,000 lines to work with, and that was plenty to provide reasonable results.

I was able to develop Excel formulas to do the match comparisons that Roberta did by hand. Since I was letting the computer do the work, I didn’t need to cut down the size of the analysis and I could work with the whole dataset.

Roberta didn’t mention it, but you do have to remove the father, mother and child wherever they appear as the “MATCHNAME”. They all match each other on many segments, including the father and mother as I mentioned above. You don’t want to count those in these statistics.

Also, it’s really important to check the dates of your downloads of the two parents’ files and the child’s file. If they were not downloaded at the same time, a later download will contain matches to people that an earlier download did not. This will make it look like one person matches and the other does not, when what is really true is that you just don’t have the matches for the other person.

These one-sided matches had to be eliminated. I found the best way was to see if the child had matches to a Person c that neither their mother nor their father had. For this Person c to show up in the child’s match list, they had to have at least a half dozen matches totalling at minimum around 20 cM. For that to happen and for none of those segments to match either parent is practically impossible, meaning the parent’s matches to that person are missing from the download. So I deleted these from the analysis. They amounted to about 5% of the matches and did not really change the results other than reducing the number of large segments that did not match.

And because the parents were related, I knew there would be some matches that would be on both parents’ sides, so I made sure I was able to count those so I’d have them for future analysis.
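One possible reading of that cleanup step, sketched in Python: drop every Person c who appears in the child’s file but in neither parent’s file. The function name and the sample data are invented for illustration, and the real spreadsheet formulas may apply a different test.

```python
def drop_one_sided(child, father, mother):
    """Remove matches found in the child's file but in neither parent's file.

    Each argument maps a match name (Person c) to a list of segments.
    Returns the filtered child data and the sorted names that were dropped.
    """
    kept = {
        name: segs
        for name, segs in child.items()
        if name in father or name in mother
    }
    dropped = sorted(set(child) - set(kept))
    return kept, dropped

# Invented data: "e" is in the child's file only, as would happen if the
# child's file was downloaded later than the parents' files.
child = {
    "c": [("5", 12_000_000, 28_000_000)],
    "d": [("7", 1_000_000, 4_000_000)],
    "e": [("9", 2_000_000, 6_000_000)],
}
father = {"c": [("5", 10_000_000, 30_000_000)]}
mother = {"d": [("7", 1_000_000, 5_000_000)]}

kept, dropped = drop_one_sided(child, father, mother)
print(sorted(kept))
print(dropped)
```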

The Double Match Phasing Results

These results include only matches on the 22 autosomal chromosome pairs. The X chromosome is a bit different, so I removed its matches and will analyze them separately in a later post.

Here are the results of the daughter versus her father and mother:

image

And the results of her brother (the son) versus the same father and mother were very similar:

image

The results showed that there was much less chance of a non-match in small segments for these endogamous people than in Roberta’s results. The 50% point, which I highlighted in yellow, comes in at the 2 – 3 cM range, whereas Roberta’s 50% point comes in at the 7 – 8 cM range. This surprised me so much that I went back and double and triple checked my equations to make sure they were identifying segments correctly and totalling everything correctly. They were.
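For anyone wanting to reproduce this kind of table outside of Excel, the tally by segment size could be sketched in Python as follows. The 1 cM bin width, the function name and the sample values are assumptions made for this example; the sample values are invented and are not taken from any of the results discussed here.

```python
import math
from collections import Counter

def non_match_rates(segments):
    """Percent of segments in each 1 cM bin that matched neither parent.

    segments is an iterable of (centimorgans, matched_a_parent) pairs.
    Returns {bin_floor: percent_non_match}, ordered by bin.
    """
    totals, misses = Counter(), Counter()
    for cm, matched in segments:
        bin_floor = math.floor(cm)  # e.g. 3.8 cM falls in the 3 to 3.99 bin
        totals[bin_floor] += 1
        if not matched:
            misses[bin_floor] += 1
    return {b: 100.0 * misses[b] / totals[b] for b in sorted(totals)}

# Invented sample values, for illustration only.
sample = [
    (2.4, False), (2.9, True),
    (3.1, True), (3.5, True), (3.8, False),
    (7.2, True),
]

rates = non_match_rates(sample)
for bin_floor, pct in rates.items():
    print(f"{bin_floor} to {bin_floor}.99 cM: {pct:.1f}% non-matches")
```

The 50% point is then simply the largest bin whose rate is at or above 50.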

Here is a plot of % Non-matches by segment size from several different analyses. In addition to my results and Roberta’s results, I’m including John Walden’s False Positive both sides phased results that are on the ISOGG Wiki, which Blaine Bettinger talks about in his “Small Matching Segments – Friend or Foe” article of 2014. I’m also including Ann Raymont’s findings in her “When is a match a false positive?” post from 2016.

image

It seems that every other study, all of non-endogamous populations, gives similar results, but mine is different. I currently do not know why this is. I can’t think of a reason why endogamy might give fewer non-matches for a given segment size. My number of observations is certainly large enough, so unless my analysis is being done differently (or incorrectly), and I don’t believe it is, I think I may be showing something quite significant and relevant.

Among the 68 Chromosome Browser Results files that I have, including those my DNA-relatives have given me, this father/mother/son/daughter was the only set of both parents with a child. I would like to test some more, both endogamous and not.

I made my analysis spreadsheet quite general so that I could easily do this analysis for any father/mother/child triplet. If you’re interested in seeing what your non-match percentage looks like and would like to help me with this research that I’ll use to give my Double Match Triangulator program some smarts, please send me your set of CBR files. In return, I’ll be happy to send you the spreadsheet with your data in it and the results.

So if you have any set of CBR files from FamilyTreeDNA that includes both parents and one or more children, would you be willing to send them to me so that I can analyze them the same way? Thanks.