Louis Kessler’s Behold Blog

Comparing Genetic Clusters - Tue, 22 Jan 2019

In my last post, I described how I obtained my results from three genetic clustering tools: Genetic Affairs, Collins’ Leeds, and Shared Clustering.

Pretty pictures are nice to look at. But what we really want is to be able to use the results. The goal here is to see if these clustering algorithms actually do segment your family into groups that are related through an ancestor you can identify. And do these tools all identify the same clusters, or different clusters? Do they contradict each other, or can you use them all together to get even better information?

I took the three sets of results and combined them into one spreadsheet. These were all from AncestryDNA.  All three included matches down to 40 cM.

Currently, I have 242 matches of at least 40 cM at AncestryDNA.

Genetic Affairs included 226 matches and it put them into 34 clusters. It excluded my number one match of 411 cM because it exceeded the default upper limit of 400 cM. The other 15 were excluded because a cluster could not be determined for them.

Collins’ Leeds included 223 matches and it put them into 37 clusters. It also excluded my number one match, as well as 18 others that it could not determine clusters for.

Shared Clustering included all 242 matches and it put them into 7 clusters.
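As a quick sanity check, the sketch below reconciles each tool's included and excluded matches against the 242 total. The figures are the ones quoted above; the dictionary layout and names are illustrative only and are not output from any of the tools.

```python
TOTAL_MATCHES = 242  # AncestryDNA matches of at least 40 cM

# Per tool: (included, top match excluded, excluded because no cluster was found)
tool_counts = {
    "Genetic Affairs": (226, 1, 15),
    "Collins' Leeds": (223, 1, 18),
    "Shared Clustering": (242, 0, 0),
}

for tool, counts in tool_counts.items():
    accounted = sum(counts)
    status = "ok" if accounted == TOTAL_MATCHES else "mismatch"
    print(f"{tool}: {accounted} of {TOTAL_MATCHES} accounted for ({status})")
```

Every tool accounts for all 242 matches once its exclusions are added back in.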

Of my 242 matches, I know exactly how I’m related to just 8 of them.  I can divide them up into my 4 grandparents as follows:

  • 3 are Braunstein, my paternal grandfather’s side.
  • 2 are Focsaner, my paternal grandmother’s side.
  • 0 are Girman, my maternal grandfather’s side.
  • 3 are Goretsky, my maternal grandmother’s side.

So let’s take these 8 people and see where each clustering technique puts them. Listed in the table is the cluster number assigned to each relative.

[Image: table of the cluster numbers each tool assigned to my 8 known relatives]

Genetic Affairs put all three of my Braunstein relatives in cluster 4. The Focsaner and Goretsky relatives were put in different clusters. Collins’ Leeds used different clusters for all of my relatives except for two Braunsteins that it put into its cluster 24. Shared Clustering put all my Braunsteins in its cluster 2, both my Focsaners in its cluster 5 and all my Goretskys in its cluster 1. So far so good.

Now let’s take all the rest of my 242 matches, and if they fall into a Braunstein cluster, I’ll color them blue. That is, Genetic Affairs cluster 4, Collins’ Leeds clusters 24 and 3, and Shared Clustering cluster 2 will be colored blue. Similarly, Focsaner clusters will be brown and Goretsky clusters green.

My results are in the table below. To interpret this table:

Look at Relative 9. It was assigned cluster 4 by Genetic Affairs, cluster 24 by Collins’ Leeds, and cluster 2 by Shared Clustering. From my known relatives, those clusters all correspond to a Braunstein relative and are colored blue. They all match.

Relative 11, by contrast, was assigned cluster 9 by Genetic Affairs, which was a Focsaner cluster and is colored brown. It was assigned cluster 31 by Collins’ Leeds and cluster 1 by Shared Clustering. The latter two are Goretsky clusters, so they are colored green. There is disagreement here, so Relative 11 is colored yellow.
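The labeling rule can be sketched in a few lines of Python. This is only an illustration of the logic, not how the spreadsheet was built. The cluster-to-grandparent map contains only the cluster numbers named in the text, so it is deliberately partial, and the function name `classify` and the verdict labels are hypothetical.

```python
# Cluster -> grandparent line, limited to the cluster numbers named in the text.
# The full mapping lives in the table image, so this map is incomplete.
CLUSTER_TO_LINE = {
    "Genetic Affairs": {4: "Braunstein", 9: "Focsaner"},
    "Collins' Leeds": {24: "Braunstein", 3: "Braunstein", 31: "Goretsky"},
    "Shared Clustering": {2: "Braunstein", 5: "Focsaner", 1: "Goretsky"},
}


def classify(assignments):
    """Map each tool's cluster to a grandparent line and compare the results.

    assignments: dict of tool name -> cluster number.
    Returns (lines, verdict), where verdict is one of
    "unassigned", "single", "agree" or "disagree".
    """
    lines = {}
    for tool, cluster in assignments.items():
        line = CLUSTER_TO_LINE.get(tool, {}).get(cluster)
        if line is not None:
            lines[tool] = line

    distinct = set(lines.values())
    if not lines:
        verdict = "unassigned"   # no tool placed the match in a known cluster
    elif len(lines) == 1:
        verdict = "single"       # only one tool gives a grandparent line
    elif len(distinct) == 1:
        verdict = "agree"        # every assigning tool names the same line
    else:
        verdict = "disagree"     # the yellow rows in the table
    return lines, verdict


relative_9 = {"Genetic Affairs": 4, "Collins' Leeds": 24, "Shared Clustering": 2}
relative_11 = {"Genetic Affairs": 9, "Collins' Leeds": 31, "Shared Clustering": 1}

print(classify(relative_9))
print(classify(relative_11))
```

Relative 9 comes out as "agree" (all Braunstein) and Relative 11 as "disagree" (one Focsaner, two Goretsky), matching the two worked examples.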

(This is the tallest graphic I’ve ever included in my blog)

[Image: table of cluster assignments for Relatives 9 to 158, color-coded by grandparent]

I have 9 relatives (Relatives 9 to 17) that all 3 clustering techniques have assigned a grandparent cluster. Unfortunately, all three agree on the grandparent in only 4 of those 9 cases.

I have 28 relatives (Relatives 18 to 45) that two of the techniques have assigned a grandparent cluster. 18 of those assignments agree. 10 do not.

Then there are 113 relatives (Relatives 46 to 158) that only one technique has assigned a grandparent cluster. Since there’s only one assignment, there is no telling if the others would agree or disagree.

Leftover and not shown are 84 relatives where none of the techniques assigned a grandparent cluster.
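The group sizes can be checked against the total with a short tally. This is a sketch using only the counts quoted above, with illustrative variable names.

```python
KNOWN_RELATIVES = 8  # Relatives 1 to 8, whose relationship is known

# Unknown matches grouped by how many tools gave a grandparent cluster
group_sizes = {
    3: 9,     # Relatives 9 to 17
    2: 28,    # Relatives 18 to 45
    1: 113,   # Relatives 46 to 158
    0: 84,    # not shown in the table
}

with_assignment = sum(n for tools, n in group_sizes.items() if tools > 0)
total = KNOWN_RELATIVES + sum(group_sizes.values())

print(with_assignment)  # 150
print(total)            # 242
```

The 150 matches with at least one assignment, the 84 with none, and the 8 known relatives add up to the 242 matches.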


Conclusion

The goal here is to be able to assign a grandparent to my matches whose relationship I do not know. Using the 8 relatives I do know, and assigning their grandparents to the clusters they were placed in, I can get 1 to 3 cluster assignments for 150 of my matches.

Unfortunately, only 4 out of 9 (44%) grandparent assignments agree for me when all three techniques have assignments, and 18 out of 28 (64%) agree when two of the techniques have assignments. That’s more disagreement than I was hoping to get from different genetic clustering techniques.
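For anyone repeating this comparison with their own matches, the agreement rates are simple ratios. The sketch below reproduces the two percentages above and also pools the two groups. The pooled figure is derived here from the same counts and is not stated elsewhere in the post, and the function name is illustrative.

```python
def agreement_rate(agree, total):
    """Fraction of matches whose grandparent assignments all agree."""
    return agree / total


three_tools = (4, 9)    # agree, total when all three tools assign a grandparent
two_tools = (18, 28)    # agree, total when exactly two tools assign one

pooled = (three_tools[0] + two_tools[0], three_tools[1] + two_tools[1])

for label, (agree, total) in [
    ("three tools", three_tools),
    ("two tools", two_tools),
    ("pooled", pooled),
]:
    print(f"{label}: {agree}/{total} = {agreement_rate(agree, total):.0%}")
```

Pooled, 22 of the 37 matches with more than one assignment agree, or about 59%.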

I do have a lot of endogamy in my ancestry. I would expect people who have more distinct lines than I do to get more agreement between the clustering techniques than I have.

1 Comment

1. paulbaltzer (paulbaltzer)
Posted: Tue, 11 Jun 2019

Great analysis of the effectiveness of 3 clustering tools! Thanks.

