There’s no way to rule out endogamy, as you say. But the point of my post is that attributing these issues to anything other than statistics is known as the “regression fallacy”. Yes, other factors may be contributing to regression toward the mean, but this effect will happen regardless. Everywhere I see discussion of endogamy, incomplete information, etc., but regression to the mean could play as big a role as these other factors, and I never see it discussed.

You said: “for the handful of weak matches for whom we have traced a MRCA, the generational difference has usually been larger than had been predicted by the model”. I would expect that this is because the one MRCA you identified is one of many MRCAs that you and your match have. Neither of you has done enough research on your other lines to connect them to anyone other than the one MRCA you found. Being from an endogamous population, I am always aware of this.

And that one MRCA you found may be more distant than the others you have not found. In the end, the sum of the segments contributed by all the MRCAs, plus by-chance segments, will bias the total cM to be higher than what you would expect for the one MRCA you found.

Louis
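Louis’s point about several MRCAs can be illustrated with simple arithmetic. This Python sketch is hypothetical throughout: it assumes expected shared cM roughly quarters with each extra generation to the MRCA, assumes contributions from separate MRCAs and chance segments simply add, and uses invented example numbers (one traced MRCA 6 generations back, two untraced ones at 5 and 7, and 5 cM of by-chance segments). `naive_generations()` is a hypothetical stand-in for a single-MRCA predictor, not any testing site’s method.

```python
import math

BASE_CM = 3400.0  # hypothetical expected shared cM when the MRCA is 1 generation back


def expected_cm(g):
    """Assumed expected shared cM: roughly quarters with each extra generation."""
    return BASE_CM / 4 ** (g - 1)


def naive_generations(cm):
    """Hypothetical predictor: the single-MRCA distance whose expected cM equals cm."""
    return 1 + math.log(BASE_CM / cm, 4)


# Invented example: one traced MRCA, untraced ones (endogamy), and chance segments.
found_mrca = 6
untraced_mrcas = [5, 7]
chance_cm = 5.0

# Assumption: contributions from separate MRCAs and chance segments simply add up.
total_cm = (expected_cm(found_mrca)
            + sum(expected_cm(g) for g in untraced_mrcas)
            + chance_cm)

print(f"expected from traced MRCA alone: {expected_cm(found_mrca):.1f} cM")
print(f"total shared:                    {total_cm:.1f} cM")
print(f"single-MRCA estimate from total: {naive_generations(total_cm):.1f} generations")
```

Under these assumptions the total cM points to a closer relationship than the one traced MRCA, which is the bias Louis describes.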

When you look at a list of matches, each has some DNA in common with you and each has some distance to a MRCA. The goal is to guess the latter based on the former. When the correlation between two variables is strong, the regression effect is minimal and the guesses should be pretty accurate. But let’s ignore the matches with a MRCA of, say, 4 or fewer generations back. The rest of the data is usually a huge list of weak matches, and the correlation between the amount of DNA shared and the generations to the MRCA should be much weaker.

Now when you sort this list of matches by shared DNA (still ignoring the strong matches) and just scroll through the beginning of that list, as most people probably do in practice, you are singling out, among all of your weak matches, those who have more shared DNA than most. And typically, the explanation for why these people share more DNA than most of your weak matches is “luck”, rather than a very close MRCA. What you should then observe is that, for many of these “closer weak matches”, the MRCA is farther back than the regression model would predict.

I have found this anecdotally myself: for the handful of weak matches for whom we have traced a MRCA, the generational difference has usually been larger than had been predicted by the model. I am interested to know if other people have the same experience, since it’s impossible to tell from here whether the assumptions for regression to the mean are in place or, if so, how strong the effect should be.
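The selection effect described above can be sketched as a small Python simulation. Every constant is a hypothetical assumption, not a published model: expected shared cM roughly quarters per extra generation, observed cM scatters log-normally around that expectation (`noise_sd`, standing in for “luck”), and the number of matches triples with each generation. Only matches 5 to 8 generations back are generated, mirroring “ignore the matches with a MRCA of 4 or fewer generations”. `naive_generations()` is a hypothetical stand-in for a predictor that reads a generation count off a single cM value.

```python
import math
import random

# All constants below are hypothetical assumptions for illustration only.
BASE_CM = 3400.0  # assumed expected shared cM when the MRCA is 1 generation back
GROWTH = 3        # assumed growth factor in the number of matches per generation


def expected_cm(g):
    """Assumed expected shared cM: roughly quarters with each extra generation."""
    return BASE_CM / 4 ** (g - 1)


def naive_generations(cm):
    """Hypothetical predictor: invert expected_cm() for an observed cM value."""
    return 1 + math.log(BASE_CM / cm, 4)


def simulate(noise_sd=0.5, n_top=50, gens=range(5, 9), per_gen_base=200, seed=1):
    """Return (mean true generations, mean naive estimate) for the top n_top
    weak matches ranked by shared cM."""
    rng = random.Random(seed)
    matches = []  # (true generations to MRCA, observed shared cM)
    for g in gens:
        count = per_gen_base * GROWTH ** (g - gens[0])
        for _ in range(count):
            # Log-normal scatter of observed cM around its expectation ("luck").
            cm = expected_cm(g) * math.exp(rng.gauss(0.0, noise_sd))
            matches.append((g, cm))
    # Sort by shared DNA and keep the beginning of the list, as when scrolling.
    matches.sort(key=lambda m: m[1], reverse=True)
    top = matches[:n_top]
    mean_true = sum(g for g, _ in top) / len(top)
    mean_naive = sum(naive_generations(cm) for _, cm in top) / len(top)
    return mean_true, mean_naive


if __name__ == "__main__":
    true_g, naive_g = simulate()
    print(f"true mean generations: {true_g:.2f}")
    print(f"naive mean estimate:   {naive_g:.2f}")
```

Under these assumptions the matches at the top of the sorted list are, on average, farther back than the naive estimate suggests, while `simulate(noise_sd=0.0)` makes the gap disappear. The model contains no endogamy at all, so the gap comes purely from selecting on a noisy measure.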

Are you the victim of April 1st? :-)

Joe
