A New Notation for DNA Relationships?? - Sat, 19 Dec 2015
One thing missing that I am adding to the upcoming version 1.2 release of Behold is an indication of relationships between people. If there is one thing not easy to determine, it is how people are related to the main person (also called the proband).
But just as importantly, once you know the relationship, there is much valuable information that can be reported that can aid in DNA research. To do so, there needs to be a concise notation for showing the relationship of one individual to another.
I am presenting my proposal for this notation here in the hope that people who are more expert at genealogical DNA research than I am can comment and critique, so that I can finalize a system that will be simple and will work.
Here is the basis of what I’d like to notate:
You have a person of interest in your family tree who has some sort of relationship to the proband (who is usually you, your spouse, or some relative). We want to designate the connection through male and female lines using:
- X for a female biologically related
- Y for a male biologically related
- ? for a person whose gender is unknown but biologically related
- - (that’s a hyphen), for a person who is not biologically related
Note that I am using X and Y for female and male, after the two universally recognized sex-determining chromosomes. This is better than using abbreviations for female and male (F, M) or mother and father (M, F), which are English-based and which can also lead to dumb mistakes if the wrong interpretation is used.
So this is how I propose this notation will be written: You start with the person of interest, work up to the common ancestor (if there is one), and then back down to the proband, selecting the character to represent each person along the path.
Here are a few examples:
My great-grandfather to me (on my mother’s side): YYXY
The first Y is my great-grandfather. The second Y is his son, my grandfather (this would be an X if this were my other great-grandfather on my mother’s side). The third character is an X for my mother, and the Y at the end is me.
My great-granddaughter to me (via my daughter and grandson): XYXY
The first X is my great-granddaughter, the Y following is her father, my grandson, then the X is my daughter, and the last Y is me.
Interestingly, the first example works down from my great-grandfather to me, but the second works up from my great-granddaughter to me. The direction doesn’t matter. The notation will always denote the path from the first person to the second.
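As a rough illustration of how a program might produce such a line, here is a minimal Python sketch. The PathPerson structure, its field names and the "F"/"M" sex codes are assumptions made up for this example; they are not taken from Behold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathPerson:
    """One person on the path (hypothetical structure for illustration)."""
    name: str
    sex: Optional[str] = None   # "F", "M", or None when unknown (assumed codes)
    biological: bool = True     # False when the link is not biological


def notation_char(person: PathPerson) -> str:
    """Map one person to X, Y, ? or - as defined in the list above."""
    if not person.biological:
        return "-"
    if person.sex == "F":
        return "X"
    if person.sex == "M":
        return "Y"
    return "?"


def build_line(path: list[PathPerson]) -> str:
    """Join the characters, person of interest first and proband last."""
    return "".join(notation_char(p) for p in path)


great_grandfather_to_me = [
    PathPerson("great-grandfather", "M"),
    PathPerson("grandfather", "M"),
    PathPerson("mother", "F"),
    PathPerson("me", "M"),
]
print(build_line(great_grandfather_to_me))  # YYXY
```

The path is simply listed in order, so the same function covers lines that run down to the proband, up to the proband, or up and back down through a common ancestor.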
Let’s get more complicated and include relationships that have a common ancestor:
My first cousin once removed to me: XYXYXY
Well, there are many different ways a person can be a first cousin once removed (1c1r) to me. I’m picking just one of these possibilities, with the person being the daughter of my first cousin. So this designates that my 1c1r is female, her father is my first cousin, and his mother’s father is my grandfather. And the connection is on my mother’s side.
Here, the path actually goes up to the common ancestor, and then back down to the proband. In fact, there are really two common ancestors for this line, the other one being my grandmother, and that line would be: XYXXXY with the “Y” in the fourth position being replaced by an “X”.
Why do we need this? Well, from the series of letters, the DNA-based relationship of the two people can be calculated. The first two examples, YYXY and XYXY, take 3 steps to go from the first person to the last. Each step is a sharing of 50% of the autosomal DNA. That means the first and fourth people should share 50% x 50% x 50% = 12.5% of their autosomal DNA. The XYXYXY in example 3 has six people and so five steps from the first to the last. They should share 50% multiplied together five times, or 3.125%, of their autosomal DNA.
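The arithmetic can be sketched in a few lines of Python. This is only an illustration of the halving rule described above: a line of n characters has n - 1 steps, and each step halves the expected share. The function name autosomal_share is made up for this example.

```python
def autosomal_share(line: str) -> float:
    """Expected autosomal share along one path: halve once per step."""
    steps = len(line) - 1      # n people on the path means n - 1 steps
    return 0.5 ** steps


for line in ("YYXY", "XYXY", "XYXYXY"):
    print(f"{line}: {autosomal_share(line):.3%}")
# YYXY: 12.500%
# XYXY: 12.500%
# XYXYXY: 3.125%
```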
That 3.125% is for the male common ancestor. If his wife/partner is also a common ancestor, then her connection adds another 3.125% and you get the total autosomal share of 6.25% for a first cousin once removed, which is what all the tables say as shown in the graphic below from DNA-explained.com:
The designation of the sex along the way is also important. All Y’s from the person of interest to your common ancestor indicate a male-line connection and you’ve found a person who could very well be a Y-DNA candidate for your common ancestor. All X’s may indicate a Mitochondrial DNA candidate for your common ancestor. Also, the exact specification of the X’s and Y’s along the way can be used to determine the percentage share of your X chromosome. Using this information, I’ll be able to get Behold to display these percentages.
The two other characters in the notation are also important. If you don’t know the sex of one person along the way, then use a question mark as their placeholder. By doing so, the length of the line is still correct and the DNA relationship percentages can still be calculated. For example, if your 1c1r’s grandparent was Terry, but you don’t know if Terry was male or female, then you should write: XY?XXY.
The other character is a hyphen which should be used to designate a person who breaks the biological line. For example, in your genealogy you may have a cousin who was adopted. You still consider them a full cousin, and you want them documented in your family tree. But they are not of use to you in your DNA research. So the hyphen is inserted for people who break the biological connection, e.g. in this case, the parent of your cousin. Then this example would be written like this: Y-XXY.
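A small Python sketch shows how a program might treat these two characters. It assumes that a question mark still counts as one step, and that a hyphen anywhere in the line means no DNA is expected to be shared through that path. The function names and the choice to return 0.0 for a broken line are assumptions for illustration only.

```python
VALID_CHARS = set("XY?-")


def check_line(line: str) -> None:
    """Reject empty lines and characters outside X, Y, ? and -."""
    bad = set(line) - VALID_CHARS
    if not line or bad:
        raise ValueError(f"invalid relationship line: {line!r}")


def is_biological(line: str) -> bool:
    """A hyphen marks a person who breaks the biological line."""
    return "-" not in line


def autosomal_share(line: str) -> float:
    """Expected share; unknown sex still counts as a step, a break gives 0."""
    check_line(line)
    if not is_biological(line):
        return 0.0
    return 0.5 ** (len(line) - 1)


print(autosomal_share("XY?XXY"))  # 0.03125
print(autosomal_share("Y-XXY"))   # 0.0
```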
I think this gives a lot of information in a concise, easy-to-understand notation. I have been looking, but I have not been able to find any similar notations that have been formalized. Maybe there is something already out there that I’ve missed. If so, could you please tell me about it?
I would really appreciate your comments, ideas and suggestions and I’ll then be able to finalize this possibly new notation.
Refinement: Dec 20:
The simple notation above does not indicate which character represents the common ancestor. Often that person needs to be known, e.g. to see if there is an all-male or all-female connection to the common ancestor. I like the method suggested by Rob Hoare in the comments below: use parentheses to surround this person. Using this, example 3 above would now be: XYX(Y)XY.
The nice thing about this extension is that, since there are always two common ancestors, a father and mother, they can both be designated together if desired, as in: XYX(YX)XY
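To show that the parenthesized form is easy to work with, here is a hedged Python sketch that splits a line into the part before the common ancestor(s), the ancestor(s) themselves, and the part after, and then expands a two-ancestor line into its two single-ancestor lines. The regular expression and the function names are my own assumptions, not a finalized grammar.

```python
import re

# Assumed grammar: optional characters, one or two ancestors in
# parentheses, then optional characters.
LINE_RE = re.compile(r"^([XY?-]*)\(([XY?]{1,2})\)([XY?-]*)$")


def split_line(line: str) -> tuple[str, str, str]:
    """Return (before, ancestors, after) for a parenthesized line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a parenthesized relationship line: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def expand(line: str) -> list[str]:
    """One plain line per common ancestor, e.g. for the DNA arithmetic."""
    before, ancestors, after = split_line(line)
    return [before + ancestor + after for ancestor in ancestors]


print(split_line("XYX(YX)XY"))  # ('XYX', 'YX', 'XY')
print(expand("XYX(YX)XY"))      # ['XYXYXY', 'XYXXXY']
```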
Then in Behold, I could succinctly show the common ancestors together, e.g.:
Jane Person
Relationship: 1c1r of John Proband via Fred and Wilma Ancestor
Line: XYX(YX)XY, Shared DNA: 6.25%at, 50%X
where 6.25% is the Autosomal and 50% is the X-chromosome shared percentages between Jane Person and John Proband through this connection.
If a person is related multiple ways through different common ancestors, each relationship can easily be shown on its own line with its own DNA contributions. The DNA contributions are additive, so the total shared DNA can then be shown.
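As a sketch of that addition, the Python below sums the expected autosomal share over several plain lines, one per common ancestor. It uses the two lines for the 1c1r example, through my grandfather and through my grandmother. The function names are hypothetical, and treating a line containing a hyphen as contributing nothing is an assumption.

```python
def autosomal_share(line: str) -> float:
    """Expected share along one path; a hyphen is assumed to contribute 0."""
    if "-" in line:
        return 0.0
    return 0.5 ** (len(line) - 1)


def total_share(lines: list[str]) -> float:
    """Add up the contributions of every path between the two people."""
    return sum(autosomal_share(line) for line in lines)


lines = ["XYXYXY", "XYXXXY"]   # via the grandfather and via the grandmother
for line in lines:
    print(f"{line}: {autosomal_share(line):.3%}")
print(f"Total: {total_share(lines):.2%}")
# XYXYXY: 3.125%
# XYXXXY: 3.125%
# Total: 6.25%
```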
The parentheses can also be used to denote the direction in a direct line. The first two examples then become:
My great-grandfather to me (on my mother’s side): (Y)YXY
My great-granddaughter to me (via my daughter and grandson): XYX(Y)
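With the common ancestor marked, the Y-DNA and mitochondrial checks mentioned earlier can be sketched in Python. This is an illustration under stated assumptions, not a finished rule: it handles a single marked ancestor, treats ? and - as "not confirmed", and for mitochondrial DNA it lets the two end people be of either sex, since a mother passes her mitochondrial DNA to all of her children. The function names are made up for this example.

```python
import re

# Assumed grammar: exactly one common ancestor in parentheses.
LINE_RE = re.compile(r"^([XY?-]*)\(([XY?])\)([XY?-]*)$")


def split_line(line: str) -> tuple[str, str, str]:
    """Return (before, ancestor, after) for a one-ancestor line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"expected one ancestor in parentheses: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def shares_y_dna(line: str) -> bool:
    """True when every person on the path, ancestor included, is male."""
    before, ancestor, after = split_line(line)
    return set(before + ancestor + after) == {"Y"}


def shares_mt_dna(line: str) -> bool:
    """True when the ancestor and every intermediate person are female."""
    before, ancestor, after = split_line(line)
    mothers_up = before[1:]     # skip the person of interest
    mothers_down = after[:-1]   # skip the proband
    return ancestor == "X" and set(mothers_up + mothers_down) <= {"X"}


print(shares_y_dna("(Y)YXY"))    # False
print(shares_y_dna("(Y)YYY"))    # True
print(shares_mt_dna("YX(X)XY"))  # True
print(shares_mt_dna("XYX(Y)"))   # False
```

When the ancestor is at one end of the line, as in the two direct-line examples, the ancestor is also the person of interest or the proband, so the ancestor's own sex decides the result.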
Update: On May 23, 2016, I finalized and formalized the notation that I talk about in this post.