Louis Kessler's Behold Blog

Writing Freeware (Double Match Triangulator) - Sun, 17 Jul 2016

Most people might think releasing a freeware program is easy. Just write it and make it available. Right?

Well, there’s a bit more to it than that.

When I came up with the idea for Double Match Triangulation of autosomal DNA using the chromosome match files produced by FamilyTreeDNA, I knew I’d need a program to sort all that data out. When I went online and found nothing like it, I knew I’d have to create it and make it available so that others could use it too.

I first figured out what was needed by doing the matching with Excel. I loaded two chromosome match files into Excel, merged them together, and developed equations to determine segment overlaps. I then used conditional formatting to color the cells to make interpretation easier.

Once that template was set up, it wasn’t too much work to build a program with an engine that would read in two chromosome match files, compare them the same way I was doing in the Excel spreadsheet, and output the results to a csv (comma delimited) file so that Excel could read it in and display it all nicely.
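The overlap test at the heart of such a comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual program: the column names (`MATCHNAME`, `CHROMOSOME`, `START LOCATION`, `END LOCATION`) are assumed, and the function names are made up for the example. It reports the people who appear in both files and the stretch of chromosome where their two matches overlap.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    match_name: str
    chromosome: str
    start: int
    end: int


def read_segments(handle):
    """Read segments from a chromosome match file (column names are assumed)."""
    return [
        Segment(row["MATCHNAME"], row["CHROMOSOME"],
                int(row["START LOCATION"]), int(row["END LOCATION"]))
        for row in csv.DictReader(handle)
    ]


def overlap(a, b):
    """Return the (start, end) shared by two segments, or None if there is none."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return (start, end) if start <= end else None


def double_matches(segments_a, segments_b):
    """Yield (person, chromosome, shared range) for people found in both files."""
    by_person = defaultdict(list)
    for seg in segments_b:
        by_person[(seg.match_name, seg.chromosome)].append(seg)
    for seg_a in segments_a:
        for seg_b in by_person.get((seg_a.match_name, seg_a.chromosome), []):
            shared = overlap(seg_a, seg_b)
            if shared:
                yield seg_a.match_name, seg_a.chromosome, shared


# Tiny made-up files standing in for two downloaded match files.
header = "NAME,MATCHNAME,CHROMOSOME,START LOCATION,END LOCATION\n"
file_a = io.StringIO(header + "Person A,Person C,1,100,500\nPerson A,Person D,2,100,200\n")
file_b = io.StringIO(header + "Person B,Person C,1,300,900\nPerson B,Person D,2,300,400\n")
result = list(double_matches(read_segments(file_a), read_segments(file_b)))
print(result)
```

Indexing the second file by person and chromosome keeps the comparison from checking every row against every other row, which matters when each file holds a couple of hundred thousand lines.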

 

So at that point, just a few little things to do:

1. Blog about the technique.

2. Get a few sample files from people so I can test it.

3. Test it, and find problems with the input files and handle them.

4. Learn from the results, and figure out more that can be done.

5. Decide what will be in the first cut of the program.

 

Basically the program is done…. Except it’s not.

6. Mock up a user interface to allow selection of files.

[Image: mock-up of the file selection interface]

7. Include Open File dialogs to select the files

8. Include Open Folder dialogs to select the folders. Wait, there aren’t any Open Folder dialogs available in the Visual Component Library. I have to research my options, see what I did in GEDCOM File Finder, and decide how to implement this.

9. Save past files and directories to the Registry so that they can appear in the recently used list. (You’d hate me and I’d hate myself if I didn’t do this.)

10. Add error checking of file names and input files.

11. Figure out what to put in the status box and log files to track what was done, what wasn’t, and any errors encountered.

12. Realize it’s easy to export to csv, but a pain to manually format it once you load the csv file into Excel. So I look for a way to automate the loading of the Excel file directly.

13. Try to make sense of the Office Developer Documentation and find the commands needed amongst the millions of articles.

14. Spend a week implementing the automation, and once it is working, realize it takes 10 times longer than creating the csv file.

15. Puzzle about ways to improve this slowness while in the shower, on my bike and at 3 in the morning.

16. Try various things, and find that creating a temporary csv file and then automating its input is 5 times faster than direct to Excel automation.

17. Rewrite everything so that multiple files can be matched at once.

18. Make sure it all looks nice, still works, and does what’s needed.

 

All done now? Yup. Except left to do:

19. An installation script for it.

20. Webpage for it so there’s someplace to download it from.

21. Some documentation would be nice.

22. Blog posts, announcements

 

Yay! Finally done.  … but forever followed by:

23. Support, bug fixes, response to questions, enhancements

 

So that’s how a freeware program is made. And the timeframe is after work, in the evenings and on weekends when not on errands, when your family lets you be alone, and when you’re not too tired to think.

Hopefully the Double Match Triangulator program will be available in the next week or two for anyone to try out.

FamilyTreeDNA Matches both Chromosome Halves - Thu, 7 Jul 2016

Be careful if you’re triangulating at FamilyTreeDNA. I just found out they can match twice on a segment.

If you look in your Chromosome match file which is downloadable from the Chromosome Browser page, you may find matches with a second person that overlap. For instance, look at this match my uncle has with David:

[Image: rows from the chromosome match file showing my uncle’s matches with David]

There are three overlapping matches, on chromosomes 5, 7 and 12. The matches on chromosome 7 are identical. This means that one half of my uncle’s chromosome matches with one half of David’s chromosome and the other half of my uncle’s chromosome matches with the other half of David’s chromosome.

You don’t notice this when you use the chromosome browser. It will show just one of the matches:

[Image: chromosome browser view showing just one of the matches]

This doesn’t happen often. There are only 198 overlapping matches out of the 178,955 matches in my uncle’s file. But that’s often enough that I will have to make sure my Double Match Triangulator program recognizes and handles this case. I now have to figure out how to best display this information, and either singly or doubly match the overlapping half segments. Unfortunately, I don’t think there is a way beforehand to know which of the two matches of people a and b corresponds to the match of people a and c. But I’ll investigate.
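Finding these cases in a single match file is a matter of grouping the rows by person and chromosome and looking for segments that start before an earlier one has ended. A minimal sketch in Python, assuming the rows have already been read into `(match_name, chromosome, start, end)` tuples (the function name and sample positions are made up for illustration):

```python
from collections import defaultdict


def overlapping_matches(segments):
    """Return segments that overlap an earlier segment for the same person.

    segments: iterable of (match_name, chromosome, start, end) tuples.
    """
    groups = defaultdict(list)
    for name, chrom, start, end in segments:
        groups[(name, chrom)].append((start, end))

    found = []
    for (name, chrom), spans in groups.items():
        spans.sort()
        widest = spans[0]  # the earlier span that reaches furthest right
        for span in spans[1:]:
            if span[0] <= widest[1]:
                found.append((name, chrom, widest, span))
            if span[1] > widest[1]:
                widest = span
    return found


# Made-up positions: two identical segments on chromosome 7, none overlapping on 5.
sample = [
    ("David", "7", 1000, 2000),
    ("David", "7", 1000, 2000),
    ("David", "5", 100, 300),
    ("David", "5", 400, 500),
]
overlaps = overlapping_matches(sample)
print(overlaps)
```

Tracking the furthest-reaching earlier span, rather than only the previous row, catches a short segment sitting inside a long one even when other rows fall between them.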

By the way, the match of my uncle with David is reported on the Chromosome Browser as having 26 shared segments totalling 87.36 cM. On the Family Finder it is reported as 107.87 cM, and in the chromosome match file I downloaded, after including the 3 overlapping segments shown above and 2 others, there are 59 matching segments totalling 203.56 cM. That makes me wonder which one is correct, if any. Nevertheless, the detail of the chromosome match file does give me some confidence that it might be right. And I suspect that the dual half segments and maybe some of the smallest segments are left out of the cM calculations.

And, if you hadn’t noticed, FamilyTreeDNA made some major changes today and updated their Family Finder interface. They now phase your relatives and show which matches are on your father’s side, mother’s side, or both sides. Of course you need more than just one person tested and some known relationships entered before a paternal and maternal side can be assigned:

[Image: Family Finder match list with paternal and maternal sides assigned]

I also notice they changed the ordering. Your matches are now ordered first by relationship range and then by shared centimorgans. It used to be ordered first by relationship and then by largest segment. As a result, all the matches changed order significantly. But it seems that the relationships and cM values did not change.

For more information about this set of FamilyTreeDNA changes, see Roberta Estes’ post: Family Tree DNA Introduces Phased Family Finder Matches


Followup:  July 23, 2016

Scratch most of the above. I now believe what I first said was wrong.

After further analysis, I have changed my thinking. The overlapping matches in the Chromosome match files are not separate matches by FamilyTreeDNA on the two halves of the genome. I don’t think they do that. Any overlap would look like just one match over both genomes.

Most (if not all) of those overlapping segments result from the incorrect way FamilyTreeDNA is listing people in the chromosome match file. They are being merged by person name and then by chromosome number and then by location on the chromosome. If two people have identical names, their information is being put together as one in the chromosome match file. This is incorrect and needs to be fixed by FamilyTreeDNA. What they need to do is incorporate the kit number into the matching, so that three John Smiths are not put together.
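The effect of keying on the name alone is easy to show with a toy example in Python. The kit numbers, positions and the `group_rows` helper below are hypothetical, made up purely to illustrate the point: two different people who share a name collapse into one group when grouped by name, and stay apart once the kit number is part of the key.

```python
from collections import defaultdict

# Hypothetical rows: (kit_number, match_name, chromosome, start, end).
rows = [
    ("K001", "John Smith", "7", 1_000_000, 5_000_000),
    ("K002", "John Smith", "7", 3_000_000, 8_000_000),
]


def group_rows(rows, key):
    """Group rows by whatever the key function returns."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


by_name = group_rows(rows, key=lambda r: r[1])        # name only
by_kit = group_rows(rows, key=lambda r: (r[0], r[1]))  # kit number plus name

print(len(by_name), "group when keyed by name")
print(len(by_kit), "groups when keyed by kit number and name")
```

Keyed by name, the two segments look like one person with two overlapping matches on chromosome 7. Keyed by kit number, they are two people with one match each.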

See also my recent post: FamilyTreeDNA’s Chromosome Match File for more problems with the file that FamilyTreeDNA needs to fix.

Help Needed for DMT - Thank You! - EAST Part 3 - Sat, 2 Jul 2016

I have the basics of my Double Match Triangulation program working, but before I can release it to the world (as freeware!), I must put it through its paces and test it with some real data and ensure that it will correctly analyze and display the data and relationships.

Since I’ve only DNA tested my 93-year-old uncle Harry, and since two people’s Chromosome match files are needed for the program to work on, I cannot do this by myself. So I contacted several of the people listed as matches on my uncle’s Family Finder page at FamilyTreeDNA to see if they would help out with my research.

I was overwhelmed by the enthusiastic response. Everyone, myself included, is looking to find some way to make some sense out of their autosomal matches, and then there’s the potential promise that true triangulation made easy by my DMT program could save loads of time and help us figure out how some of our matches are related.

My uncle’s match list (which is growing daily as FamilyTreeDNA finds new matches) is currently up to 7,865 matches and still only has one confirmed relative.

The one confirmed relative is Joel, who is my 3rd cousin, and my uncle’s 2nd cousin once removed on my uncle (father’s brother’s) mother’s father’s side. Joel is 3rd out of 7,865 on my uncle’s match list with 134.8 cM shared. Joel and I have been communicating for years working with several other cousins on that common side of our families. Joel sent me his chromosome match file.

Then I found Seth, whose ancestral surname was Braunstein (the same as my uncle), whose family originated in a town in Romania less than 100 km from where my uncle’s Braunstein ancestors came from. He didn’t show up prominently in my uncle’s matches until FamilyTreeDNA’s recent algorithm update. Seth moved up from a 5th to remote cousin sharing 127.1 cM to a 2nd to 3rd cousin sharing 130.7 cM. I’m very hopeful we’ll find the connection between Seth and my uncle because we know it will be on both our paternal lines. Seth sent me his chromosome match file.

Another person high up on my uncle’s match list was Erika, listed as a 2nd to 3rd cousin at 160.0 cM. She caught my attention when I was putting all my Pikholz connections together in preparation for my day at the Ontario Genealogical Society Conference where Israel Pickholtz was going to speak. About half the people in Israel’s book: Endogamy, One Family, One People were on my uncle’s match list. But Erika was the closest of anyone with a Pickholtz connection. I contacted George, Erika’s cousin who was administering her account and is himself listed as a 3rd to 5th cousin at 92.2 cM shared. George sent me both his and Erika’s chromosome match files.

Then there’s a FamilyTreeDNA project for an area of Ukraine that I joined on behalf of my uncle. Four people from there, Sandy, Barbara (2nd – 4th cousin, 102.5 cM), Bruce (2nd – 4th cousin, 97.0 cM) and Mark, who have connections to my uncle, sent me their chromosome match files. Barbara and Bruce also each sent me two more of the files they administer. Sandy sent me 23 files in total covering quite a few relatives in her family, many of whom are among my uncle’s matches. Sandy has considerable experience in triangulation and has given talks on her analysis using it. I look forward to working with Sandy to help figure out her/our families.

Last but not least is Arnold Chamove who has been a Behold user for almost a year. He and I have had many good talks since then about Behold and what it does and should do. So it was a bit surprising when I found 6 people whose DNA Arnold administers listed in my uncle’s match list, the closest of whom is his cousin Roger (2nd – 3rd cousin, 144.2 cM). Arnold has given me access to 23 of the chromosome match files that he administers. It will be fun helping Arnold put his families together and finding out what our connection is.

It is very interesting that I can’t offhand connect yet to any of these 2nd to 4th cousins except for Joel. Most Ashkenazi lines only go back about 5 generations, and due to endogamy, 2nd to 4th cousins can mean 3rd to 6th cousins, even though FamilyTreeDNA says they try to correct for this.

I’ll take these 58 Chromosome Match files and use them for testing and to determine how best to analyze, interpret and present the triangulation data.

Most of the chromosome match files are from full Ashkenazi heritage with all its endogamy. These files range from 8 MB to 14 MB in size and the largest have more than 200,000 chromosome segment matches to 8,000 people. Non-Jewish chromosome files I’ve been sent seem to be about one tenth that size.

And the DMT program does Double Match Triangulation, meaning it needs two match files for a comparison. I will do every pair of comparisons. That will be 58 times 57 or 3,306 comparisons both ways. The program takes about a second per comparison (comparing two files of 200,000 lines each), so once I get the automated selector working, I’ll let it run for about an hour to do them all.
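The count is the number of ordered pairs of distinct files, which can be checked with a short Python sketch. The file names are placeholders, and the one second per comparison is the rough figure quoted above rather than a measurement.

```python
from itertools import permutations

# Placeholder names standing in for the 58 chromosome match files.
files = [f"match_file_{i:02d}.csv" for i in range(1, 59)]

# Ordered pairs: (a, b) and (b, a) are separate comparisons ("both ways").
pairs = list(permutations(files, 2))

seconds_per_comparison = 1.0  # rough estimate from the text
total_minutes = len(pairs) * seconds_per_comparison / 60

print(len(pairs))            # 3306
print(round(total_minutes))  # 55
```

At roughly 55 minutes, "about an hour" is a fair estimate for the full run.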

There was one person I asked who would not give me his chromosome match file. It wasn’t that he wanted to keep his information private. Au contraire, Meir is a world expert at Y-DNA research, specializing in the Levite line, and he receives hundreds of DNA files from people willing to help. I know he’d be more than willing to help me.

But Meir’s reason was very interesting. He said to me:

The “autosomal soup” is not science. far from it.
It is pseudo science on verge of charlatanism.
Leave me out of this fiasco.

I pressed him further on this, and he told me he’d make an exception for me if I could meet a challenge. Putting aside the known relations, if I could show him how a mere 7 unknowns out of the 7,600 are related to Harry, he’d be willing to participate. So sort of like Sodom and Gomorrah which needed 10, I’ve got to find 7 good people who I can match to.

I told Meir this is a fair offer. I said I don’t know if I will succeed in identifying 7 relationship paths just using the triangulation information, but I shall try. The rewards of succeeding are just too great to ignore.

It’s going to be fun!

Obviously, 3,306 pairs of test files is enough for me for now. But if you check your FamilyTreeDNA matches and notice that Harry Braunstein is listed as one of your matches, contact me, and I’ll try to include you in my tests. 

Extreme Autosomal Segment Triangulation (EAST) - Part 1
EAST Part 2 - Double Match Triangulation