
Louis Kessler's Behold Blog

Sex in GEDCOM - Thu, 14 Jan 2016

I have come across a need to check out the SEX tag in GEDCOM. Some of the new DNA features I’m finishing up for the next version of Behold make important use of the sex of the individual. Determining autosomal, X, Y and mitochondrial DNA shares between two individuals is much less accurate when the sex of anyone in the relationship line is not known.
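To illustrate why an unknown sex anywhere on the line matters, here is a simplified Python sketch of the two uniparental cases. It is not Behold's code. It relies only on the standard inheritance rules: Y-DNA passes from father to son, and mtDNA passes from a mother to all of her children. The function names and the list-of-sexes representation (oldest ancestor first, `None` for unknown) are hypothetical. X and autosomal DNA are not modeled.

```python
from typing import Optional, Sequence

# A line of descent, oldest ancestor first. Each entry is "M", "F", or None (unknown).

def y_dna_passes(line: Sequence[Optional[str]]) -> Optional[bool]:
    """Y-DNA passes only from father to son, so everyone on the line must be male.
    Returns True/False when determinable, None when an unknown sex blocks the answer."""
    if any(s == "F" for s in line):
        return False          # a known female breaks the Y line
    if any(s is None for s in line):
        return None           # can't tell: someone's sex is unknown
    return True

def mt_dna_passes(line: Sequence[Optional[str]]) -> Optional[bool]:
    """mtDNA passes from a mother to all her children, so everyone except the
    youngest must be female. The youngest may be either sex."""
    transmitters = line[:-1]
    if any(s == "M" for s in transmitters):
        return False          # a known male cannot pass on mtDNA
    if any(s is None for s in transmitters):
        return None
    return True
```

For example, a grandfather, unknown-sex parent and grandson gives `y_dna_passes(["M", None, "M"])`, which returns `None`. The answer hinges entirely on the middle person's sex.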

GEDCOM includes sex quite succinctly as a level 1 tag of an individual defined like this:

+1 SEX <SEX_VALUE>   {0:1}


SEX_VALUE :=    { Size=1:7 }
A code that indicates the sex of the individual:
      M = Male
      F = Female
      U = Undetermined from available records and quite sure that it can’t be

A few oddities already. It appears that only “M”, “F” and “U” are allowed for the SEX_VALUE, and I’ve never noticed a program that doesn’t adhere to this. But if you read carefully, the definition does not require that the value be restricted to these three. It leaves the door open to other possibilities (what they might be, I can’t guess). I find it very strange to see Size=1:7 if only one-character codes are allowed. Why not Size=1:1?

Also, it is possible for the SEX tag to be missing entirely, since {0:1} means the tag may occur zero times or once.

My interest from the DNA perspective is in determining, if possible, whether the individual is male or female.

So let’s use rule number 1:

1. If the SEX_VALUE is “M”, the individual is assumed to be male.
    If the SEX_VALUE is “F”, the individual is assumed to be female.
    If the SEX_VALUE is anything else, or missing, then the sex is unknown.
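Rule 1 is simple enough to state directly as code. This is a minimal Python sketch. The function name is hypothetical, and returning `None` for “unknown” is just a convenient choice. Surrounding whitespace is stripped, but the comparison stays case-sensitive to match the upper-case codes in the definition.

```python
from typing import Optional

def sex_from_tag(sex_value: Optional[str]) -> Optional[str]:
    """Rule 1: "M" -> male, "F" -> female.
    Anything else ("U", an unexpected code) or a missing SEX tag (None) -> unknown (None)."""
    if sex_value is None:
        return None
    value = sex_value.strip()
    return value if value in ("M", "F") else None
```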

If that was all of it, we’d be done. But there’s more.

Children have parents. Genetically, they always have a father and a mother, although that isn’t necessarily so for adoptive parents, foster parents, etc. Again, I’m going to restrict myself to DNA interest and assume that there is one male and one female parent, whether those parents are known or unknown.

In a GEDCOM file, each individual points with a FAMC tag to the FAM record that contains the person’s parents. An individual could have more than one FAMC tag and point to multiple FAM records. Only one of those FAM records can contain the birth parents. Each of the other FAM records must contain at least one non-birth parent.

If a person has multiple sets of parents, then it is important to know which parents are the birth parents. GEDCOM does not give any specific rules for ordering FAMC tags. It does give a rule for ordering CHIL (child tags) and states: “The preferred order of the CHILdren pointers within a FAMily structure is chronological by birth”. You would think then, that a logical extension would be that FAMC tags should also be ordered chronologically, with the birth parents always listed first. Behold already checks the “MARR” date and reorders the FAMCs when the dates are out of order. I don’t believe very many programs enforce FAMC order for their GEDCOM output as I’ve seen incorrectly ordered FAMCs in a good number of the test files I use.
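The post doesn't describe how Behold's reordering works internally, but the idea can be sketched in Python. This sketch assumes the MARR dates have already been parsed into `datetime.date` values, since GEDCOM date parsing is a separate problem. It also places families with no known marriage date after the dated ones, which is an assumption of this sketch and not a GEDCOM rule. Both function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional

def order_famcs(famc_ids: List[str], marr_dates: Dict[str, Optional[date]]) -> List[str]:
    """Return FAMC family IDs sorted chronologically by MARR date.
    Assumption: families with no known date go last, in their original order."""
    dated = [f for f in famc_ids if marr_dates.get(f) is not None]
    undated = [f for f in famc_ids if marr_dates.get(f) is None]
    # sorted() is stable, so families with equal dates keep their file order.
    return sorted(dated, key=lambda f: marr_dates[f]) + undated

def famcs_out_of_order(famc_ids: List[str], marr_dates: Dict[str, Optional[date]]) -> bool:
    """True if reordering would change the FAMC order found in the file."""
    return order_famcs(famc_ids, marr_dates) != famc_ids
```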

The FAMC tag could have a level 2 PEDI tag under it which contains a PEDIGREE_LINKAGE_TYPE value, which is one of: “adopted”, “birth”, “foster” or “sealing”. If this tag is listed and “birth” is specified, then that FAMC tag should be listed first. Now we have more complications. We have to ensure that at most one FAMC tag for an individual has a PEDI tag with a “birth” value. In practice, I have not seen the PEDI tag used very often.

GEDCOM also allows (just to make a genealogy programmer’s job more difficult) a FAMC tag to be subordinate to an individual’s BIRT (birth), CHR (christening), or ADOP (adoption) tag. Here, if a FAMC tag is subordinate to a BIRT tag, then that family should be the first FAMC. I have seen this used occasionally.
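Putting the PEDI and BIRT clues together, here is one way to pick the birth family in Python. The precedence used here is an assumption of this sketch and is not mandated by GEDCOM. A FAMC with PEDI “birth” wins first, then the FAMC subordinate to BIRT, then the first level-1 FAMC not explicitly marked adopted, foster or sealing. The `FamcLink` class and `birth_family` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FamcLink:
    family: str                 # the FAM xref, e.g. "@F1@"
    pedi: Optional[str] = None  # PEDIGREE_LINKAGE_TYPE: adopted, birth, foster, sealing

NON_BIRTH = {"adopted", "foster", "sealing"}

def birth_family(famcs: List[FamcLink], birt_famc: Optional[str] = None) -> Optional[str]:
    """Return the FAM xref most likely to hold the birth parents, or None.
    birt_famc is the family referenced by a FAMC subordinate to the BIRT tag, if any."""
    births = [f for f in famcs if f.pedi is not None and f.pedi.lower() == "birth"]
    if len(births) > 1:
        # At most one FAMC may carry PEDI birth.
        raise ValueError("more than one FAMC has PEDI birth")
    if births:
        return births[0].family
    if birt_famc is not None:
        return birt_famc
    # Fall back to file order, skipping links explicitly marked as non-birth.
    for f in famcs:
        if f.pedi is None or f.pedi.lower() not in NON_BIRTH:
            return f.family
    return None
```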

Okay. Now we’ve established, to the best of our ability, the FAM record of the birth parents. Next we have to determine who the parents are.

I was going to describe the FAM record and HUSB and WIFE tags in much more detail, but I don’t have to because I’ll just point you to an excellent article that Tamura Jones just happened to publish earlier today: Marriage in GEDCOM

Tamura correctly states that the FAM record need not contain the HUSB or the WIFE tags. If not, well, then we just don’t know who that parent is.

My interest for my DNA purpose, however, is to determine each parent’s sex. The HUSB and WIFE tag will point to the parent’s INDI record, and the INDI record may have a SEX tag and we can use rule 1 (above).

But what if rule 1 results in “unknown”? Should we then be able to infer the parent’s sex from whether the parent is referenced by the HUSB tag or by the WIFE tag? I’m not 100% sure yet. I believe so, but I don’t know how many programs enforce this association when exporting to GEDCOM. My next step will be to add a check into Behold that will see if the HUSB tag is pointing to a female individual, or if the WIFE tag is pointing to a male individual.
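A check like that could look roughly like the following Python sketch. It is not Behold's implementation. Families are modeled as hypothetical dicts holding HUSB/WIFE xrefs, and sexes are the results of rule 1, with `None` for unknown. An unknown sex is not reported as a conflict.

```python
from typing import Dict, List, Optional, Tuple

def role_sex_conflicts(
    families: Dict[str, Dict[str, Optional[str]]],
    sexes: Dict[str, Optional[str]],
) -> List[Tuple[str, str, str]]:
    """Return (family_id, role, individual_id) for every HUSB pointing to a
    female and every WIFE pointing to a male."""
    conflicts = []
    for fam_id, fam in families.items():
        husb, wife = fam.get("HUSB"), fam.get("WIFE")
        if husb is not None and sexes.get(husb) == "F":
            conflicts.append((fam_id, "HUSB", husb))
        if wife is not None and sexes.get(wife) == "M":
            conflicts.append((fam_id, "WIFE", wife))
    return conflicts
```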

I would think the SEX tag of the individual (rule 1) normally should overrule the HUSB/WIFE tag pointing to the individual. So I would add rule 2:

2. If the sex is unknown from rule 1, then
    if only HUSB pointers point to this individual, assume he is male.
    if only WIFE pointers point to this individual, assume she is female.
    if both HUSB and WIFE pointers point to this individual, issue an error.
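Rules 1 and 2 together can be sketched in Python as follows. The signature is hypothetical. `families` stands for the FAM records the individual belongs to as a spouse, modeled as dicts of HUSB/WIFE xrefs. The “issue an error” case is shown as raising `ValueError`, which is just one way to surface it.

```python
from typing import Dict, List, Optional

def infer_sex(indi: str, sex_tag: Optional[str],
              families: List[Dict[str, Optional[str]]]) -> Optional[str]:
    """Rule 1: a SEX value of "M" or "F" decides.
    Rule 2: otherwise fall back to whether the individual appears as HUSB or WIFE."""
    if sex_tag in ("M", "F"):
        return sex_tag
    as_husb = any(f.get("HUSB") == indi for f in families)
    as_wife = any(f.get("WIFE") == indi for f in families)
    if as_husb and as_wife:
        raise ValueError(f"{indi} is referenced by both HUSB and WIFE")
    if as_husb:
        return "M"
    if as_wife:
        return "F"
    return None  # still unknown
```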

But if these are the birth parents, they cannot be the same sex. If these two rules result in both birth parents being assigned the same sex, then Behold will provide a message pointing out the conflict and indicate that for this case, it will assume the HUSB/WIFE tags to be correct.
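For the birth-family case, the fallback just described might look like this Python sketch. The function name and the message wording are hypothetical. The inputs are the HUSB and WIFE sexes after rules 1 and 2. When both come out the same, the sketch reports the clash and trusts the HUSB/WIFE roles instead.

```python
from typing import List, Optional, Tuple

def resolve_birth_parents(
    husb_sex: Optional[str], wife_sex: Optional[str]
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (husb_sex, wife_sex, messages) for a birth family.
    Birth parents cannot be the same sex, so a clash falls back to HUSB = M, WIFE = F."""
    messages: List[str] = []
    if husb_sex is not None and husb_sex == wife_sex:
        messages.append(
            f"Birth parents are both {husb_sex}; assuming HUSB is male and WIFE is female"
        )
        return "M", "F", messages
    return husb_sex, wife_sex, messages
```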

If the individual’s SEX is specified, then rule 2 is not needed and the HUSB and WIFE pointers do not have to be looked at. But what if the HUSB or WIFE tag conflicts with the SEX tag? This is possible in the case of same-sex marriages, and assigning both individuals the same sex likely is a reasonable way of adding same-sex marriages to a GEDCOM standard that many have said does not allow it. Two individuals can both be males or both be females. But two HUSB tags or two WIFE tags are not allowed. Therefore for a same-sex marriage the HUSB tag would point to one individual, and the WIFE tag would point to the other.

GEDCOM states:  “The family record structure assumes that the HUSB/father is male and WIFE/mother is female.” Note that it says “assumes”, and does not state “requires”.

So in GEDCOM, a same-sex couple could be represented as:

The family record is no different than normal:

0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@

The INDI records for two males:

0 @I1@ INDI
1 SEX M
1 FAMS @F1@

0 @I2@ INDI
1 SEX M
1 FAMS @F1@

or for two females:

0 @I1@ INDI
1 SEX F
1 FAMS @F1@

0 @I2@ INDI
1 SEX F
1 FAMS @F1@
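Here is a small Python sketch of reading a couple like this from GEDCOM text and applying rule 1. The SEX tag decides, and the HUSB/WIFE role does not override it. The line parser is deliberately simplified (level, optional xref, tag, optional value) and ignores CONT/CONC and other details of the real grammar. The sample assumes each INDI record carries a SEX tag, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Simplified GEDCOM line: level, optional @XREF@, tag, optional value.
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")

Record = List[Tuple[str, Optional[str]]]

def parse_records(text: str) -> Dict[str, Record]:
    """Collect level-1 (tag, value) pairs under each level-0 record that has an xref.
    The record type (INDI, FAM, ...) is stored first under the pseudo-tag "TYPE"."""
    records: Dict[str, Record] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue  # blank or unparseable line
        level, xref, tag, value = int(m.group(1)), m.group(2), m.group(3), m.group(4)
        if level == 0:
            current = xref
            if xref is not None:
                records[xref] = [("TYPE", tag)]
        elif level == 1 and current is not None:
            records[current].append((tag, value))
    return records

def first_value(record: Record, tag: str) -> Optional[str]:
    return next((v for t, v in record if t == tag), None)

def couple_sexes(text: str, fam_id: str) -> Dict[str, Optional[str]]:
    """Apply rule 1 to the HUSB and WIFE of one family (None means unknown)."""
    records = parse_records(text)
    family = records[fam_id]
    result: Dict[str, Optional[str]] = {}
    for role in ("HUSB", "WIFE"):
        indi = first_value(family, role)
        sex = first_value(records[indi], "SEX") if indi in records else None
        result[role] = sex if sex in ("M", "F") else None
    return result

SAMPLE = """\
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 @I1@ INDI
1 SEX M
1 FAMS @F1@
0 @I2@ INDI
1 SEX M
1 FAMS @F1@
"""

print(couple_sexes(SAMPLE, "@F1@"))  # {'HUSB': 'M', 'WIFE': 'M'}
```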

For more information about same-sex couples in GEDCOM, read Tamura Jones’ article: Same-Sex Marriage in GEDCOM

With regards to GEDCOM, I daresay that SEX is neither clean nor easy.

2016 and Looking Forward - Sat, 2 Jan 2016

It’s a new year. I was very happy to complete the tabulation of the GenSoftReviews Users Choice Awards for 2015 and send out the Press Release about it, which so far only James Tanner has posted about (also here). It usually takes me a full day to compile the results, make up the new badges, manually rebuild the awards page and send out congratulatory emails to each of the winners. And yesterday, January 1, was no different. My family was home, but they know my annual ritual, and I was upstairs in my office working on this, coming down only at mealtimes.

Now I can get back to what’s necessary. At the moment I am finishing off the 8 talks that I’ll be presenting on the upcoming 10th Unlock the Past Genealogy Cruise, which my wife and I are really looking forward to. This will be different from my first cruise with them a few years ago. There are fewer attendees and more days at sea with more talks. It will be more intimate, and we will likely all become close friends by the end. It will be tough to leave once it’s over. The cruise is from February 14 to March 3. I will be attempting to tweet and even blog about some of the interesting speakers and talks on the cruise while I’m away, but that may be spotty depending on the Internet on the ship and how much I can post from my phone.

While down under, I’ll also be filling in for Judy Russell as she takes a day off in Melbourne, Australia on Friday February 26. I’ll be giving my talk on How to improve research with source-based genealogy. If you can make it to Melbourne that day and would like to join me there for lunch, I’d love to have you come. If you want to hear the talks, you can book a spot for yourself or pay at the door.

We are leaving Winnipeg on February 10, so that precludes me from going to RootsTech this year.  I had gone 2 years ago and 4 years ago so I was starting a pattern, but I’ll have to break it this year and try again next year.

Along with finishing the presentations, I am also attempting to get one more version of Behold out prior to the trip. There are two relatively simple but pretty exciting enhancements that I’m working on. Each is worthy of its own blog post, so let’s wait for the release, which I’ll likely call version 1.3, and I’ll describe them then.

Tamura Jones today asked me if interest in FHISO is gone. FHISO is the standards organization that was formed to produce a standard to replace the GEDCOM genealogical data communications standard for transferring genealogy data between programs. Tamura wrote in his Genealogy 2015 article:

"There was practically no movement on genealogy standards. Throughout most the year, the Family History Information Standards Organisation (FHISO) seemed asleep again, and when they finally published some board meeting notes in November, the focus of these notes wasn’t on new technical standards, but overwhelmingly on modifying the organisation; possible moving to another state, creating a new vision for FHISO, changing the organisation structure and amending the by-laws."

I responded to Tamura and said that I think interest is still there in what FHISO wants to do, i.e. update/create a new standard. But I think people have given up hope that they’ll ever get it done.

And here it is 2016, and as it turns out, people are thinking the same about Behold: that I’ll never get it done. In fact, Behold fell out of the GenSoftReviews Users Choice group last year because of 4 comments that had no particular gripes with Behold, except that they expected it would never get done.

Well, I will be continuing to work hard at it. I don’t make it common knowledge, but I do have a full time day job, and a wife and two girls at home, and family that we spend time with and an elderly uncle I help with and a regional heritage centre that I’m a past president of and still do a lot of volunteer work for, and a city hockey team that I and the rest of my city get excited about, and I take out the garbage, do the dishes, shop for groceries and partake with the family in city events and go to funerals. The 20 or so hours I spend in front of my computer each week, mostly late evenings and weekends, are not all Behold development. There’s everything else that needs to be done on the computer ranging from website maintenance and writing blogs to doing my finances and researching plane tickets and hotels. So 10 hours a week on Behold is about all I average (not counting the hours in the shower, on the bus, and just sitting around thinking about it).

I’m excited about where I’m going with Behold. I’m trying to keep up with technology and incorporate what’s new. There’s Windows 10, the cloud, phones, programming language advancements, database advancements, online genealogy databases to connect to, the DNA phenomenon, new genealogy services, old genealogy services shutting down, and constant changes in the genealogy software scene which I keep up with due to my monitoring of all the software out there for GenSoftReviews.

So please be patient. I will get there. I am getting there. Behold and using Behold to record my genealogy and allowing others to do the same is my lifetime goal.

Here’s a little secret, just for you: On November 24 of this year, I will be turning 60. Also, on that day I will be retiring from my day job. I do enjoy my job, but I want to spend more time working on Behold and my genealogy while the time is ripe.

2016 will be an especially exciting year for me. I hope the same will be true for all of you. All the best in 2016!

Everything In, Nothing Out - Not Helpful for DNA. - Sat, 26 Dec 2015

Do you amass everything you can that might be related to your genealogy research? You have books in your library. File cabinets full of articles and clippings. Closets with boxes of unsorted picture albums and scrapbooks with letters and ephemera. Videos still on VHS and super 8.

You’ll get to it someday. I know you will. I know I will.

But when you sort through it all, you’ll be looking for just the materials that are helpful to you. Hopefully, as you do so, you’ll be digitizing everything and only keeping originals of documents and anything sentimental and throwing everything else out.

Again, the stuff you save and digitize and organize (by source) to complement your genealogy data, will be just the material that you will use or potentially use to help you document your family and lead to new discoveries.

Okay. So now you’ve got into DNA. You did a test, or a few tests with one or more companies. You expect your genealogy software to load in this data and process it for you in some useful way. The software should use it to help you document your family and lead to new discoveries, right? … (sound of silence)

I was wondering what features developers were adding into their genealogy programs. I asked a question two days ago on Genealogy and Family History StackExchange Q&A site regarding the DNA Features of Genealogy Software. What I found out is that there are at least 5 programs now that have at least some hooks into DNA and allow you to record your sample results. Check out the interesting answers to my question to learn what programs they are and what they do, and if you know something more, be my guest to add an answer.

What became apparent to me is that the programs currently are simply recording your data. They do not yet seem to be doing anything with it that can really help you.

I learned my lesson maybe 20 years ago. I had a house, a job, a young family and finances were a concern. So every receipt, bill, statement and financial transaction went into a program you all must have heard of:  Quicken.

The box of an early version of Quicken

I spent a few hours every week recording my money transactions into this program. Each year I would pay $60 or so to upgrade it. Then I would add more transactions. In total, several hundreds of hours a year were spent doing this.

And what did I get out of it? A few reports run a couple of times a year that would tell me I was spending as much money as I was making. Hmmm. I knew that already by my bank account balance. So I would do a run that would say I’m spending 15% on food, 10% on clothing, 15% on housing … Hmmm. Didn’t really change every year. Once you know this, you know this.

So why was I spending hundreds of hours a year to enter this data? I gave myself a few slaps across my head and after 8 years of doing this, just stopped cold turkey. I immediately gained several hundred hours a year.

File folders - 12 months

What did I replace it with? For the 99% of receipts that are meaningless after a year, I use a simple file container with 12 file folders, one marked for each month. I’d put in my receipts by month, and I’d have them if I needed them. At the start of each new month, e.g. January, I’d throw out the old January receipts and start again. Did I need to total up these receipts or categorize them? No. And maybe a couple of times a year, we’d have to go back to an old receipt, and it wouldn’t take more than a few minutes to find it. Total time: 30 minutes a year.

Now let’s talk about DNA again. There is A LOT of data in your DNA results. There are 100 different summaries you can do of it. You can spend all your time entering it, or spend your time looking for programs that will do that for you, and enter your DNA results into your genealogy software. Why?

Unless you will get something useful out of this, you’re wasting your time.

It appears (unless I am wrong and new answers come up to my StackExchange question) that genealogy software today will not do anything useful with your DNA data. At the moment, it appears, you will not be doing yourself any favours by spending your time loading your DNA data into your program.

There are already a good number of utility programs and sites online for analyzing your DNA. By all means use those, and input your DNA data into them in the format they require. Just make sure they could provide something useful for you in return.

But as far as your own genealogy desktop program goes, wait. Wait until the genealogy vendors offer some superb functionality that you absolutely cannot do without and that needs your DNA data to be entered. Then your time entering your data will be well spent. Until then, don’t waste your time.