The Behold User Forum

Source Driven Genealogy and Data - Categorized in: Feature Suggestions

3 posts. Started 9 Feb 2013 by serenity2006. Latest reply 22 Mar 2014 by klemens.
1. serenity2006 (serenity2006)
United States flag
Joined: Sat, 9 Feb 2013
0 blog comments, 2 forum posts
Posted: Sat, 9 Feb 2013

I looked through the FUTURE plans list of ideas for BEHOLD software and the description of your approach. I like the Everything Report. You struck a chord with me on the issue of multiple data entry screens in software applications and the need to click through about 5 screens to do any task. I saw that SOURCES was on your "sometime" list and it hit on one of my biggest rants with genealogy software. Here is my take on this feature.

Frankly, while the genealogy world espouses the importance of sources, the genealogy software industry does a HORRIBLE job of handling sources and the associated media.

Can you imagine what it would be like if genealogy software treated the PEOPLE facts the same way it treats SOURCES? It would look something like this:
- All information on a person is limited to 3 data-entry fields: Person, Person Detail, Person Note.
- While the program MAY allow you to choose templates for the different types of people, or free-form type the information, the report output and search capabilities of the software still limit you to three fields.
- A new person entry is required for each location the person lived during their lifetime
- A new person entry is required for each name or name variation a person has over their lifetime
- The software has no capability to search, group, and cross-reference those separate person entries

This is how sources, repositories, and media are treated by genealogy software and it is apparent that little time or thought has gone into creating a truly dynamic and user friendly approach to these areas. Again, this is one of my biggest rants with the genealogy software industry. Below is a description of how I have approached genealogy research and it may give you a better idea of the workflow and the difficulties encountered.

My Source-Driven Workflow:

I am doing a one-place study for a town's 150-year celebration in 2018. This is a completely different approach to genealogy research than the traditional one-name or one-family research approach, which seems to be the way most genealogy programs are designed. With a one-place study, you start from the source documents and build the list of individuals based on known source documents (census records, county birth, marriage, and death records, local newspaper obituaries, local cemetery records, etc.). In my current project, I have 5 repositories, about 500 source documents, about 5000 media files, and about 168,0000 citations, with each citation linked to numerous fact tags. Every single fact has a source (including a person's gender) because it all starts with the source document. Each source document has an image file.

My approach to this project was to start with the federal census records for the town. For my current project, the 1870 census for the town has 8 pages with 277 individuals grouped into 60 households (one person appears twice in this census record so the census shows a total of 278 individuals). The census for 1870 has a minimum of 5 data items in the title of the page and then another 25 data items in the census page for each individual. Organized into a spreadsheet it works out to about 30 columns with 50 rows of data per census page (or about 1500 pieces of data). Each piece of data is a potential clue to another source document, so the column with the check box for "attended school within the census year" could lead to a school yearbook, class photo, report card, or even the school registration records with the parents' names. The "rent or own" column could lead to land records or farm schedule records. Census records are a fantastic starting point when doing a one-place study and provide all sorts of clues into the community and potential other source records. I have not found a genealogy program that will easily capture every field from a census record; most only capture the title, location, and date and then leave it to the user to enter all of the data into a notes field. How in the heck can you sort, filter or search the notes fields from all your sources?

I am still using Excel spreadsheets for my one-place study. The spreadsheet allows me to easily sort and filter data in a variety of ways as I began examining each individual person and researching beyond the town records to fill in the life events. For the one-place study, I created ONE Excel workbook with multiple tabs.
- Master person index - given name, surname, gender, birth year, death year, and a column for each source document - this allows me to quickly spot where I have holes in my research
- Master sources index - source name, year, publisher, etc, and then a column for each repository where I found that source document with the call-number entered into the field
- Master repository index - repository name and address columns, and columns to capture hours, cost of making copies, whether or not they allow computers or portable scanning of records, and a notes field for misc notes
- Tab for Each SOURCE - (i.e. 1870 US census, 1880 census, etc). For example, on the 1870 US census tab, there are columns for each question on the 1870 US census, with a few additional columns to indicate the state, county, enumeration district, and page. It took me ONE weekend to transcribe all the census records (7 federal censuses and 6 state censuses) for the town into Excel spreadsheets and capture EVERY piece of data. I also have tabs for birth records, marriage records, death records, and cemetery records.
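
As a rough illustration of the master person index described above, here is a minimal Python sketch of how "one column per source document" makes research holes easy to spot. The people, column names, and page references are made up for illustration and are not data from the actual project.

```python
# Hypothetical rows from a master person index: one column per source document.
# An empty string means the person has not been found in that source yet.
SOURCES = ["1870 US census", "1880 US census", "Cemetery records"]

people = [
    {"given": "Anna", "surname": "Example", "1870 US census": "p. 3",
     "1880 US census": "", "Cemetery records": "Lot 12"},
    {"given": "John", "surname": "Sample", "1870 US census": "p. 8",
     "1880 US census": "p. 2", "Cemetery records": ""},
]

def research_holes(rows, sources):
    """Return {person name: [sources with no entry]} for people with gaps."""
    holes = {}
    for row in rows:
        missing = [s for s in sources if not row.get(s, "").strip()]
        if missing:
            holes[f"{row['given']} {row['surname']}"] = missing
    return holes

for name, missing in research_holes(people, SOURCES).items():
    print(name, "->", ", ".join(missing))
```

The same check is what a spreadsheet filter on blank cells does; the point is only that a flat table with a column per source keeps this kind of question trivial.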

The Search for Genealogy Software:

After over 35 years of working on research, it is time to start putting it all together into a report format that can be published in a 150 year book format, create group records to link families (currently in family group sheets in multiple binders) and link the media files (in a portable drive with 1000s of document images, photos, maps, stories, etc.). So my search for genealogy software began.

I have tried about 20 different genealogy software products so far and am always looking at new ones, but still can't find one that is friendly to the source-driven or one-place study approach. It took over 1000 hours of data entry time to input the data from ONE census record into the genealogy software program, and then when I exported to GEDCOM I realized that there was no way to view or sort the data in a spreadsheet view. This made it very difficult to begin adding the next census record into the tree, and the same again with each additional source document. What was originally a 2 MB Excel spreadsheet file became a 68 MB file in a 250 MB software application BEFORE I started adding all the media links. Then the software began bogging down, locking up my computer and corrupting my data.

On to another software product and I got to start the entire data entry exercise again. Another 1000 hours of data entry time because the software requires multiple screens to input data, multiple source entries for each repository, and multiple image links. I am surprised that, since genealogy software is basically a database comprised of tables, only a few genealogy programs offer a spreadsheet view of data, and none that I have found give you the option to do data entry or editing from a spreadsheet or table view. Only ONE allowed me the option of importing a csv file, but only for people fields and not for sources.

Sources, Repositories, and Media:

Page 8 from 1870 US Federal Census Population Schedule for Willow Township, Cherokee County, Iowa is the same darn source record whether you find it on Ancestry.com, NARA, Family Search, the state archives, or your own personal filing system. Only ONE original document exists and I shouldn't need to create a separate source citation in my genealogy software program for every repository that has a copy of the document. The REPOSITORY is what varies, each with its own call number. I should be able to create a source document that is based on the DOCUMENT and enter ALL of the data from that document into a single screen. Then associate that DOCUMENT with the REPOSITORY and the call number and date accessed for that repository. Perhaps add a note field to that repository so that you can indicate if the copy they have has been altered or is just a really poor quality. Thus the citation for the document would be the same across all repositories. Can you imagine if you had to know the exact URL for a website page before you could search for a record online? Imagine if Family Search required you to know the Ancestral File number for the person you were researching and wouldn't let you search for a person by name. Then imagine going back to that record a few years later only to discover that they have changed the way they index the file numbers. No wonder people have such a hard time documenting sources in genealogy software or end up with a separate copy of the same source image for each citation because the online database has created a separate url for each person who appears on the page of the census.

Here is what I would like to see in genealogy software:

1. Organize Sources using Keywords/Groups - (i.e. Source Year, Source Title, Source Place, Source Type) - would require allowing user to create keyword list or flags for each source. Result - user could sort, filter, or search sources in a number of ways or select or exclude sources from GEDCOMS or Reports

2. DATA ENTRY TEMPLATES for sources.

Third-party, user, or pre-loaded templates for major sources such as census records would be a HUGE plus. This would allow users to come up with templates for specific source documents encountered, making the software more dynamic without being dependent on upgrades. It would require creating the ability to add data-entry templates as an add-in feature. You could establish the format required for user contributed templates so that they are always compatible with the software and even set up an "Approved Template" system where users can submit templates through the forum, get feedback, make changes, and receive your final approval before being added to the library of templates. Being able to upload a csv file into a template would be another HUGE timesaver so that even when the software isn't able to handle the spreadsheet view it can at least import the data from one.

An example of a template for a US Federal Census record would be:
SOURCE - would contain the Source level fields (Source Title, link to media items, link to repositories, link to Places)
Action to be done by template - Create Event for each individual
Columns would be created in template by selecting TAGS to match field where data is to be entered for each individual with option to add any conflicting data as an alternate fact.
Rows in template would be linked to an existing person in tree or added as new person, creating the separate citation for that source (ideally, for data entry speed, allow user to key in all the data first, then go back and link each row to an existing person in the tree or choose to have the system create as new person).
GO or similar button would tell the system that you have completed the data entry or edits and want it to execute the tasks.

Once you create a master template for, say, the 1870 US Federal Census, then it would be available to select from the library of templates, fill in the fields to specify the place and page number for the individual page of the census being transcribed, then open the spreadsheet and transcribe the census page, then use a lookup field column to either link to an existing person or add as new person. Once a page of the census has been completed, click the GO button and all the individuals now have an EVENT for the census, linked to the source, with an individual citation to that source. Any new people found in the census page have been added as new individuals with a source record. Then it is an easy task of going into the individual's record and creating the relationship links between family members.
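
To make the template idea a bit more concrete, here is a minimal Python sketch of what the GO step might do with one transcribed page. Everything in it is an assumption for illustration: the template layout, the `go` function, the "Link" column, the GEDCOM-style tags, and the simplistic new-person ID scheme are hypothetical and do not describe any existing program.

```python
import csv
import io

# Hypothetical template: maps spreadsheet column names to GEDCOM-style fact tags.
TEMPLATE_1870 = {
    "source_title": "1870 US Federal Census",
    "columns": {"Name": "NAME", "Age": "AGE", "Occupation": "OCCU"},
}

def go(template, page_info, csv_text, tree):
    """Apply one transcribed page: add a census event with its citation per row.

    `tree` maps a person ID to {"name": ..., "events": [...]}. A row whose
    "Link" column is blank is added as a new person (naive ID scheme).
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(rows, start=1):
        person_id = row["Link"].strip()
        if not person_id:
            person_id = f"I{len(tree) + 1}"
            tree[person_id] = {"name": row["Name"], "events": []}
        facts = {tag: row[col] for col, tag in template["columns"].items()}
        citation = {"source": template["source_title"], **page_info,
                    "line": line_number}
        tree[person_id]["events"].append(
            {"type": "CENS", "facts": facts, "citation": citation})
    return tree

# Made-up example data: one existing person, one new person.
tree = {"I1": {"name": "John Sample", "events": []}}
page_csv = ("Name,Age,Occupation,Link\n"
            "John Sample,34,Farmer,I1\n"
            "Mary Sample,30,Keeping house,\n")
go(TEMPLATE_1870,
   {"place": "Willow Township, Cherokee County, Iowa", "page": 8},
   page_csv, tree)
```

The source is stored once in the template, and each row only carries a short citation back to it, which is the main point of the proposal.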

3. Associate the SOURCE table with the PLACE table - same place table used for People/Events so that when pulling a PLACE report, the user would also see source documents associated with the place

4. Associate REPOSITORY to SOURCE (many-to-many relationship) - the junction table would have the call-number, date accessed, web/media link and could be entered from either the SOURCE or the REPOSITORY view. This would mean that when you find a new repository for a source document (i.e. a census record), it would only require the addition of ONE repository level citation associated with ONE source, not 50 separate citations for each person who appears on the census page and citation links to each fact on that census page.

5. Associate a REPOSITORY to a PLACE - so that a place report would also have a list of repositories for that PLACE
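
The table relationships in items 3 to 5 could be sketched roughly as follows, using Python's built-in sqlite3 module. The table and column names are assumptions chosen for illustration, the call numbers and dates are placeholders rather than real identifiers, and linking each repository to a single place is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE source (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                     place_id INTEGER REFERENCES place(id));
CREATE TABLE repository (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         place_id INTEGER REFERENCES place(id));
-- Junction table: one row per copy of a source held by a repository.
CREATE TABLE source_repository (
    source_id INTEGER NOT NULL REFERENCES source(id),
    repository_id INTEGER NOT NULL REFERENCES repository(id),
    call_number TEXT, date_accessed TEXT, link TEXT,
    PRIMARY KEY (source_id, repository_id)
);
""")

conn.execute("INSERT INTO place VALUES (1, 'Willow Township, Cherokee County, Iowa')")
conn.execute("INSERT INTO source VALUES "
             "(1, '1870 US Federal Census, Population Schedule, page 8', 1)")
conn.executemany("INSERT INTO repository VALUES (?, ?, ?)",
                 [(1, "NARA", 1), (2, "State archives", 1)])
# Placeholder call numbers and access dates, not real identifiers.
conn.executemany("INSERT INTO source_repository VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, "CALL-NO-A", "2013-01-15", None),
                  (1, 2, "CALL-NO-B", "2013-02-01", None)])

def copies_of(conn, source_id):
    """List (repository name, call number) for every copy of one source."""
    return conn.execute(
        "SELECT r.name, sr.call_number FROM source_repository sr "
        "JOIN repository r ON r.id = sr.repository_id "
        "WHERE sr.source_id = ? ORDER BY r.name", (source_id,)).fetchall()

def place_report(conn, place_id):
    """Return the source titles and repository names tied to one place."""
    sources = [t for (t,) in conn.execute(
        "SELECT title FROM source WHERE place_id = ? ORDER BY title", (place_id,))]
    repos = [n for (n,) in conn.execute(
        "SELECT name FROM repository WHERE place_id = ? ORDER BY name", (place_id,))]
    return {"sources": sources, "repositories": repos}

print(copies_of(conn, 1))
print(place_report(conn, 1))
```

Finding the same census page at a new repository then means adding one row to the junction table, and none of the per-person citations that point at the source need to change.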

Like I said in the beginning, I like the fact that you are moving away from the multiple screens to do a single task. As you reach the point of working on SOURCES, REPOSITORIES, and MEDIA, I hope that the long-winded rant above gives you some ideas.

2. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
237 blog comments, 226 forum posts
Posted: Sun, 24 Feb 2013


Thank you for the excellent ideas. Your points are worthy of more than an answer here at the forum, so I've made a blog post with my reaction to your one-place requirements.

In the meantime, I do know of one program that is used by one-name study people that may be something closer to what you need. It's called Custodian. Try it out and see if it's better.


3. klemens (klemens)
Germany flag
Joined: Thu, 5 Sep 2013
9 blog comments, 4 forum posts
Posted: Sat, 22 Mar 2014

Source-based data entry is definitely the only way to go!
Custodian looks very powerful, though it may be too much for a smaller family history research project. I'll install the demo.

For power users, the traditional view of a big screen with many fields (and templates!) might be the best.
But I've been thinking that a beginner-friendly way of entering data could be a kind of questionnaire. You get one question at a time and are guided to give all the information. There are multiple choice selections and text fields for the answers. If you don't have some info yet, that's OK, then choose the "no info yet" option. The questions that are asked partly depend on the answers that were given before. So only relevant questions are shown to the user.
Maybe some questions could be grouped, so it's less clicking. But don't overwhelm beginners with all of the options at once.

What would a Q&A session look like? Maybe like this:

First tell me about the source.
(If you worked with this source already earlier, then choose from this list...)
Have you got something physical in front of you?
Have you got a digital file? If yes, then let's link to it.
...Ah, it's a jpg. Do you want to transcribe the content of this file?
Is it a scan or a photo? (the questions would be different if you linked an audio file or a movie or a text document)
Where did you get the file?
Have you got information about the author? How much do you trust this source?

Now, what's the type of the source?
(again, depending on previous answers)
Is it a government document?
Is it a church document?
Is it a personal document?
Is it a family photograph?
Who is the author / who issued the document?
When was it produced?

Then, about the content:
What does the source describe?
Is there an event? Or more than one?
Is it just a statement about something?
(following questions change according to type of event)
What names are given? What are the roles of these people?
What places are mentioned?
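
A minimal Python sketch of how such a branching Q&A session might be driven is shown here. The question IDs, the `run_session` function, and the "*" default-branch convention are all hypothetical and only illustrate the "next question depends on the previous answer" idea.

```python
# Each question lists which question follows a given answer;
# "*" is the default branch, and None ends the session.
QUESTIONS = {
    "has_file": {"text": "Have you got a digital file?",
                 "next": {"yes": "file_kind", "*": "source_type"}},
    "file_kind": {"text": "Is it a scan or a photo?",
                  "next": {"*": "source_type"}},
    "source_type": {"text": "What's the type of the source?",
                    "next": {"*": None}},
}

NO_INFO = "no info yet"  # always an acceptable answer

def run_session(questions, start, answer_fn):
    """Ask questions one at a time, following branches.

    `answer_fn` takes the question text and returns the user's answer.
    Returns {question id: answer} for the questions actually asked.
    """
    answers = {}
    current = start
    while current is not None:
        question = questions[current]
        answer = answer_fn(question["text"])
        answers[current] = answer
        branches = question["next"]
        current = branches.get(answer, branches["*"])
    return answers

# Scripted answers stand in for a real user here.
scripted = {
    "Have you got a digital file?": "no",
    "What's the type of the source?": NO_INFO,
}
print(run_session(QUESTIONS, "has_file", scripted.get))
```

With the scripted answers above, the "scan or photo" question is never asked, because the user said there is no digital file.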

The user should be asked to give the bits of information that he believes are relevant, but he should be encouraged to add the rest of the information, too. After all, what looks like an irrelevant detail now may be of significance later.

After all the bits of info are entered, the next step should be to look up possible connections and show them to the user. At that time, the user may rate the information or possible links. The chain of reasoning needs to be recorded here.

But, Louis, please don't think too much about these things now. First things first, looking forward to life events ;)
