Louis Kessler’s Behold Blog

Standardizing Sources and Citation Templates - Wed, 27 Aug 2014

Note: This article has been submitted to FHISO’s Call For Papers as CFPS 114.

Thank you to the people who provided comments and feedback that helped me finalize this paper: Tamura Jones, Enno Borgsteede, Tony Proctor, Randy Seaver, Richard Smith, and Tom Wetmore.

 

Contents

Abstract

Introduction

Definition of Citation

1 GEDCOM’s Current Source Definition

2 The Underutilization of GEDCOM’s Sourcing

3 Zoteroing In On a Solution

4 Don’t Mix Up Data with Formatting

5 Suggesting a Solution

6 What about Repositories?

7 The Case for Citation Templates

8 Separating Sources from Conclusions

Summary and Recommendation

Works Cited

This paper was originally published August 27, 2014 on Louis Kessler’s Behold Blog.[1]

 

Abstract

A new standard is being explored to replace GEDCOM and provide an improved means of data communication between genealogy software, online family trees, and repositories of research information. One of the main concerns about the current GEDCOM standard is that source documentation and citations do not transfer well between systems. Addressing this issue is a requirement for the new standard. This paper will describe a simple method to define sources for the new standard and explain why citation templates should not be part of the standard but should be defined separately.

 

Introduction

One of the most talked-about issues regarding the GEDCOM transfer of data between genealogy software is how poorly source information moves between programs. (Thorud, Composer, & Hatchett, 2012)

During the past 15 years of developing my genealogy software Behold,[2] I have had to travel deep into the guts of the GEDCOM standard to discover and interpret how it works.

Part of this task was to present all the source information contained in GEDCOM files to the user in a logical way. I determined the logical structure of the source data being used, and I found both the capabilities in the GEDCOM definition that would enable complete transfer of this data between programs and the limitations that restrict it.

I discovered the ways some genealogy programs exported their sources incorrectly to GEDCOM and how other programs attempted to export their formatting templates along with their source data. I read many articles about how few programs could properly read the sources that another program had exported.[3]

I have also thought about the ways genealogists want genealogy software to improve. Most genealogy programs available today are conclusion-based: you enter your facts and attach the sources to them. This is inefficient and de-emphasizes the importance of documenting your sources. An alternative is source-based genealogy (Kessler, Inventing Source-based Data Entry, 2013), where the source data is entered first, and the conclusions and facts that arise from the sources are then assembled. A future GEDCOM standard needs to allow both a conclusion view and a source-based view of the data.

A key aspect of this, missing from GEDCOM, is that conclusions and sources must be separated. Source information must be just the facts, and contain no subjective information. (Kessler, Nine Necessities in a GEDCOM Replacement, 2013) Conclusion data can contain the subjective information and should point to the sources that support the conclusions.

This separation would have two benefits. First, it would delineate which information should and should not be stored with the source information in GEDCOM. Second, it would create a source information standard that all repositories could potentially use to produce searchable files indexing all their source material. Genealogists would then be able to search for and extract source material relevant to their research and import it directly into their family data.

All these considerations have gone into this article, to suggest how sources should be handled in a new genealogy data communication standard.

 

Definition of Citation

There are two possible interpretations of what a citation is. A citation can be thought of as the source data itself, or as the formatted representation of some source data. The former is the data. The latter is just the data laid out according to formatting rules.

Throughout this article, I use the latter interpretation: when I refer to a “citation”, I mean the formalized representation of the source data (or source details) according to some methodology that has rules on how to format different types of sources.

Thanks go to Tom Wetmore and Richard Smith for documenting the confusion regarding this. (Smith, 2014; Wetmore, 2014)

 

1 GEDCOM’s Current Source Definition

The existing GEDCOM standard has provided for documentation of sources. As the standard was revised over the years, FamilySearch experimented with various forms and finally came up with what was used in the GEDCOM 5.5 standard and also in the GEDCOM 5.5.1 draft. (Jones, FamilySearch GEDCOM Specifications, 2014)

What is not universally realized is that GEDCOM already contains the structures necessary to record the information about almost any type of source.

GEDCOM divides the source information into four structures. The hierarchy is this:

A. Some conclusion has its source described with a SOURCE_CITATION.

B. The SOURCE_CITATION refers to a SOURCE_RECORD.

C. The SOURCE_RECORD includes a SOURCE_REPOSITORY_CITATION.

D. The SOURCE_REPOSITORY_CITATION refers to a REPOSITORY_RECORD.

Let’s take a look at them one by one.

A. The SOURCE_CITATION is described in GEDCOM as:

“The <<SOURCE_CITATION>> structure is placed subordinate to the fact being cited. It is generally best if the source citation contains only information specific to the fact being cited and then points to the more general description of the source, defined in a SOURCE_RECORD. This reduces redundancy, provides a way of controlling the GEDCOM record size, and more closely represents the normalized data model.”

The SOURCE_CITATION is somewhat misnamed. It simply provides the specific location within the source where the reference can be found, along with details about the information. It does not attempt to create a citation (i.e., a formalized bibliographic description of the source cited) but just provides the data necessary to create one. It would more aptly be named SOURCE_DETAIL or SOURCE_REFERENCE, or even EVIDENCE, since it details the reference to the source of some evidence.

Its GEDCOM definition is:

SOURCE_CITATION:=
    n SOUR @<XREF:SOUR>@ {1:1} /* pointer to source record */
        +1 PAGE <WHERE_WITHIN_SOURCE> {0:1}
        +1 EVEN <EVENT_TYPE_CITED_FROM> {0:1}
            +2 ROLE <ROLE_IN_EVENT> {0:1}
        +1 DATA {0:1}
            +2 DATE <ENTRY_RECORDING_DATE> {0:1}
            +2 TEXT <TEXT_FROM_SOURCE> {0:M}
                +3 [CONC|CONT] <TEXT_FROM_SOURCE> {0:M}
        +1 <<MULTIMEDIA_LINK>> {0:M}
        +1 <<NOTE_STRUCTURE>> {0:M}
        +1 QUAY <CERTAINTY_ASSESSMENT> {0:1}

B. The SOURCE_RECORD is described in GEDCOM as:

“The SOURCE_RECORD structure was simplified into five basic sections: data or classification, author, title, publication facts, and repository. The data or classification section contains facts about the data represented by this source and is used to analyze the collection of sources that the researcher used. The author, title, publication facts, and repository sections provide free-form text blocks that inform subsequent researchers how to access the source data that the original researcher used.”

Its GEDCOM definition is:

SOURCE_RECORD:=
    n @<XREF:SOUR>@ SOUR {1:1}
        +1 DATA {0:1}
            +2 EVEN <EVENT_RECORDED> {0:M}
                +3 DATE <DATE_PERIOD> {0:1}
                +3 PLAC <SOURCE_JURISDICTION_PLACE> {0:1}
            +2 AGNC <RESPONSIBLE_AGENCY> {0:1}
            +2 <<NOTE_STRUCTURE>> {0:M}
        +1 AUTH <SOURCE_ORIGINATOR> {0:1}
            +2 [CONC|CONT] <SOURCE_ORIGINATOR> {0:M}
        +1 TITL <SOURCE_DESCRIPTIVE_TITLE> {0:1}
            +2 [CONC|CONT] <SOURCE_DESCRIPTIVE_TITLE> {0:M}
        +1 ABBR <SOURCE_FILED_BY_ENTRY> {0:1}
        +1 PUBL <SOURCE_PUBLICATION_FACTS> {0:1}
            +2 [CONC|CONT] <SOURCE_PUBLICATION_FACTS> {0:M}
        +1 TEXT <TEXT_FROM_SOURCE> {0:1}
            +2 [CONC|CONT] <TEXT_FROM_SOURCE> {0:M}
        +1 <<SOURCE_REPOSITORY_CITATION>> {0:1} /* substructure */ 
        +1 <<MULTIMEDIA_LINK>> {0:M}
        +1 <<NOTE_STRUCTURE>> {0:M}
        +1 REFN <USER_REFERENCE_NUMBER> {0:M}
            +2 TYPE <USER_REFERENCE_TYPE> {0:1}
        +1 RIN <AUTOMATED_RECORD_ID> {0:1}
        +1 <<CHANGE_DATE>> {0:1}

This source record looks quite comprehensive in itself. You can clearly see where the author, title, publication facts and agency are intended to go, and plenty more is included besides.

Note the SOURCE_REPOSITORY_CITATION is included as a substructure.

C. The SOURCE_REPOSITORY_CITATION is described in GEDCOM as:

“This structure is used within a source record to point to the name and address record of the holder of the source document.”

Its GEDCOM definition is:

SOURCE_REPOSITORY_CITATION:=
    n REPO @<XREF:REPO>@ {1:1} /* pointer to repository record */
        +1 <<NOTE_STRUCTURE>> {0:M}
        +1 CALN <SOURCE_CALL_NUMBER> {0:M}
            +2 MEDI <SOURCE_MEDIA_TYPE> {0:1}

D. The REPOSITORY_RECORD is described in GEDCOM as:

“Formal and informal repository name and addresses are stored in the REPOSITORY_RECORD.”

Its GEDCOM definition is:

REPOSITORY_RECORD:=
    n @<XREF:REPO>@ REPO {1:1}
        +1 NAME <NAME_OF_REPOSITORY> {0:1}
        +1 <<ADDRESS_STRUCTURE>> {0:1}
        +1 <<NOTE_STRUCTURE>> {0:M}
        +1 REFN <USER_REFERENCE_NUMBER> {0:M}
            +2 TYPE <USER_REFERENCE_TYPE> {0:1}
        +1 RIN <AUTOMATED_RECORD_ID> {0:1}
        +1 <<CHANGE_DATE>> {0:1}

Together, these four structures provide places for all the data needed to document one’s sources.
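To make the four-level hierarchy concrete, here is a minimal Python sketch, not taken from any existing program, that parses a small GEDCOM fragment and follows the pointers from a SOURCE_CITATION to its SOURCE_RECORD and on to the REPOSITORY_RECORD. The sample records, their xref IDs and values, and the helper names (parse, repository_for_citation) are hypothetical. The parser handles only the "level [@xref@] TAG [value]" line shape and does not join CONC/CONT continuations, so it is not a complete GEDCOM reader.

```python
import re
from dataclasses import dataclass, field

# A GEDCOM line: level, optional @xref@ label, tag, optional value.
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


@dataclass
class Node:
    tag: str
    value: str = ""
    xref: str = ""
    children: list = field(default_factory=list)

    def first(self, tag):
        """Return the first child with the given tag, or None."""
        return next((c for c in self.children if c.tag == tag), None)


def parse(text):
    """Parse GEDCOM lines into a dict of level-0 records keyed by xref."""
    records, stack = {}, []
    for raw in text.strip().splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            raise ValueError(f"cannot parse line: {raw!r}")
        level = int(m[1])
        node = Node(tag=m[3], value=m[4] or "", xref=m[2] or "")
        del stack[level:]  # keep only this node's ancestors
        if stack:
            stack[-1].children.append(node)
        elif node.xref:
            records[node.xref] = node
        stack.append(node)
    return records


def repository_for_citation(citation, records):
    """Follow SOURCE_CITATION -> SOURCE_RECORD -> REPOSITORY_RECORD."""
    source = records[citation.value]        # SOUR @S1@ points at the source
    repo_link = source.first("REPO")        # the SOURCE_REPOSITORY_CITATION
    repository = records[repo_link.value]   # REPO @R1@ points at the repository
    return {
        "page": citation.first("PAGE").value,
        "title": source.first("TITL").value,
        "call_number": repo_link.first("CALN").value,
        "repository": repository.first("NAME").value,
    }


# Hypothetical sample data.
SAMPLE = """
0 @I1@ INDI
1 BIRT
2 SOUR @S1@
3 PAGE Folio 12, entry 7
0 @S1@ SOUR
1 TITL Example Parish Register
1 REPO @R1@
2 CALN ABC-123
0 @R1@ REPO
1 NAME Example County Archives
"""

records = parse(SAMPLE)
citation = records["@I1@"].first("BIRT").first("SOUR")
print(repository_for_citation(citation, records))
```

Note how the page (PAGE) stays with the citation, while the title and the repository details live in shared records that any number of citations can point to, which is the normalization the GEDCOM text describes.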

 

2 The Underutilization of GEDCOM’s Sourcing

GEDCOM’s sourcing certainly is comprehensive. But it is also complicated, with numerous tags, structures, substructures and linkages. Almost any type of source description can be recorded with it by a developer who takes the time to study the structures.

However, it is not obvious where a specific piece of source data should go. The documentation is less than clear, and there are only trivial examples that do little to help the developer understand it properly. It is possible to store any source, but there is no single, unambiguous way to do so.

Some developers took advantage of GEDCOM’s sourcing, but many developers decided not to use it, or use only parts of it.

Then along came Elizabeth Shown Mills and her book Evidence Explained (Mills, 2007). It popularized the formalization of citation writing for genealogy and emphasized the use of templates to develop the sentence structure and formatting for a large number of different types of sources. Many developers adapted her templates and included them in their software to make it easier for genealogists to create formal citations for their sources.

The last version of GEDCOM was created before citation templates were popularized, and there was no obvious way to export the templates or the citations to GEDCOM. As a result, many developers who included citation templates in their products didn’t try to export them. Some, most notably RootsMagic, decided it was important to export template information and created their own non-standard GEDCOM tags to do so. They could then reimport their own exported templates, but no other program could.

The overall result is that few programs export their sources to GEDCOM in a manner that another program can properly read. This is the problem that genealogists dearly want fixed.

So GEDCOM is capable of storing almost any source, but it is complicated and unclear with nothing but trivial examples to help the user. There’s got to be a better, simpler way.

 

3 Zoteroing In On a Solution

As part of the BetterGEDCOM (BG) initiative, its then leader GeneJ Composer said she used, and was very impressed with, the software called Zotero (www.zotero.org). Zotero is a free tool that helps you collect, organize, cite and share your research sources. It is available for Mac, Windows and Linux.

During the discussion on how BetterGEDCOM would record sources, GeneJ developed a list of about 100 elements used across all source types in Zotero. These included items such as: abstractNote, accessDate, applicationNumber, archive, archiveLocation, artworkMedium, artworkSize, assignee, audioFileType, audioRecordingFormat, author, billNumber, blogTitle, bookTitle, callNumber, etc. (testuser42, 2011)

GeneJ provided an example of a Zotero source for a Blog Post, as shown below (Composer, 2011). This particular type of source (blogPost) has 15 elements, and 10 of those elements have values.

[Figure: Zotero source of type blogPost (Composer, 2011)]

What is very important to note here is that this is the data needed to describe a source of type blogPost in Zotero. This data is not formatted. It is the raw data, and from it a citation can be presented using any formatting method and any style.

Zotero lists dozens of different types of sources (which Zotero calls item types), e.g. artwork, audioRecording, bill, blogPost, book, etc. Each source type has its own set of relevant elements that are needed. These come from the master list of elements. The specific elements needed depend on the type of source.
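Structurally, this amounts to a master vocabulary plus a per-type lookup. The Python sketch below assumes a tiny, hypothetical subset: two source types whose element names are modelled on Zotero's naming style but are not copied from Zotero's actual field definitions. It also reports which of a type's elements have no value, much as the blogPost example fills in only some of its elements.

```python
# Hypothetical, abbreviated vocabulary: a master list of element names
# and, for each source type, the subset of those elements it uses.
MASTER_ELEMENTS = {
    "title", "author", "blogTitle", "bookTitle", "publisher",
    "place", "date", "url", "accessDate", "ISBN", "numPages",
}

TYPE_ELEMENTS = {
    "blogPost": ["title", "author", "blogTitle", "date", "url", "accessDate"],
    "book": ["title", "author", "publisher", "place", "date", "ISBN", "numPages"],
}

# Sanity check: every per-type element must come from the master list.
for source_type, elements in TYPE_ELEMENTS.items():
    unknown = set(elements) - MASTER_ELEMENTS
    if unknown:
        raise ValueError(f"{source_type} uses unknown elements: {sorted(unknown)}")


def empty_elements(source_type, values):
    """Return the elements defined for this source type that have no value."""
    return [name for name in TYPE_ELEMENTS[source_type] if not values.get(name)]


post = {"title": "Example post", "author": "A. Writer", "url": "https://example.org/post"}
print(empty_elements("blogPost", post))  # ['blogTitle', 'date', 'accessDate']
```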

The Zotero software thus provides an example of a simple, workable source definition structure that could work for genealogy source data.

I am not suggesting that Zotero’s source types and elements be the master list for the new standard. I am simply using Zotero as an illustrative example of how source types and elements can be set up. When the standard is developed, the list should attempt to contain every item that will be needed to document sources. Don’t worry. We aren’t talking thousands. We’re talking a few hundred.

As another, more concrete example, instead of Zotero’s definitions we can use the actual Evidence Explained definitions. Tamura Jones pointed out (Jones, Genealogy Citation Standard, 2011) that John H. Yates released free, open source EE-style templates. (Yates, 2010) If you look at those templates, you will see 170 categories (what I call source types) and 592 fields (what I call elements). Many of these are variants of the same field: short and long forms, first, middle and last names, or the parts of an address.

The number of elements can also be reduced by changing “blogTitle”, “bookTitle” and “articleTitle” all to “Title” and letting the source type provide the context. It will be up to the standards committee to sort those out and come up with the best set of values.
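A minimal Python sketch of that reduction, assuming a hypothetical alias table: type-specific title elements are renamed to a single Title element, and the source type is kept alongside to supply the context. The sketch also flags the case where two elements of the same source would collapse onto one key.

```python
# Hypothetical aliases that would all collapse into one generic "Title"
# element, leaving the source type to say what kind of title it is.
TITLE_ALIASES = {"title", "blogTitle", "bookTitle", "articleTitle"}


def normalize(source_type, elements):
    """Rename title aliases to "Title"; the source type supplies the context.

    Raises ValueError if two elements of one source would collapse onto the
    same key, a case the standards committee would have to resolve.
    """
    result = {}
    for name, value in elements.items():
        key = "Title" if name in TITLE_ALIASES else name
        if key in result:
            raise ValueError(f"{source_type}: {name!r} collides with {key!r}")
        result[key] = value
    return source_type, result


print(normalize("book", {"bookTitle": "A Family History", "date": "1990"}))
# ('book', {'Title': 'A Family History', 'date': '1990'})
```

Calling normalize("blogPost", {"title": "A post", "blogTitle": "A blog"}) raises ValueError, because a blog post has both its own title and the blog's title. That is exactly the kind of case the committee would need to sort out.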

 

4 Don’t Mix Up Data with Formatting

A new GEDCOM standard should transfer only the genealogy data. That genealogy data includes all the source information necessary to accurately describe a source.

The structured, formalized notation used to represent the data is not data. It is just a set of instructions telling you one specific way of displaying the data. These are nothing more than formatting rules.

A new GEDCOM standard should not transfer formatting information. Formatting should be left up to the receiving program. The receiving program may have its own preferred way of formatting sources. If they use Evidence Explained, then so be it. They may interpret EE differently than another program, and they should be allowed to do so their own way, and display it their own way.

A program may give you many alternative methods of formatting, e.g. Richard Lackey[4] or even bibliographic methods such as APA[5] or Chicago[6]. Again, the choice should be up to the receiving program; the sending program should not force its formatting upon another.

Even within one method, there may be many different ways to format a single source. Some examples include formatting for a bibliography, for a footnote, for an endnote or for an ibid.

This will be a controversial opinion, but a line must be drawn. Information should be the only thing transferred. One program should not tell another program how it should format and display that information. Structuring and formatting information should not be transferred.

The beauty in the variety of genealogy software is that different programs display your data in different ways. Some people like it one way. Some people like it another way. Forcing data to be displayed in certain ways only restricts that choice.

 

5 Suggesting a Solution

The goal of a new Standard is that data transfers seamlessly between programs. For that to be done, all developers must adhere to the standard. The way to maximize the likelihood that developers can and will adhere is by making the standard simple and unambiguous.

Source data lends itself to a simple system. All that needs to be done is:

1. Identify the most common source types that genealogists will encounter, and make them part of the standard.

2. Identify all the possible source elements and make them keys in the new standard.

3. Discourage, but allow, programmers to define their own source types and source element types.

Using a GEDCOM-like definition, this structure may be as simple as:

SOURCE_RECORD:=
    n @<XREF:SOUR>@ SOUR {1:1}
        +1 TYPE <SOURCE_TYPE> {0:1}
        +1 ELEM <SOURCE_ELEMENT_AND_VALUE> {0:M}

SOURCE_TYPE:=
    [artwork | audioRecording | bill | blogPost | book | … | _<user defined>]

SOURCE_ELEMENT_AND_VALUE:=
    <SOURCE_ELEMENT_TYPE>: <TEXT>

SOURCE_ELEMENT_TYPE:=
    [abstractNote | accessDate | applicationNumber | archive | archiveLocation
     | artworkMedium | artworkSize | assignee | audioFileType
     | audioRecordingFormat | author | billNumber | blogTitle | bookTitle
     | callNumber | … | _<user defined>]

Using the blog post example, data transfer in a GEDCOM-like format would look like this:

0 @S123@ SOUR
1 TYPE BlogPost
1 ELEM Title: They Came Before: Technophoo. Have no fear …
1 ELEM Author: Genej, (first)
1 ELEM BlogTitle: They Came Before
1 ELEM Date: 7 Sep 2011
1 ELEM URL: http://theycamebefore.blogspot.com/2011/09/technophoo-have-no-fear.html
1 ELEM Accessed: 22 Dec 2011 16:51:57
1 ELEM ShortTitle: They Came Before
1 ELEM DateAdded: 22 Dec 2011 16:51:57
1 ELEM Modified: 22 Dec 2011 16:51:57

And we’re done. All data will export and import easily and will transfer properly.
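Because each element is just a name and a value, a writer and a reader for this layout take only a few lines. The following is a minimal Python sketch under stated assumptions: TYPE and ELEM both sit at level 1 under SOUR, as in the SOURCE_RECORD definition above; each element appears once (repeated elements, such as several authors, would need a list); and values are single-line, with no CONC/CONT continuation. The function names write_source and read_source are hypothetical.

```python
def write_source(xref, source_type, elements):
    """Serialize one source record in the proposed TYPE/ELEM layout."""
    lines = [f"0 {xref} SOUR", f"1 TYPE {source_type}"]
    lines += [f"1 ELEM {name}: {value}" for name, value in elements.items()]
    return "\n".join(lines)


def read_source(text):
    """Parse a record written by write_source back into its parts."""
    header, *body = text.splitlines()
    level, xref, tag = header.split(" ", 2)
    if (level, tag) != ("0", "SOUR"):
        raise ValueError(f"not a source record: {header!r}")
    source_type, elements = None, {}
    for line in body:
        _, tag, rest = line.split(" ", 2)
        if tag == "TYPE":
            source_type = rest
        elif tag == "ELEM":
            # Split on the first ": " only; values may contain ": " themselves.
            name, _, value = rest.partition(": ")
            elements[name] = value
    return xref, source_type, elements


blog_post = {
    "Title": "They Came Before: Technophoo. Have no fear …",
    "Author": "Genej, (first)",
    "BlogTitle": "They Came Before",
    "Date": "7 Sep 2011",
    "URL": "http://theycamebefore.blogspot.com/2011/09/technophoo-have-no-fear.html",
    "Accessed": "22 Dec 2011 16:51:57",
}
text = write_source("@S123@", "BlogPost", blog_post)
print(text)
print(read_source(text) == ("@S123@", "BlogPost", blog_post))  # True
```

Splitting on the first ": " is what lets a title such as “They Came Before: Technophoo…” survive the round trip intact.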

Now obviously there may be some minor refinements to this, such as requiring that certain elements be certain data types. Most will be text, but a few might be dates or numbers.
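Such data-type rules could be as small as a lookup table consulted on import. In this hypothetical Python sketch, a few element names are declared as dates or numbers and everything else is treated as text; the element names, the two accepted date layouts, and the function names are assumptions for illustration only.

```python
from datetime import datetime

# Hypothetical per-element data types; unlisted elements default to text.
ELEMENT_TYPES = {"Date": "date", "Accessed": "date", "NumPages": "number"}

# Assumed date layouts, e.g. "7 Sep 2011" or "22 Dec 2011 16:51:57".
# %b expects English month abbreviations (the default C locale).
DATE_FORMATS = ("%d %b %Y", "%d %b %Y %H:%M:%S")


def parse_date(text):
    """Try each assumed layout in turn; raise ValueError if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"not a recognized date: {text!r}")


def check_element(name, value):
    """Convert a value to its element's declared type, or raise ValueError."""
    kind = ELEMENT_TYPES.get(name, "text")
    if kind == "date":
        return parse_date(value)
    if kind == "number":
        return int(value)  # raises ValueError for non-numeric text
    return value


print(check_element("Accessed", "22 Dec 2011 16:51:57"))  # 2011-12-22 16:51:57
```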

Also, there will be a desire to allow user-defined source types or source element types. A developer may use source types or elements that are not in the standard. These will need to be identified, possibly with a leading underscore as suggested above, to signal to developers that these are fields other programs will not understand. For example:

0 @S124@ SOUR
1 TYPE _PostItNote
1 ELEM _Handwriting: No one will be able to interpret this.

Use of user-defined identifiers should be discouraged. If a program needs one, there should be a venue through which the developer can apply to have a new identifier added to the next version of the standard.
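A receiving program could sort incoming identifiers into three groups: those in the standard, those marked as user-defined by a leading underscore, and anything else. Below is a minimal Python sketch of that check; the two vocabularies are a short hypothetical sample, and the "invalid" category (an unknown name without the underscore) is an assumption about how a program might treat it.

```python
# Hypothetical, abbreviated standard vocabularies.
STANDARD_TYPES = {"artwork", "audioRecording", "bill", "blogPost", "book"}
STANDARD_ELEMENTS = {"author", "title", "blogTitle", "date", "url", "accessDate"}


def classify(name, vocabulary):
    """Return 'standard', 'user-defined' (leading underscore) or 'invalid'."""
    if name in vocabulary:
        return "standard"
    if name.startswith("_") and len(name) > 1:
        return "user-defined"
    return "invalid"


def review(source_type, element_names):
    """List every non-standard identifier used by one source record."""
    warnings = []
    checks = [(source_type, STANDARD_TYPES)]
    checks += [(name, STANDARD_ELEMENTS) for name in element_names]
    for name, vocabulary in checks:
        status = classify(name, vocabulary)
        if status != "standard":
            warnings.append(f"{name}: {status}")
    return warnings


print(review("_PostItNote", ["_Handwriting"]))
# ['_PostItNote: user-defined', '_Handwriting: user-defined']
```

Reporting, rather than rejecting, user-defined identifiers keeps their data intact while still signalling that other programs will not understand them.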

 

6 What about Repositories?

Repository information can be stored as source elements, as suggested above. Or it can be given its own record structure, and the source can link to the repository as GEDCOM does today.

The advantages of keeping repository information separate are less repetition of information between sources, less chance of conflicting information about one repository being included in different sources, and better backwards compatibility with today’s GEDCOM.

Whether to keep sources and repositories together or separate is up to FHISO to decide. Should they be kept separate, the repository still can be set up in a similar manner to sources, with repository types and repository elements. Doing so would allow easier citation template development, as will now be described.

 

7 The Case for Citation Templates

There is still a place for citation templates. And yes, it would be nice if they were standardized. This would help programmers implement the various citation styles in a consistent manner and display your sources according to your favourite style.

There have been previous attempts to standardize citations. In 2011, Real-Time Collaboration (the creator of AncestorSync) started an initiative called SourceTemplates (Jones, The SourceTemplates Initiative, 2011). They had the cooperation of BetterGEDCOM. However, their citation model was essentially the same as GEDCOM’s source structures, with a DataField record for defining the source elements. So it had the same complexity as GEDCOM’s sourcing, and the initiative never got off the ground.

There’s a much better way to do this. By using the source types in combination with the source elements, it would be possible to develop templates for each source type, for every bibliographic style.

Here are two template examples for a source type of BlogPost:

Using the MLA style[7], a template for a BlogPost might be:
    $Author. “$Title.” $BlogTitle. $Publisher, $Modified. Web. $Accessed.
Inserting our sample data (which has no Publisher, so that element is omitted), this would display as:
    Genej, (first). “They Came Before: Technophoo. Have no fear …” They Came Before. 22 December 2011. Web. 22 December 2011.

Whereas using Evidence Explained, a template for a BlogPost in a footnote might be:
    $Author, “$Title,” $BlogTitle, $Modified ($URL : accessed $Accessed)
And with our sample data, this would display as:
    Genej, (first), “They Came Before: Technophoo. Have no fear …,” They Came Before, 22 December 2011 (http://theycamebefore.blogspot.com/2011/09/technophoo-have-no-fear.html : accessed 22 December 2011)

So you can see that developing and using citation templates is not a difficult task once all the source types and source elements are defined. If one standard set of citation templates were developed for every combination of source type, citation methodology and entry type, with translations into different languages, genealogy software developers would have a great resource. Their programs could use the templates to format citations in a standardized manner. They could add their own preferred formats. And they could allow users to add their own templates.
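As a rough illustration of the template approach, here is a minimal Python sketch using the standard library's string.Template, whose $Name placeholders happen to match the notation above. The registry keyed by (source type, style, entry type), the style labels, and the date display rule are hypothetical choices a receiving program might make; the element names follow the sample blog post data.

```python
from datetime import datetime
from string import Template

# Hypothetical registry keyed by (source type, style, entry type).
TEMPLATES = {
    ("BlogPost", "MLA", "bibliography"):
        "$Author. “$Title.” $BlogTitle. $Publisher, $Modified. Web. $Accessed.",
    ("BlogPost", "EE", "footnote"):
        "$Author, “$Title,” $BlogTitle, $Modified ($URL : accessed $Accessed)",
}

DATE_ELEMENTS = {"Date", "Accessed", "Modified"}


def format_date(text):
    """Display '22 Dec 2011 16:51:57' (or '7 Sep 2011') as '22 December 2011'."""
    day_month_year = " ".join(text.split()[:3])  # drop any time of day
    dt = datetime.strptime(day_month_year, "%d %b %Y")
    return f"{dt.day} {dt:%B %Y}"


def render(source_type, style, entry, elements):
    """Fill the matching template; a missing element raises KeyError."""
    values = {
        name: format_date(value) if name in DATE_ELEMENTS else value
        for name, value in elements.items()
    }
    return Template(TEMPLATES[(source_type, style, entry)]).substitute(values)


blog_post = {
    "Title": "They Came Before: Technophoo. Have no fear …",
    "Author": "Genej, (first)",
    "BlogTitle": "They Came Before",
    "Date": "7 Sep 2011",
    "URL": "http://theycamebefore.blogspot.com/2011/09/technophoo-have-no-fear.html",
    "Accessed": "22 Dec 2011 16:51:57",
    "Modified": "22 Dec 2011 16:51:57",
}
print(render("BlogPost", "EE", "footnote", blog_post))
```

Rendering the MLA template with this data raises KeyError for Publisher, since the sample has none. Whether to drop a missing element and its punctuation or to report it is left to the receiving program, in keeping with the point that formatting is the receiving program's business.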

Thus, the development of an extensive set of citation templates would add consistency in how a particular style is displayed in different programs. It would save programmers the hassle of figuring out the details of each style for themselves.

We have now completely separated the definition of the source data from the definition of its formatting, which the citation templates provide. They are two separate tasks.

Attempting to include these templates in the initial version of the new GEDCOM standard would be a mistake. The GEDCOM standard is designed to transfer genealogical data correctly. This should be FHISO’s main goal as they embark on their endeavour to create this new standard. They should not be distracted by a desire to standardize the formatting of the data.

So FHISO should concentrate on defining the source information and leave the citations/templates for later or for someone else to do.

 

8 Separating Sources from Conclusions

Source information must be “just the facts”. There must be no assumptions or conclusions or assessments of the source in the source information. (Kessler, Separation of Sources from Conclusions, 2011)

All assumptions and conclusions and assessments of the source must be placed with the source reference, not with the source. So the reference would be:

SOURCE_REFERENCE:=
    n SOUR @<XREF:SOUR>@ {1:1} /* pointer to source record */
        +1 <<NOTE_STRUCTURE>> {0:M}   /* assumptions and conclusions */
        +1 QUAY <CERTAINTY_ASSESSMENT> {0:1}

This is very important. The source structure must be a complete, independent entity that simply identifies the material a conclusion came from.
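In data-model terms, the rule is that the source object carries no judgement and the reference carries all of it. Here is a minimal Python sketch, assuming dataclasses as the representation: Source is frozen so nothing subjective can be attached to it later, while SourceReference holds the notes and a QUAY value restricted to GEDCOM's 0 to 3 certainty scale. The class names, the churchRegister source type, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Source:
    """Just the facts: what the source is, with no assessment of it."""
    xref: str
    source_type: str
    elements: tuple = ()  # (name, value) pairs, kept immutable


@dataclass
class SourceReference:
    """A conclusion's pointer to a source, carrying the researcher's judgement."""
    source: Source
    notes: list = field(default_factory=list)  # assumptions and conclusions
    quay: Optional[int] = None  # GEDCOM CERTAINTY_ASSESSMENT, 0 to 3

    def __post_init__(self):
        if self.quay is not None and self.quay not in (0, 1, 2, 3):
            raise ValueError(f"QUAY must be 0-3, got {self.quay}")


# Hypothetical example: one source, two differing assessments of it.
register = Source("@S1@", "churchRegister",
                  (("title", "Example Parish Register"), ("date", "1851")))
ref_a = SourceReference(register, ["Entry likely refers to our John Smith"], quay=2)
ref_b = SourceReference(register, ["Could be a different John Smith"], quay=1)
```

Because both references point to the same unchanged Source, the differing conclusions never touch the source record itself, which is what would let that record be shared or catalogued independently.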

Doing so will allow repositories to use this part of the new standard as the format for cataloguing their source information in a standardized manner, compatible with the new genealogical data transfer standards. Genealogy software would be able to read these files and search and download the sources relevant to the user.

Genealogists would easily be able to keep their own libraries of interesting sources and share them with others. They could volunteer to catalogue source information for repositories and even contribute their own source libraries to the world’s knowledge base.

This would open up new possibilities for genealogy data sharing and data exchange. (Kessler, Vision, 2011)

 

Summary and Recommendation

The current version of GEDCOM has extensive sourcing capabilities. However, they are complicated to interpret and use. A simpler method is needed.

FHISO should develop a set of standard source types and source element types.

FHISO should use a simple mechanism to transfer the source element values in the standard they will develop.

FHISO can allow, but should discourage, user-defined identifiers. FHISO should accept requests for new identifiers to be added to a future version of the standard.

FHISO should decide whether sources and repositories are to be defined by a source record alone, or whether there should be an additional repository record as well.

Citation templates are not data. They are formatting.

Citation templates must not be transferred with the source data. Instead, programs should allow users to format their citations their own way.

Developing citation templates for all the various methodologies is a desirable task, but less important than providing a standard for the transfer of genealogy data.

FHISO should first develop a standard to transfer genealogy data. They should not initially distract themselves from their main goal by attempting to also standardize citation templates. That can be left for later.

Conclusions must be left out of the source details.

The source standard created should work for transferring sources between genealogy software and also for recording source information by repositories.

Works Cited

Composer, G. (2011, December 27). Zotero blogPost graphic-example. Retrieved from BetterGEDCOM Wiki: http://bettergedcom.wikispaces.com/file/detail/Zotero_blogPost_graphic-example.png

Jones, T. (2011, June 27). Genealogy Citation Standard. Retrieved from Modern Software Experience: http://www.tamurajones.net/GenealogyCitationStandard.xhtml

Jones, T. (2011, October 5). The SourceTemplates Initiative. Retrieved from Modern Software Experience: http://www.tamurajones.net/TheSourceTemplatesInitiative.xhtml

Jones, T. (2014, August 21). FamilySearch GEDCOM Specifications. Retrieved from Modern Software Experience: http://www.tamurajones.net/FamilySearchGEDCOMSpecifications.xhtml

Kessler, L. (2011, December 16). Separation of Sources from Conclusions. Retrieved from BetterGEDCOM Wiki: https://bettergedcom.wikispaces.com/share/view/48324558

Kessler, L. (2011, August 13). Vision. Retrieved from BetterGEDCOM Wiki: http://bettergedcom.wikispaces.com/Vision

Kessler, L. (2013, July 29). Inventing Source-based Data Entry. Retrieved from Louis Kessler’s Behold Blog: http://www.beholdgenealogy.com/blog/?p=1321

Kessler, L. (2013, June 5). Nine Necessities in a GEDCOM Replacement. Retrieved from Paper 78 submitted to FHISO’s call for papers, Necessity #1: Separation of Sources from Conclusions: http://fhiso.org/files/cfp/cfps78.pdf

Mills, E. S. (2007). Evidence Explained: Citing History Sources from Artifacts to Cyberspace. Baltimore: Genealogical Publishing Company, Inc.

Smith, R. (2014, August 29). The role of email, attachments, slack, github, etc, in FHISO’s work. Retrieved from TSC-public mailing list archives: http://fhiso.org/pipermail/tsc-public_fhiso.org/2014/000117.html

testuser42. (2011, December 19). List of main Citation Elements. Retrieved from BetterGEDCOM Wiki: http://bettergedcom.wikispaces.com/List+of+main+Citation+Elements

Thorud, G., Composer, G., & Hatchett, A. (2012, March 5). Sources and Citations. Retrieved from BetterGEDCOM Wiki: http://bettergedcom.wikispaces.com/page/history/Sources+and+Citations

Wetmore, T. (2014, August 29). The role of email, attachments, slack, github, etc, in FHISO’s work. Retrieved from TSC-public mailing list archives: http://fhiso.org/pipermail/tsc-public_fhiso.org/2014/000102.html

Yates, J. H. (2010, February). Two Computer Ready Parametrizations of "Evidence Style" Historical Sources. Retrieved from http://jytangledweb.org/genealogy/evidencestyle/

 


[1] http://www.beholdgenealogy.com/blog/?p=1395

[2] Behold is a program that reads GEDCOM data files and displays all the information in them.

[3] For example, Randy Seaver, Genea-Musings: Software Programs, GEDCOM Files and Source Citations – Some Recommendations, February 17, 2011, http://www.geneamusings.com/2011/02/software-programs-gedcom-files-and.html

[4] Cite Your Sources, paperback, June 1, 1978, Amazon.com http://www.amazon.com/Cite-Your-Sources-Richard-Lackey/dp/9995236478

[5] http://www.apastyle.org/ - Note that APA is being used for this paper.

[6] http://www.chicagomanualofstyle.org/

[7] Citesource: MLA Style – Blog Post. http://citesource.trincoll.edu/mla/mlablogpost_002.pdf

14 Comments

1. Tony Proctor (acproctor)
Posted: Thu, 28 Aug 2014

I agree with some of your main points here Louis. In particular: ‘data is not formatting’ and ‘templates are not part of the data’. However, my approach is significantly different in places too. I have a half-finished article I will eventually publish as part of the STEMMA documentation, but in summary:

1) Reference-note citations are not synonymous with footnotes/endnotes. In other words, footnotes/endnotes may be used for other purposes and so require an explicit mechanism.
2) Real citations also need to accommodate discursive notes. I use (1) to distinguish between the core citation and the discursive notes that accompany it.
3) I believe in a predefined set of source-types but absolutely believe it must be extensible. Real sources won’t be limited by any master source list.
4) The names of the citation elements should not be standardised (I’ve likened this to a database with fixed column names). Semantic types for the element names should be provided in the source-type definition.
5) Zotero isn’t the only template system and so the data should be neutral. A product should be able to interface the FHISO data to the template system of its choice.
6) The scheme must accommodate ‘layered citations’ and ‘source of source’ (incl. repositories). I have used citation chains as they help you find related citations.

Tony

2. Enno Borgsteede (ennoborg)
Posted: Thu, 28 Aug 2014

About #4: In an international environment, which is what FHISO says it addresses, citation elements MUST be standardized, just like the existing GEDCOM tags. As native English speakers, you both seem to forget about that, but IMO it is essential for a standard to work. Like the fixed column names in a database, they are essential for citations to be understood by fellow genealogists on my continent. In the Gramps development group, people understand English well enough to communicate about technical things, but as a genealogist, I demand to see auteur where you see author, titel where you see title, and so forth.

This is a major problem in EE, which Tamura Jones called awfully Americentric, and it is a problem in GeneJ’s example too. If BibTeX and other citation standards can use a reasonably sized controlled vocabulary, genealogists can do that too. The problem is that many English-speaking genealogists are so provincial that they don’t even think about the dozens of other languages in the world. IMO, using English citation elements is not open, not balanced, just extremely selfish.

A controlled vocabulary can be kept to a reasonable size by not using elements like blog title, because such elements lead to an explosion of terms. The same goes for census elements, which may differ by census year, and by country too. When the type of a source is a blog, the title of the collection can be generated by putting the source type in front of the generic, controlled term for title, which can then be translated. And note that where I write “in front of”, that itself may be wrong in right-to-left languages like Arabic and Hebrew, which are both supported by Gramps. Anyway, as far as I’m concerned this is essential for the international acceptance of a standard, and it is a practical point too. The number of elements used here, or in EE, is so large that no developer in the world would even think about translating them. And this works both ways, because if we can’t agree on a limited set of generic elements, you will have to translate our hundreds of local elements too. Would you really want that? It would be balanced, but given that many developers are small companies, or groups like Gramps, it looks totally unrealistic to me.

3. Enno Borgsteede (ennoborg)
Netherlands flag
Joined: Wed, 9 May 2012
6 blog comments, 0 forum posts
Posted: Thu, 28 Aug 2014

Oh BTW, something’s weird in the registration. Tony appears to be from Italy, and I’m from Poland. :-)

4. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Thu, 28 Aug 2014

Tony,

I mostly agree with you.

1. My paper concerns source references only. When footnotes/endnotes are used for other purposes, such as supplementary information, then you are correct that they use a different structure. I would suggest that they can easily be handled by using a note subordinate to a note, e.g.: 1 BIRT 2 NOTE information 3 NOTE supplementary info.

2. I’m not really sure what you mean by discursive notes. If you are talking about the genealogist’s interpretation of the source material, it should not be in these source definitions. It is a conclusion and is part of the source reference. I will update section 8 of the article to make that clearer.

3. As soon as you make them extensible, you lose transferability. Another program won’t understand the custom source-type. Even if all the formatting information was transferred, the receiving program would still be “dumb” about it. I highly recommend that custom source types be only a temporary thing, and that developers be required to prove the need for them to a standards committee.

4. Similarly to #3, they must be standardized so they can be identified. It does not matter what they are called in the standard as long as it’s understandable and unambiguous to all developers.

5. Thanks. I may not have made it clear that I was just using Zotero as an example. I shall clarify.

6. Yes, I agree that sources can have sources, and that they should be documented. For example, a summary of deaths in a town states that it came from the town’s register. The source’s source can be an element, and its value would be a pointer to another source.
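A minimal sketch of the “source’s source” pointer from point 6, assuming a hypothetical Python model: each source holds its standardized elements in a dictionary, and an extra `derived_from` field (an illustrative name, not a GEDCOM or FHISO tag) points to the source it was taken from. Following those pointers recovers the chain back to the original record.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:
    """A source described by standardized elements (hypothetical model)."""
    title: str
    elements: dict[str, str] = field(default_factory=dict)
    # Hypothetical "source of source" element: a pointer to another Source.
    derived_from: Optional[Source] = None


def provenance_chain(source: Source) -> list[str]:
    """Follow derived_from pointers back to the original record."""
    chain: list[str] = []
    seen: set[int] = set()
    current: Optional[Source] = source
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against accidental cycles
        chain.append(current.title)
        current = current.derived_from
    return chain


register = Source("Town death register", {"type": "register"})
summary = Source("Summary of deaths in the town", {"type": "compiled list"},
                 derived_from=register)

print(provenance_chain(summary))
```

Because the pointer is just another element value, a receiving program that does not understand chains can still read each source on its own.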

Louis

5. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Thu, 28 Aug 2014

Enno,

Re #4: “author”, “auteur” and “автор” all mean the same thing. But as I said to Tony, the standard just needs to use one term that programmers will best understand. It does not matter what it is called in the standard. It could be called “XYXXXZZ”, as long as its meaning and how the program must interpret it are clear to the developer.

Displaying the data is a different matter, and I completely agree with you regarding the display of data. When creating citation templates, the “author” or “XYXXXZZ” element can be displayed in any language or in any manner that the developer wants. In Behold, I allow the user to define the text that is displayed for every GEDCOM tag and every other phrase that is displayed. Users have created templates in different languages for Behold so that the Everything Report is displayed completely as they want. And there are hundreds of tags and terms. The same can be done for citation templates.

I agree that, as you say, the number of elements can be reduced by changing “blogTitle” and “bookTitle” to “Title” and using the source type to give them the proper context. It will be up to the standards committee to sort those out and come up with the best set of values. I’ll add this to the paper.
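To illustrate the split between data and display, here is a minimal sketch assuming a hypothetical template layer: the transferred data carries one standardized key such as `title`, and the label shown to the user comes from per-language tables keyed by source type, which a template author (not the standard) would supply. The table contents and source types are made up for illustration.

```python
# Hypothetical label tables supplied by a template author, not by the standard.
# Keys are (source_type, element); a source_type of None is the generic fallback.
LABELS = {
    "en": {
        (None, "author"): "Author",
        (None, "title"): "Title",
        ("blog", "title"): "Blog title",
        ("book", "title"): "Book title",
    },
    "nl": {
        (None, "author"): "Auteur",
        (None, "title"): "Titel",
        ("blog", "title"): "Blogtitel",
    },
}


def label_for(element: str, source_type: str, lang: str) -> str:
    """Return the display label for a standardized element key.

    Lookup order: this language's label for the source type, then the
    language's generic label, then the raw element key itself.
    """
    table = LABELS.get(lang, {})
    return table.get((source_type, element)) or table.get((None, element)) or element


source = {"type": "blog", "elements": {"title": "Louis Kessler's Behold Blog"}}
for lang in ("en", "nl"):
    for key, value in source["elements"].items():
        print(f"{label_for(key, source['type'], lang)}: {value}")
```

Only the template tables need translating; the element key `title` never changes in the transferred data.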

Louis

6. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Thu, 28 Aug 2014

Enno,

Thanks for pointing out the country problem. I’m using ip2nation.com’s database and I haven’t updated it in a while. I hadn’t noticed too many problems before. I’ll try to update it when I get a chance.

In the meantime, enjoy your stay in Poland. :-)

Louis

7. Enno Borgsteede (ennoborg)
Netherlands flag
Joined: Wed, 9 May 2012
6 blog comments, 0 forum posts
Posted: Fri, 29 Aug 2014

Louis, thanks for elaborating on things in your text. You’d probably need to buy a new marker if it were real.

8. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Fri, 29 Aug 2014

Enno,

Thank you for your comments. They explain to me what I am not making clear and what I need to emphasize.

And something is still caching your country incorrectly. I shall find it.

Louis

9. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Fri, 29 Aug 2014

Enno,

Got your countries fixed. The cache was stuck. Took me a while to find where it was hidden, but I did and cleaned it out.

Louis

10. Enno Borgsteede (ennoborg)
Netherlands flag
Joined: Wed, 9 May 2012
6 blog comments, 0 forum posts
Posted: Sat, 30 Aug 2014

Louis, thanks. I see that Tony has returned from Italy too.

You may have seen that on the FHISO list, I criticized the use of source types as a selector for templates. You mention 170 types in the Yates EE-style templates, and I wrote that ESM lists 1100+ record types on the EE site. Assuming that those are dominated by US records, I feel quite safe saying that worldwide source types would add up to thousands, and unless you feel like translating Linnaeus’ works for the whole world yourself, I see no way to implement such a type list for the whole world. And given the number of available translators for Dutch in Gramps, I can guarantee that we will never reach a consensus about such types.

The problem is that ESM and other genealogists who don’t think much about structuring information tend to create a separate type for every record they find in the world. That’s like creating a type for every breed of dog. And that won’t work here. There are loads of breeds for birth records alone, because each country does things differently, and breeds change when new types of records are introduced over time. Therefore, trying to standardize on record breeds found in the wild, which is basically what you would be doing then, is an impossible task for developers like us.

I think that a large portion of this permutation problem can be solved by following Tony’s advice #6, which is layered citations or source of source, already defined in GedcomX. With that, you don’t need a type for a blog post anymore. Such a post is simply an article with a title and a date (no author), embedded in a higher-level source, the blog, which has a title of its own and an author who is inherited by the article. For an article in a magazine, it is quite likely that the article has an author of its own, the magazine’s author is an editor, and there is a publisher too, all of which is then quite easy to implement. And with layered citations, it’s also much easier to record separate URLs for the blog and the post, because you can use the same element for both.
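The inheritance described above can be sketched as follows, assuming a hypothetical model in which each citation layer stores only its own elements and points to its enclosing layer; an element missing from a layer (such as the author of a blog post) is resolved by walking outward. The class, element names and placeholder URLs are illustrative, not GEDCOM X or FHISO definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    """One layer of a layered citation (hypothetical model)."""
    elements: dict[str, str] = field(default_factory=dict)
    parent: Optional[Layer] = None  # the enclosing source, if any

    def resolve(self, name: str) -> Optional[str]:
        """Return this layer's value for name, else inherit it from enclosing layers."""
        layer: Optional[Layer] = self
        while layer is not None:
            if name in layer.elements:
                return layer.elements[name]
            layer = layer.parent
        return None


# Placeholder URLs; the same "url" element is reused at both layers.
blog = Layer({"title": "Behold Blog", "author": "Louis Kessler",
              "url": "https://example.org/blog"})
post = Layer({"title": "Standardizing Sources and Citation Templates",
              "date": "27 Aug 2014", "url": "https://example.org/blog/post"},
             parent=blog)

print(post.resolve("author"))  # inherited from the blog layer
print(post.resolve("url"))     # the post's own URL takes precedence
```

No “blog post” source type is needed: the post is a generic article layer, and the blog supplies whatever the post does not record itself.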

When layered citations are used, I don’t think there’s much need for source types. As I wrote on the FHISO list, they are quite impractical, because once you reach consensus about one, the next day someone will find a record in the wild that needs an element we didn’t think of. Such record types are moving targets, so they simply can’t be standardized.

Often, I think, source types are actually media types, and standards for those exist in several places. An article (text) is a media type, and so is a blog (web site). And when you see that a text was found on a site, you know through inheritance that the text is not a printed one. In fact, when you find an article from a magazine on DVD, that article, although initially published in print, is now an electronic one, which may be image, text, or both.

Main thing is that if we can’t control the explosion of types through permutation, there is no way to standardize types either, because we don’t have the resources for the translation work. And I get quite mad when I see that people like ESM and GeneJ seem to ignore that fact. You all need to understand that I for one will not accept such unfair acts.

11. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Sun, 31 Aug 2014

Enno,

Using source types based on generalized media types, as you mention, is one way that would work. I agree that going to the micro level is far too much and unnecessary. That can be done by those who create the citation templates, as a separate step from the creation of the data transfer standards. If their templates become unwieldy, no one will use them and people will go to another set. The translation would be part of the template work, not the data transfer work.

Louis

12. Enno Borgsteede (ennoborg)
Netherlands flag
Joined: Wed, 9 May 2012
6 blog comments, 0 forum posts
Posted: Sun, 31 Aug 2014

One difficult thing here is that users of products like RootsMagic, especially those who have expanded the included set of 400+ templates to the 1100+ mentioned in EE, will expect them to be supported by the standard, while at the same time our costs to support them are too high, as I already found out for the Yates templates in Gramps.

So yes, unwieldy is a factor here, and may be a factor in other domains too, like event and place types.

13. Jan Murphy (janmurphy)
United States flag
Joined: Sat, 28 Dec 2013
2 blog comments, 0 forum posts
Posted: Tue, 9 Sep 2014

Re: Section 6 What about Repositories?: We also need layers to sort out the confusion about what a repository is.

I got into an argument in another venue about whether Ancestry.com and similar websites are repositories or not. If I recall correctly, Elizabeth Shown Mills has stated that a website is not a repository (probably in Evidence!; I can hunt down a specific citation if necessary). I might agree with that for a personal website, or even a large website for a One-Name or One-Place Study, which are digital equivalents of large published works. But the websites of large genealogy vendors like FamilySearch, FindMyPast, Ancestry, World Vital Records, and the like act like digital libraries that researchers “visit” in order to access materials.

My fellow user argued strongly in favor of Mills’ view that Ancestry.com is a Publisher, not a Repository, but practically speaking, Ancestry the company performs both roles. They don’t just put data up on the web for all to see; they also perform administrative duties like keeping track of member IDs and passwords. Issuing researcher credentials and allowing or disallowing entry to the collections are acts performed by the staff at repositories, not tasks usually performed by publishers.

14. Louis Kessler (lkessler)
Canada flag
Joined: Sun, 9 Mar 2003
150 blog comments, 210 forum posts
Posted: Thu, 11 Sep 2014

Jan,

I’m happy enough to ultimately let FHISO define what a Repository is. The cataloging and indexing of the raw source materials can be done by the staff at the repositories, the staff of the publishers, or crowd-sourced by the users: whatever works best.

The key point is that if the method to detail a source document is defined with the goal of doing this, then it will open up amazing new avenues for easily using and transferring source information.

Louis

 

The Following 3 Sites Have Linked Here

  1. Best of the Genea-Blogs - 24 to 30 August 2014 - Genea-Musings (Randy Seaver) : Sun, 31 Aug 2014
    Louis has a FHISO proposal to deal with source citations. Bravo!

  2. Genealogy Source Citations and FHISO - A Simpler Proposal - Genea-Musings (Randy Seaver) : Wed, 3 Sep 2014
    Louis Kessler has submitted a paper concerning sources and citations...

  3. GEPS 018: Evidence style sources - Gramps wiki : Thu, 20 Nov 2014
    Further Reading ... Standardizing Sources and Citation Templates, Louis Kessler’s Behold Blog
