GEDCOM
| GEDCOM | |
|---|---|
| Filename extension | .ged |
| Internet media type | |
| Developed by | LDS FHD |
| Initial release | 1984 |
| Latest release | 7.0.16 (18 March 2025)[2] |
| Type of format | Genealogy data exchange |
| Standard | De facto[3] |
| Open format? | yes |
| Website | gedcom github |
FamilySearch GEDCOM, or simply GEDCOM (/ˈdʒɛdkɒm/ JED-kom, acronym of Genealogical Data Communication), is an open file format and the de facto standard specification for storing genealogical data.[3] It was developed by the Church of Jesus Christ of Latter-day Saints (LDS Church), the operators of FamilySearch, to aid in the research and sharing of genealogical information.[4] A common usage is as a standard format for the backup and transfer of family tree data between different genealogy software and websites, most of which support importing from and exporting to GEDCOM format.[5]
GEDCOM is defined as a plain text file, using UTF-8 encoding as of version 7.0. This file contains genealogical information about individuals such as names, events, and relationships; metadata links these records together.
GEDCOM 7.0, released in 2021, is the most recent version of the GEDCOM specification as of July 2024.[6] However, its predecessor, GEDCOM 5.5.1, remains the industry's format standard for the exchange of genealogical data.[citation needed] First released as a draft standard in 1999, GEDCOM 5.5.1 received only minor updates in the subsequent 20 years leading up to the release of 5.5.1 final in 2019. To address its shortcomings, some genealogy programs introduced proprietary extensions to GEDCOM, such as GEDCOM 5.5 EL (Extended Locations), which are not always recognized by other programs.[7][8][9] Efforts have been made to have 7.0 more widely adopted since its release. FamilySearch stated an intention to be GEDCOM 7.0 compatible in the third quarter of 2022, and Ancestry.com was planning for 7.0 compatibility but had not specified an implementation date.[citation needed]
Data model
GEDCOM uses a lineage-linked data model based on the conceptual model of the nuclear family. The family (FAM) record type is therefore the only source of links between the individuals (INDI) in the file, assigning parents (as HUSB and WIFE) and children (as CHIL) by referring to individuals' unique ID numbers.[10] These historical origins are described in the 7.0 specification document: "The FAM record was originally structured to represent families where a male HUSB (husband or father) and female WIFE (wife or mother) produce CHIL (children)."[11]
Although the links in a GEDCOM family record still use the original naming indicating a husband and a wife, the specification now states that "sex, gender, titles, and roles of partners should not be inferred based on the partner that the HUSB or WIFE structure points to" and that these individuals within a family structure are collectively referred to as 'partners', 'parents' or 'spouses'. A FAM record can also be used for "cohabitation, fostering, adoption, and so on, regardless of the gender of the partners."[11]
File structure
A GEDCOM file consists of a header section, records, and a trailer section. Within these sections, records represent people (INDI records), families (FAM records), sources of information (SOUR records), and other miscellaneous records, including notes. Every line of a GEDCOM file begins with a level number: all top-level records (HEAD, TRLR, SUBN, and each INDI, FAM, OBJE, NOTE, REPO, SOUR, and SUBM) begin with a line at level 0, while lines nested beneath them carry positive integer levels.
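The line layout described above can be sketched in Python. This is a minimal illustration, not a conforming parser: the regular expression, the `GedcomLine` type, and the `parse_line` function are assumptions made for this example, and the pattern deliberately ignores details such as escapes and continuation lines.

```python
import re
from typing import NamedTuple, Optional

# Assumed simplification: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?([A-Za-z0-9_]+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into its level, cross-reference ID, tag, and value."""
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)


if __name__ == "__main__":
    for sample in ("0 @I1@ INDI", "1 NAME John /Smith/", "0 TRLR"):
        print(parse_line(sample))
```

Top-level records are the lines for which `level` is 0; every other line belongs to the nearest preceding line with a smaller level.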
Although it is possible to write a GEDCOM file by hand, the format was designed to be used with software and thus is not especially human-friendly. A GEDCOM validator[12] that can be used to validate the structure of a GEDCOM file is included as part of the PhpGedView project, though it is not meant to be a standalone validator. For standalone validation, "The Windows GEDCOM Validator"[13] or the older, unmaintained Gedcheck[14] from the LDS Church can be used.
During 2001, The GEDCOM TestBook Project evaluated how well four popular genealogy programs conformed to the GEDCOM 5.5 standard using the Gedcheck program.[15] Findings showed that a number of problems existed and that "The most commonly found fault leading to data loss was the failure to read the NOTE tag at all the possible levels at which it may appear."[16] In 2005, the Genealogical Software Report Card was evaluated (by Bill Mumford who participated in the original GEDCOM Testbook Project)[17] and included testing the GEDCOM 5.5 standard using the Gedcheck program.[18]
To assist with adoption of GEDCOM 7.0, validation tools now exist for that standard as well.[19]
Example
The following describes a sample GEDCOM file.
The header (HEAD) includes the source program and version (Personal Ancestral File, 5.0), the GEDCOM version (5.5), the character encoding (ANSEL), and a link to information about the submitter of the file.
The individual records (INDI) define John Smith (ID I1), Elizabeth Stansfield (ID I2), and James Smith (ID I3).
The family record (FAM) links the husband (HUSB), wife (WIFE), and child (CHIL) by their ID numbers.
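As an illustration of the description above, the following Python sketch embeds a small GEDCOM file and extracts the individuals and the family links from it. The `SAMPLE` text is a reconstruction written to match the description, not the original listing; the submitter record and the `read_families` helper are hypothetical, and the code handles only the handful of tags it needs.

```python
# Illustrative reconstruction of a file matching the description (assumed content).
SAMPLE = """
0 HEAD
1 SOUR PAF
2 NAME Personal Ancestral File
2 VERS 5.0
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
1 SUBM @U1@
0 @U1@ SUBM
1 NAME Submitter
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 FAMS @F1@
0 @I2@ INDI
1 NAME Elizabeth /Stansfield/
1 SEX F
1 FAMS @F1@
0 @I3@ INDI
1 NAME James /Smith/
1 SEX M
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
"""


def read_families(text):
    """Collect INDI names and FAM links (HUSB, WIFE, CHIL) from GEDCOM text."""
    names = {}     # xref -> name of each INDI record
    families = {}  # xref -> {"HUSB": xref, "WIFE": xref, "CHIL": [xref, ...]}
    current_xref = None
    current_type = None
    for raw in text.strip().splitlines():
        parts = raw.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            # Top-level line: either "0 @XREF@ TYPE" or "0 TAG".
            if len(parts) == 3 and parts[1].startswith("@"):
                current_xref, current_type = parts[1], parts[2]
            else:
                current_xref, current_type = None, parts[1]
            if current_type == "FAM":
                families[current_xref] = {"HUSB": None, "WIFE": None, "CHIL": []}
            continue
        if level != 1:
            continue  # deeper substructures are not needed for this sketch
        tag = parts[1]
        value = parts[2] if len(parts) == 3 else ""
        if current_type == "INDI" and tag == "NAME":
            names[current_xref] = value.replace("/", "")
        elif current_type == "FAM" and tag in ("HUSB", "WIFE"):
            families[current_xref][tag] = value
        elif current_type == "FAM" and tag == "CHIL":
            families[current_xref]["CHIL"].append(value)
    return names, families


names, families = read_families(SAMPLE)

if __name__ == "__main__":
    for fam_id, fam in families.items():
        children = ", ".join(names[c] for c in fam["CHIL"])
        print(fam_id, names[fam["HUSB"]], "+", names[fam["WIFE"]], "->", children)
```

The individuals never point at each other directly; the only connection between them is the shared FAM record, which is the lineage-linked model in miniature.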
Versions
The current version of the specification in wide use is GEDCOM 5.5.1 final, which was released on 15 November 2019. Its predecessor, the GEDCOM 5.5.1 draft,[20] was issued in 1999, introducing nine new tags and adding UTF-8 as an approved character encoding. The draft was not formally approved, but its provisions were adopted in some part by a number of genealogy programs,[21][22][23] including FamilySearch.org.[20]
Lineage-linked GEDCOM is the deliberate de facto common denominator.[3] Despite version 5.5 of the GEDCOM standard first being published in 1996, many genealogical software suppliers have never fully supported the feature of multilingual Unicode text (instead of the ANSEL character set) introduced with that version of the specification. Uniform use of Unicode would allow for the usage of international character sets. An example is the storage of East Asian names in their original Chinese, Japanese and Korean (CJK) characters, without which they could be ambiguous and of little use for genealogical or historical research.[24] PAF 5.2 is an example of software that uses UTF-8 as its internal character set, and can output a UTF-8 GEDCOM.[24][25]
GEDCOM 7.0 requires UTF-8 encoding throughout,[26] and resolves other long-standing issues with GEDCOM 5.5.1. Multimedia support in the form of an associated .zip file, called a GEDZip, is another inclusion. Efforts are underway to see 7.0 embraced as the new exchange standard.[27] GEDCOM 7.0 allows explicitly identifying what standards other than GEDCOM may apply to a particular file. GEDCOM has always been extensible, but prior to 7.0 there was no standard way to identify such extensions. Also, GEDCOM 7.0 allows explicitly marking an event as nonexistent. This allows, for example, documenting that a particular individual never married.[28] GEDCOM 7.0 was the first version to use semantic versioning, and is the most recent minor version of the specification.
As of July 2024, the next planned minor release is v7.1, which is under development.[29]
Release history
| GEDCOM version | Release date | Notes |
|---|---|---|
| 1.0[30] | 1984[31] | – |
| 2.0[30] | Dec 1985[32] | PAF 2.0 |
| 2.1 | Feb 1987[32] | GEDCOM for PAF 2.1 |
| 2.3 Draft | 7 August 1985[33] | with PAF2.0 GEDCOM implementation conventions |
| 2.4 Draft | 13 December 1985[33] | with PAF2.0 GEDCOM implementation conventions |
| 3.0 Standard[30] | 9 October 1987[34] | PAF 2.0 and 2.1 implementation of 3.0 |
| 4.0 Standard | August 1989 | PAF 2.1 – 2.31 |
| 4.1 Draft[35] | – | – |
| 4.2 Draft[36] | 25 January 1990[37] | – |
| 5.0 Draft[30] | 31 December 1991[33] | lineage-linked structures were introduced.[38] |
| 5.1 Draft | 18 September 1992[32] | – |
| 5.2 Draft | 22 January 1992[39] | – |
| 5.3 Draft | 4 November 1993[40] | Unicode standard (ISO/IEC 10646) was introduced as an additional character set. |
| 5.4 Draft | 21 August 1995[41] | – |
| 5.5 Standard | 11 December 1995[42] | PAF 3, 4 and 5 |
| 5.5 Standard | 2 January 1996[43][44] | PAF 3, 4 and 5 / 5.5 Standard[45] |
| GEDCOM (Future Direction) Draft[38][46] | 1 May 1998[47][48] | "it used an entirely new data model"[49] |
| 5.5.1 Draft[50][51] | 2 October 1999[20] | Used by FamilySearch.org.[20] UTF-8 added as an approved character encoding. |
| 5.5.1 Release[52] | 15 November 2019 | Current standard; minor text modifications to 5.5.1 Draft. |
| 5.6 Private Draft | –[53] | "Jed Allen sent those two files to a few people only for sort of 'private comments'"[54] |
| 6.0 XML Draft | 28 December 2001[55] | Not a complete specification, and not recommended as a basis for software implementations. |
| 7.0.0-rc1 Draft | February 2021[56] | Release candidate revealed for RootsTech 2021, but then all talks, specifications and the web site were removed on 25 February 2021[57] |
| 7.0[58] | 27 May 2021 | Modernize character encoding, clarify ambiguities in 5.5.1 specification, introduce semantic versioning, improve multimedia handling |
| 7.0.13[59] | 4 August 2023 | |
Limitations
Support for multi-person events and sources
A GEDCOM file can contain information on events such as births, deaths, census records, ship's records, marriages, etc.; a rule of thumb is that an event is something that took place at a specific time, at a specific place (even if time and place are not known). GEDCOM files can also contain attributes such as physical description, occupation, and total number of children; unlike events, attributes generally cannot be associated with a specific time or place.
The GEDCOM specification requires that each event or attribute is associated with exactly one individual or family.[60] This causes redundancy for events such as census records where the actual census entry often contains information on multiple individuals. In the GEDCOM file, for census records a separate census "CENS" event must be added for each individual referenced. Some genealogy programs, such as Gramps and The Master Genealogist, have elaborate database structures for sources that are used, among other things, to represent multi-person events. When databases are exported from one of these programs to GEDCOM, these database structures cannot be represented in GEDCOM due to this limitation, with the result that the event or source information including all of the relevant citation reference information must be duplicated each place that it is used. This duplication makes it difficult for the user to maintain the information related to sources.
In the GEDCOM specification, an event associated with a family, such as a marriage, is stored only once, as part of the family (FAM) record, and both spouses are then linked to that single family record.[60]
Ambiguity in the specification
The GEDCOM specification was made purposefully flexible to support many ways of encoding data, particularly in the area of sources. This flexibility has led to a great deal of ambiguity, and has produced the side effect that some genealogy programs which import GEDCOM do not import all of the data from a file.[61]
Ordering of events that do not have dates
The GEDCOM specification does not offer explicit support for keeping a known order of events. In particular, the order of relationships (FAMS) for a person and the order of the children within a relationship (FAM) can be lost. In many cases the sequence of events can be derived from the associated dates. But dates are not always known, in particular when dealing with data from centuries ago. For example, a person may have had two relationships, both with unknown dates, whose relative order is nevertheless known from descriptions. The order in which these FAMS are recorded in GEDCOM's INDI record will depend on the exporting program. In Aldfaer,[62] for instance, the sequence depends on the ordering of the data by the user (alphabetical, chronological, reference, etc.). The proposed XML GEDCOM standard[55] does not address this issue either.
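One conservative way an importing program might handle this is sketched below in Python: sort by date only when every event is dated, and otherwise keep the order found in the file. This policy, the `Event` type, and the `order_events` function are assumptions made for illustration; the specification does not prescribe any of them.

```python
from typing import List, Optional, Tuple

# (label, year), where year is None when the date is unknown (assumed layout).
Event = Tuple[str, Optional[int]]


def order_events(events: List[Event]) -> List[Event]:
    """Sort by year only when every event is dated; otherwise keep file order."""
    if all(year is not None for _, year in events):
        # sorted() is stable, so events in the same year keep their file order.
        return sorted(events, key=lambda event: event[1])
    # At least one date is missing: the file order is the only ordering
    # information available, so it is preserved rather than guessed at.
    return list(events)


if __name__ == "__main__":
    print(order_events([("second relationship", 1850), ("first relationship", 1840)]))
    print(order_events([("first relationship", None), ("second relationship", None)]))
```

The second call shows the limitation described above: the result is only correct if the exporting program happened to write the records in the intended order.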
Lesser-known features
GEDCOM has many features that are not commonly used. Some software packages do not support all the features that the GEDCOM standard allows.
Multimedia
The GEDCOM standard supports the inclusion of multimedia objects (for example, photos of individuals).[63] Such multimedia objects can be either included in the GEDCOM file itself (called the "embedded form") or in an external file where the name of the external file is specified in the GEDCOM file (called the "linked form"). Embedding multimedia directly in the GEDCOM file makes transmission of data easier, in that all of the information (including the multimedia data) is in one file, but the resulting file can be enormous. Linking multimedia keeps the size of the GEDCOM file under control, but then when transmitting the file, the multimedia objects must either be transmitted separately or archived together with the GEDCOM into one larger file. Support for embedding media directly was dropped in the draft 5.5.1 standard.[64]
Conflicting information
The GEDCOM standard allows for the specification of multiple opinions or conflicting data, simply by specifying multiple records of the same type. For example, if an individual's birth date was recorded as 10 January 1800 on the birth certificate, but 11 January 1800 on the death certificate, two BIRT records for that individual would be included, the first with the 10 January 1800 date and giving the birth certificate as the source, and the second with the 11 January 1800 date and giving the death certificate as the source. The preferred record is usually listed first.
This example encoded in GEDCOM might look like this:
    0 @I1@ INDI
    1 NAME John /Doe/
    1 BIRT
    2 DATE 10 JAN 1800
    2 SOUR @S1@
    3 DATA
    4 TEXT Transcription from birth certificate would go here
    3 NOTE This birth record is preferred because it comes from the birth certificate
    3 QUAY 2
    1 BIRT
    2 DATE 11 JAN 1800
    2 SOUR @S2@
    3 DATA
    4 TEXT Transcription from death certificate would go here
    3 QUAY 2
Conflicting data may also be the result of user errors. The standard does not specify in any way that the contents must be consistent. A birth date like "10 APR 1819" might mistakenly have been recorded as "10 APR 1918" long after the person's death. The only way to reveal such inconsistencies is by rigorous validation of the content data.
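A content check of this kind can be sketched in Python. The example below flags a birth date that falls after a death date; it is a minimal sketch that understands only exact dates in the "D MON YYYY" form, and the function names and the death date used in the usage lines are hypothetical.

```python
from datetime import date
from typing import Optional

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_exact_date(text: str) -> Optional[date]:
    """Parse an exact 'D MON YYYY' date; return None for any other form."""
    parts = text.split()
    if len(parts) != 3 or parts[1].upper() not in MONTHS:
        return None
    try:
        return date(int(parts[2]), MONTHS[parts[1].upper()], int(parts[0]))
    except ValueError:
        return None  # non-numeric day or year, or an impossible calendar date


def birth_after_death(birth: str, death: str) -> bool:
    """True when both dates are exact and the birth falls after the death."""
    born, died = parse_exact_date(birth), parse_exact_date(death)
    return born is not None and died is not None and born > died


if __name__ == "__main__":
    # Hypothetical death date, chosen only to illustrate the check.
    print(birth_after_death("10 APR 1918", "3 MAR 1880"))
    print(birth_after_death("10 APR 1819", "3 MAR 1880"))
```

Approximate or ranged dates are treated as unknown rather than guessed at, so the check reports only contradictions it can actually establish.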
Internationalization
The GEDCOM standard supports internationalization in several ways. First, newer versions of the standard allow data to be stored in Unicode (or, more recently, UTF-8), so text in any language can be stored.[65] Second, in the same way that one can have multiple events on a person, GEDCOM allows one to have multiple names for a person,[66] so names can be stored in multiple languages, although there is no standardized way to indicate which instance is in which language. Finally, in version 5.5.1, the NAME field also supports a phonetic variation (FONE) and a romanized variation (ROMN) of the name.[67]
GEDCOM X
In February 2012 at the RootsTech 2012 conference, FamilySearch outlined a major new project around genealogical standards called GEDCOM X, and invited collaboration.[68] It includes software developed under the Apache open source license. It includes data formats that facilitate basing family trees on sources and records (both physical artifacts and digital artifacts), support for sharing and linking data online, and an API.[68][69][70]
In August 2012 FamilySearch employee and GEDCOM X project leader Ryan Heaton dropped the claim that GEDCOM X is the new industry standard, and repositioned GEDCOM X as another FamilySearch open source project.[71]
After the release of GEDCOM 7, FamilySearch positioned GEDCOM X as useful for interoperation with its FamilySearch Family Tree software.[72]
Alternatives
Commsoft, the authors of the Roots[73] series of genealogy software and Ultimate Family Tree, defined a version called Event-Oriented GEDCOM (also known as "Event GEDCOM" and originally called InterGED[74]),[75] which included events as first class (zero-level) items. Although it is event based, it is still a model built on assumed reality rather than evidence. Event GEDCOM was more flexible, as it allowed some separation between believed events and the participants. However, Event GEDCOM was not widely adopted by other developers due to its semantic differences.[citation needed] With Roots and Ultimate Family Tree no longer available, very few people today are using Event GEDCOM.[76]
Gramps XML is an XML-based open format created by the open source genealogy project Gramps and used also by PhpGedView.
The Family History Information Standards Organisation was established in 2012 with the aim of developing international standards for family history and genealogical information.[77] One of the standards the organization proposed was Extended Legacy Format (ELF), compatible with GEDCOM 5.5(.1), but including an extensibility mechanism. The organization requested public comment on the proposed standard in 2017.[78] It withdrew the proposal because release 7.0 of GEDCOM addressed many of the organization's concerns.[28]
See also
- FamilySearch
- GENDEX – Genealogical index
- Genealogical numbering systems
- GNTP – Genealogy Network Transfer Protocol
- Tiny Tafel Format – encoded "ancestor table"
- List of genealogy databases
Notes
References
- ^ a b Clarke, Gordon (2021-12-07). "Media subtype name: vnd.familysearch.gedcom+zip". Internet Assigned Numbers Authority. Retrieved 2022-10-01.
- ^ "Releases · FamilySearch/GEDCOM". GitHub. Retrieved 21 April 2025.
- ^ a b c Subject: GEDCOM and Formal Standards Organizations Date: Wed, 24 Jan 1996 11:53:52 -0700 From: Bill Harten – Organization: Brigham Young University "why wasn't GEDCOM developed through a formal standards organization?..."Thus GEDCOM was born as a deliberate, de facto standard, to be followed only by those who felt it was in their best interest to do so.
- ^ Subject: rep: T Jenkins – open letter to GEDCOM-L – "The goal was to try and provide a standard to allow developers to provide a vehicle for their users to share genealogical conclusions and supporting evidence with others." From: "Jed R. Allen" Brigham Young University – Date: 29 Sep 1995 17:40:04 -0600 – GEDCOM-L Archives – September 1995, week 5 (#7)
- ^ "Genealogical Software Report Card". March 2005. Archived from the original on 2009-02-11.
- ^ "The FamilySearch GEDCOM Specification". gedcom.io. 8 February 2024. Retrieved 10 July 2024.
- ^ GEDCOM 5.5 EL Archived 2020-01-11 at the Wayback Machine (Extended Locations) specification
- ^ Ability to save information against places Archived 2020-01-24 at the Wayback Machine – "Support for parts of the GEDCOM 5.5EL proposal" – FHUG Wish List
- ^ 0000688: Support for Gedcom 5.5EL Archived 2011-07-26 at the Wayback Machine – Gramps Bugtracker
- ^ "The GEDCOM Standard Release 5.5: Data Model Chart". homepages.rootsweb.com. Retrieved 2022-07-21.
- ^ a b "The FamilySearch GEDCOM Specification". gedcom.io. 2022-06-07. Retrieved 2022-07-21.
- ^ "View of phpgedview's GEDCOM validator source code".[permanent dead link]
- ^ "VGed 3.02". Rumble Fische. Archived from the original on January 6, 2011.
- ^ Gedcheck Archived 2009-02-07 at the Wayback Machine – "uses a grammar file for the specific version of GEDCOM to be checked against." The Church of Jesus Christ of Latter-day Saints
- ^ "GEDCOM TestBook Project". 2001. Archived from the original on 2006-06-15.
- ^ [GEDCOM and the GenTech Testbook Project] Genealogical Computing 7/1/2001 – Archive Summer 2001 Vol. 21.1 – Ancestry.com
- ^ The Genealogical Software Report Card 2000 S W Mumford Last updated March 2005 [unreliable source?]
- ^ Reviews from the NGS Newsmagazine and its Predecessors. Archived 2009-02-12 at the Wayback Machine – Test Result are in the PDF's
- ^ "Tools for FamilySearch GEDCOM". FamilySearch GEDCOM. Retrieved 2022-07-21.
- ^ a b c d Family History Department GEDCOM Coordinator (2 October 1999). "The GEDCOM Standard: Draft Release 5.5.1" (PDF). The Church of Jesus Christ of Latter-day Saints. Retrieved 2022-10-01.
- ^ GED-GEN is based on GEDCOM version 5.5.1 (draft) Archived 2009-02-03 at the Wayback Machine, dated 2 October 1999. The following record types are parsed: header, individual, family, notes, source, and repository. However not all elements within these records are processed. – Specifications – GED-GEN Introduction
- ^ 0000688: Support for Gedcom 5.5EL Archived 2011-07-26 at the Wayback Machine(0008068) romjerome (developer) 2009-01-25 06:13 – "Note : GRAMPS 3.0.x supports a part of GEDCOM 5.5.1 on export, which is not supported by most programs" – Gramps Bugtracker
- ^ "MyBlood supports the GEDCOM 5.5 and 5.5.1 file format." Archived 2009-06-05 at the Wayback Machine – MyBlood Support – Forum, FAQ, Know Problems
- ^ a b Personal Ancestral File 5.2 and PAF Companion 5.4 – Software Version Changes Archived 2009-03-06 at the Wayback Machine Release 5.0.1.4, 22 December 2000 – "10.GEDCOM improvements: Table:Destination:PAF 5 GEDCOM Version:5.5 Character Set:UTF-8
- ^ Personal Ancestral File 5.1 Archived 2007-07-21 at the Wayback Machine – "Also noted in a second test was the use of four tags from a later draft version of the Gedcom specification, FONE (phonetic name), ROMN (romanized name), EMAIL (e-mail), and _UID" Jan/Feb 2002 NGS Newsmagazine
- ^ "The FamilySearch GEDCOM Specification". gedcom.io. 2022-06-07. Retrieved 2022-07-21.
- ^ "Implementation Progress • Genealogy". Genealogy. Retrieved 2022-07-21.
- ^ a b Smith, Richard (2021-07-21). "FamilySearch GEDCOM 7". Family History Information Standards Organisation. Archived from the original on 2021-08-04. Retrieved 2023-01-30.
- ^ "GitHub - FamilySearch/GEDCOM at v7.1". GitHub. Retrieved 10 July 2024.
- ^ a b c d pafuser : Beitrag: Re: [pafuser] PAF 5.01 und GEDCOM By Eckhard Henkel – Beitrag #103 von 1494 – Yahoo Groups
- ^ Subject:description of InterGED theory From:Gary Steiner – "The first GEDCOM standard, version 1.0, was released to the genealogical software development community in 1984." – GEDCOM-L Archives – July 1994, week 4 (#14)
- ^ a b c Subject:Timeline of GEDCOM versions and PAF By George Archer – GEDCOM-L Archives – November 2000, week 3 (#12)
- ^ a b c Subject:Re: GEDCOM standards help please From:Graham Starkey – "DRAFT VERSION 2.3–7 August 1985 with PAF2.0 GEDCOM implementation conventions" – GEDCOM-L Archives – June 2000, week 4 (#1)
- ^ RootsWeb: ROOTS-L Re: Large Charts (fairly long):Date:Tue, 11 Jul 89 15:14:31 CDT From: Marty Hoag <NU021172@N...> Subject:Re: Printing trees with PAF? From soc.roots ... * GEDCOM release 3.0, 9 Oct 1987, 131 pages (!)
- ^ "LISTSERV - GEDCOM-L Archives - LISTSERV.NODAK.EDU". listserv.nodak.edu.
- ^ File Structures for PAF and GEDCOM – Date: 1996/01/04 – soc.genealogy.computing | Google Groups:
- ^ Subject:4.x specs From:Rafal Prinke -"while this document has the date January 25, 1990. So maybe it is GEDCOM 4.2 ?" – GEDCOM-L Archives – May 1994, week 1 (#19)
- ^ a b Subject: GEDCOM (Future Direction) Announced by Family History From: "Jed R. Allen" Date: Fri, 1 May 1998 18:08:24 -0600
- ^ Subject:Re: GEDCOM standards help please From:Graham Starkey – "DRAFT Release 5.2–22 January 1992 120kb" – GEDCOM-L Archives – June 2000, week 4 (#1)
- ^ GEDCOM 5.3 draft Archived 2010-07-22 at the Wayback Machine – 4 November 1993
- ^ THE GEDCOM STANDARD – DRAFT Release 5.4–21 August 1995
- ^ Subject:Timeline of GEDCOM versions and PAF By George Archer – "5.5 11 Dec 1995 (Title Page for 5.5)"- GEDCOM-L Archives – November 2000, week 3 (#12)
- ^ GEDCOM 5.5 Standard Archived 2006-11-20 at the Wayback Machine (Executable file in Envoy format)
- ^ Re: Looking for GEDCOM versions 4 & 5.xx "Brian C. Madsen" – "A GEDCOM 5.5 Errata Sheet dated 10 January 1996 supposedly contains corrections to pages 23, 24, 25, 26, 29, 29, 29, 33, 34, 39, 57, 79, and 85."
- ^ Gedcom Documentation Library Archived June 1, 2016, at the Wayback Machine, Chronoplex Software
- ^ GEDCOM Specification Future Direction (1999-07-07)
- ^ "LISTSERV - GEDCOM-L Archives - LISTSERV.NODAK.EDU". listserv.nodak.edu.
- ^ Comments on the GEDCOM Future Directions document Archived 2016-06-12 at the Wayback Machine Michael H. Kay, 17 May 1998
- ^ Subject:GEDCOM Future Directions – From:John Nairn – Date:Mon, 11 May 1998 13:38:45 -0600 – GEDCOM-L Archives – May 1998, week 2 (#3)
- ^ GEDCOM 5.51 data model in UML format – Software Renovation Corporation
- ^ "Gedcom 5.5.1 – GenWiki". wiki-en.genealogy.net. Archived from the original on 2019-06-04. Retrieved 2009-02-11.
- ^ "THE GEDCOM STANDARD Release 5.5.1" (PDF).
- ^ Subject:Re: GEDCOM History From:STEFANO BOSCOLO – Date:Tue, 20 Feb 2001 19:54:06 +0100 – GEDCOM-L Archives – February 2001, week 3 (#1)
- ^ Subject: Re: GEDCOM History From:"Rafal T. Prinke" – Date:Tue, 20 Feb 2001 22:14:55 +0100 – GEDCOM-L Archives – February 2001, week 3 (#4)
- ^ a b "Draft Specification for GEDCOM XML 6.0" (PDF). FamilySearch. Archived from the original (PDF) on 2006-11-16. Retrieved 2006-11-19.
- ^ "Web site for GEDCOM 7.0". Archived from the original on 2021-02-25. Retrieved 2021-02-22.
- ^ "GEDCOM 7.0 at Louis Kessler's Behold Blog". Archived from the original on 2021-05-11.
- ^ "The FamilySearch GEDCOM Specification Release 7.0.9" (PDF). GitHub.
- ^ "The FamilySearch GEDCOM Specification Release 7.0.9" (PDF). GitHub.
- ^ a b GEDCOM Standard 5.5, pp. 26–27.
- ^ "Why GEDCOM Files are Not in Sync with All Genealogy Programs" (PDF). family-genealogy.com. 2011-04-24. Archived from the original (PDF) on 2016-03-03. Retrieved 2015-03-17.
- ^ "Aldfaer, Hét gratis stamboomprogramma" [Aldfaer, the free family tree program]. www.aldfaer.nl (in Dutch). Netherlands. Retrieved 2022-10-01.
- ^ GEDCOM Standard 5.5, p. 28.
- ^ Draft GEDCOM Standard 5.5.1, p. 6.
- ^ GEDCOM Standard 5.5, p. 45.
- ^ GEDCOM Standard 5.5, p. 27.
- ^ GEDCOM Draft 5.5.1, p. 38
- ^ a b "GEDCOM X". FamilySearch. Retrieved February 4, 2012.
- ^ "Ryan Heaton: A New GEDCOM". The Ancestry Insider. 2012-02-04. Retrieved 2012-02-16.
- ^ "RootsTech Learning #3 – GEDCOM X and/or BetterGEDCOM and/or FHISO". Stardust 'n' Roots. 2012-02-13. Retrieved 2012-02-16.
- ^ 2012-08-31 GEDCOM X: no industry standard, FamilySearch abandons GEDCOM X push, By Tamura Jones, Modern Software Experience.
- ^ "General FAQs". FamilySearch GEDCOM. Retrieved 2022-07-25.
- ^ CommSoft to Return? Dick Eastman Online 3/14/2001 – Archive – Ancestry.com
- ^ RootsWeb: TMG-L [TMG] InterGED/Event GEDCOM Date: Fri, 15 Feb 2002 13:33:18 -0700
- ^ "Commsoft: The Roots Story". Retrieved 2008-11-20.
- ^ "TMGL Archives: Event-oriented". 2000-06-29. Archived from the original on 2007-11-01. Retrieved 2008-11-20.
- ^ "Family History Information Standards Organisation". FHISO. Archived from the original on 2016-04-30. Retrieved 25 April 2016.
- ^ "Annual Report to Members for 2018". Family History Information Standards Organisation. Archived from the original on 2019-02-25. Retrieved 2023-02-04.
External links
- General
- GEDCOM Standard
- FamilySearch GEDCOM Guide
- GEDCOM X Project
- "More on LDS Church's Adoption of the XML Standard". ancestry.com. Archived from the original on 2011-09-27. Retrieved 20 April 2024.
- THE GEDCOM STANDARD Release 5.5.1, released 15. November 2019
GEDCOM
History and Development
Origins and Initial Creation
The development of GEDCOM originated within the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) in 1984, as part of broader efforts to computerize family history research and facilitate the exchange of genealogical data among Church members and their software tools.[1] This initiative was deeply motivated by the Church's doctrinal emphasis on temple ordinances, including baptisms and endowments for deceased ancestors, which required accurate tracking and sharing of lineage-linked information to support ordinance reservations and avoid duplication.[1][2]
The initial version, known as GEDCOM 1.0, was released in 1984 as a straightforward, human-readable text-based format designed primarily for mainframe computer systems used in the Church's Ancestral File database.[1][3] This format employed line-based records with level indicators and tags to represent hierarchical family structures, enabling the transfer of pedigree and family group data without proprietary software dependencies. Key contributors included members of the LDS Family History Department.[4]
Early adoption of GEDCOM was largely confined to LDS-specific applications, such as the Personal Ancestral File (PAF) software, which the Church released in 1984 to empower members in compiling and submitting family data for temple work. PAF integrated GEDCOM export capabilities starting with version 2.0 in 1985, allowing users to submit digital files directly to Church systems for ordinance tracking and integration into centralized databases.[3] This limited scope reflected GEDCOM's initial focus on internal Church needs before broader genealogical community involvement.
Standardization and Evolution
The GEDCOM specification emerged from collaborative efforts within the Family History Department of The Church of Jesus Christ of Latter-day Saints, with GEDCOM 4.0 released in August 1989 as a key standardized version, building on earlier drafts to define a uniform format for genealogical data exchange.[4] This release marked a shift toward broader industry adoption, moving beyond its initial creation within the LDS Church to encourage participation from external developers and software producers.[5] Prepared by the Projects and Planning Division under Data Administration, the standard emphasized flexibility and compatibility to support the growing ecosystem of genealogical tools.[4]
The evolution of GEDCOM was primarily driven by the imperative for interoperability among diverse software applications, prompting invitations to commercial vendors to register their products and incorporate the Lineage-Linked GEDCOM Form for seamless data sharing.[5] Notable examples include Broderbund's Family Tree Maker and Leister Productions' Reunion, which integrated GEDCOM support to enable users to transfer family history data across platforms without loss of structure.[6] This vendor involvement helped establish GEDCOM as a de facto industry standard, fostering a wide range of interoperable products while maintaining backward compatibility with prior versions.[5]
In the post-2010 era, FamilySearch, as the steward of the specification, has played a central role in its ongoing maintenance and enhancement, culminating in the release of GEDCOM 7.0 in 2021, with subsequent minor updates continuing as of 2025 to address modern needs.[7][8] Collaborative development accelerated through initiatives like the RootsTech 2020 effort, involving industry stakeholders to update the standard based on GEDCOM 5.5.1.[7] FamilySearch has further promoted open-source contributions by hosting the specification in a public GitHub repository and publishing it at gedcom.io, allowing developers to review, suggest improvements, and ensure continued relevance in genealogical research.[7]
Data Model
Hierarchical Records and Levels
GEDCOM employs a tree-like hierarchical structure to organize genealogical data, where information is represented as nested records and substructures. This model uses numeric levels to denote parent-child relationships, beginning with level 0 for top-level records that serve as the primary entities in a family tree. Each subsequent level indicates subordination to the nearest preceding line at a lower level, creating a logical nesting that mirrors familial and event-based connections without requiring a relational database schema.[5][9]
The core record types at level 0 include Individual (INDI) for personal details, Family (FAM) for marital or parental units, and Source (SOUR) for bibliographic references, among others such as Repository (REPO) and Note (NOTE). Each record initiates with a level 0 line followed by a unique cross-reference identifier (XREF), such as 0 @I1@ INDI, which acts as a pointer for linking across the file. Substructures under these records appear at level 1 or higher, encapsulating attributes, events, and multimedia references; for instance, an individual's birth event might nest as 1 BIRT with further details like date at level 2 (2 DATE 15 NOV 1950). This indentation via levels ensures that data like names, occupations, or residences are contextually tied to their parent record.[5][9]
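The nesting rule, in which each line is subordinate to the nearest preceding line at a lower level, can be sketched with a stack in Python. The `build_tree` function and the dictionary layout are assumptions made for this illustration, and the name in the sample record is hypothetical.

```python
def build_tree(lines):
    """Nest GEDCOM lines into dictionaries using their level numbers."""
    roots = []
    stack = []  # stack[i] is the most recent node seen at level i
    for raw in lines:
        level_text, _, rest = raw.partition(" ")
        level = int(level_text)
        if level > len(stack):
            raise ValueError(f"level increases by more than one: {raw!r}")
        node = {"line": rest, "children": []}
        del stack[level:]  # discard nodes at this level or deeper
        if stack:
            stack[-1]["children"].append(node)  # nearest line one level up
        else:
            roots.append(node)  # level 0: a top-level record
        stack.append(node)
    return roots


if __name__ == "__main__":
    record = [
        "0 @I1@ INDI",
        "1 NAME John /Doe/",  # hypothetical name
        "1 BIRT",
        "2 DATE 15 NOV 1950",
    ]
    for root in build_tree(record):
        print(root)
```

Because a line can only attach to the node currently on top of the stack, the date ends up under the birth event rather than under the individual directly.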
Relationships between records are established through cross-reference pointers rather than duplication, promoting data integrity and efficiency. For example, a Family record links to Individual records via tags like 1 HUSB @I1@ for the husband and 1 CHIL @I2@ for a child, allowing bidirectional navigation without repeating personal details. This pointer system extends to associations, such as an individual's family membership via 1 FAMC @F1@, enabling complex pedigrees while maintaining the hierarchical nesting for intra-record elements like events and notes.[5][9]
Unlike flat-file or tabular databases, GEDCOM's hierarchy emphasizes parent-child nesting to group temporally or thematically related data, such as sequencing life events under an individual or embedding citations within sources. This approach facilitates the representation of irregular, narrative-driven genealogical information, where substructures can vary in depth and cardinality to accommodate diverse family histories.[5][9]
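The level-based nesting described above can be sketched as a small stack-based tree builder. This is a minimal illustration, not a reference implementation: the `Node` class and `build_tree` function are hypothetical names, and the sketch assumes well-formed input in which each line starts with a level number followed by a space.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    level: int
    text: str                                  # everything after the level number
    children: list["Node"] = field(default_factory=list)


def build_tree(lines):
    """Nest GEDCOM lines by level and return the level-0 records."""
    roots = []
    stack = []                                 # stack[i] is the latest node at level i
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        level_str, _, text = raw.partition(" ")
        node = Node(int(level_str), text)
        if node.level > len(stack):
            raise ValueError(f"level jumps by more than one: {raw!r}")
        del stack[node.level:]                 # close structures at this level or deeper
        if stack:
            stack[-1].children.append(node)    # subordinate to the nearest lower level
        else:
            roots.append(node)                 # level 0: a new top-level record
        stack.append(node)
    return roots


sample = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 15 NOV 1950
0 @F1@ FAM
1 HUSB @I1@
""".splitlines()

records = build_tree(sample)
for record in records:
    print(record.text, [child.text for child in record.children])
```

Keeping one stack slot per level means each new line attaches to the most recent line one level above it, which is the subordination rule the section describes.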
Tags, Values, and Pointers
In GEDCOM, tags are short mnemonic codes, typically three or four letters long, that identify the type of data element within a line and give it semantic meaning in the hierarchical structure. These tags are always uppercase and abbreviated for brevity, such as NAME for a person's name, BIRT for a birth event, or DEAT for a death.[5] Tags are defined in the specification's appendix, which distinguishes standard tags approved for universal use from user-defined extensions prefixed with an underscore (e.g., _MYTAG), which allow customization without conflicting with core elements.[5] Within records, some substructures are mandatory, such as the GEDC version declaration in the header, while others, such as NAME in an individual (INDI) record or SOUR (source citation), are optional but recommended for completeness and verifiability.[5] In GEDCOM 7.0, tags are further formalized with URIs for semantic interoperability (e.g., g7:NAME), enhancing machine readability while maintaining backward compatibility with prior versions.[9]
Values follow the tag on each line, separated by a single space, and represent the actual data content associated with that tag. They are text-based strings, with each line limited to 255 characters in GEDCOM 5.5.1; longer values are extended using continuation tags: CONC (concatenation without newline) or CONT (continuation with newline), which preserves formatting.[5] For example, a name value might appear as John /Doe/, where slashes delimit the surname, or a place as Cove, Cache, Utah, USA.[5] Special characters in values are handled by escaping: the at sign is doubled (@@) to include a literal @, while sequences of the form @#...@ are reserved as escapes, used in practice to select a calendar in date values (e.g., @#DJULIAN@); language is declared with the LANG tag rather than an escape.[5] GEDCOM 7.0 removes the CONC tag and character limits, favoring UTF-8 encoding for unrestricted text handling and multi-line CONT for notes.[9]
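How CONC and CONT rebuild a long value can be shown with a short sketch. The function name `join_continuations` and the sample note text are made up for illustration, and the sketch assumes lines without cross-reference identifiers (record-opening lines would need extra handling).

```python
def join_continuations(lines):
    """Fold CONT/CONC lines into the value of the line they continue.

    Returns (level, tag, value) tuples with the continuation lines removed.
    Simplification: lines carrying an @XREF@ after the level are not handled.
    """
    merged = []
    for raw in lines:
        parts = raw.rstrip("\r\n").split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        is_continuation = (
            tag in ("CONT", "CONC") and merged and level == merged[-1][0] + 1
        )
        if is_continuation:
            prev_level, prev_tag, prev_value = merged[-1]
            separator = "\n" if tag == "CONT" else ""   # CONT adds a line break
            merged[-1] = (prev_level, prev_tag, prev_value + separator + value)
        else:
            merged.append((level, tag, value))
    return merged


note_lines = [
    "1 NOTE Emigrated in 1881 aboard a steam",
    "2 CONC ship bound for New York.",
    "2 CONT Settled in Cove, Cache, Utah, USA.",
    "1 SEX M",
]
result = join_continuations(note_lines)
print(result)
```

A GEDCOM 7.0 reader would only need the CONT branch, since CONC no longer exists in that version.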
Pointers rely on cross-reference identifiers (XREFs), which use a unique format enclosed in at signs: @<identifier>@, where the identifier is an alphanumeric string up to 22 characters (e.g., @I123@ for an individual).[5] An identifier is defined once, immediately after the level number on the line that opens a record (as in 0 @I123@ INDI), and must not be duplicated within a file; other lines then refer to it in the value position, such as 1 CHIL @I123@ to link a child to that individual record.[5] Pointers are distinguished from ordinary values by their @...@ delimiters and are used exclusively for referencing, not storing data. In GEDCOM 7.0, pointers support a null value (@VOID@) for optional links and integrate with URI-based tags for extended semantics.[9]
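The distinction between an identifier that defines a record and a pointer that refers to one can be made concrete with a regular expression over a single line. This is a sketch under simplifying assumptions: `LINE_RE`, `GedLine`, and `parse_line` are hypothetical names, and the pattern is looser than the formal grammar in the specification.

```python
import re
from typing import NamedTuple, Optional

LINE_RE = re.compile(
    r"^(?P<level>0|[1-9][0-9]?)"       # level 0-99, no leading zeros
    r"(?: (?P<xref>@[^@\s]+@))?"       # optional identifier defining a record
    r" (?P<tag>_?[A-Z0-9]+)"           # standard or underscore-prefixed tag
    r"(?: (?P<value>.*))?$"            # optional value, which may be a pointer
)
POINTER_RE = re.compile(r"^@[^@\s]+@$")


class GedLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]
    is_pointer: bool


def parse_line(text: str) -> GedLine:
    """Split one GEDCOM line into level, identifier, tag, and value."""
    m = LINE_RE.match(text.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    value = m["value"]
    is_pointer = bool(value and POINTER_RE.match(value))
    return GedLine(int(m["level"]), m["xref"], m["tag"], value, is_pointer)


print(parse_line("0 @I123@ INDI"))    # identifier defines the record
print(parse_line("1 CHIL @I123@"))    # value is a pointer to that record
```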
GEDCOM employs specific data types for values to standardize common genealogical elements, parsed line-by-line for efficiency. Dates use a structured format like <calendar> <day> <month> <year>, with escape sequences for calendars (e.g., @#DGREGORIAN@ 3 JAN 2000 for Gregorian), supporting ranges (BET 1904 AND 1915) and approximations (ABT 1920).[5] Places are free-form but conventionally hierarchical (e.g., City, County, State, Country), often paired with a FORM tag for jurisdiction details.[5] Notes allow unstructured text for annotations, continued across lines with CONT to embed research context without altering hierarchy.[5] In GEDCOM 7.0, dates incorporate a PHRASE substructure for dual-date handling (e.g., old vs. new style), and all data types align with XML Schema primitives like xsd:string for broader compatibility.[9] This line-based syntax, comprising level, optional pointer, tag, and value, facilitates simple parsing while accommodating the format's emphasis on portability across systems.[5]
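As a rough illustration of the date forms mentioned above, the following sketch classifies a DATE value as exact, approximate, or a range. It covers only a small subset of the date grammar; `parse_simple_date` and `parse_date_value` are hypothetical helper names, and the fallback to the Gregorian calendar when no escape is present is an assumption of this sketch.

```python
import re

SIMPLE_DATE = re.compile(
    r"^(?:@#D(?P<calendar>[A-Z ]+)@ )?"   # optional calendar escape
    r"(?:(?P<day>\d{1,2}) )?"             # optional day
    r"(?:(?P<month>[A-Z]{3,4}) )?"        # optional month abbreviation
    r"(?P<year>\d{1,4})$"                 # year
)


def parse_simple_date(text):
    """Parse '[calendar] [day] [month] year'; return None if it does not fit."""
    m = SIMPLE_DATE.match(text)
    if m is None:
        return None
    return {
        "calendar": m["calendar"] or "GREGORIAN",   # assumed default
        "day": int(m["day"]) if m["day"] else None,
        "month": m["month"],
        "year": int(m["year"]),
    }


def parse_date_value(value):
    """Classify a DATE value as approximate, range, or exact."""
    qualifier, _, rest = value.partition(" ")
    if qualifier in ("ABT", "EST", "CAL") and rest:
        return {"kind": "approximate", "qualifier": qualifier,
                "date": parse_simple_date(rest)}
    m = re.match(r"^BET (.+) AND (.+)$", value)
    if m:
        return {"kind": "range", "start": parse_simple_date(m[1]),
                "end": parse_simple_date(m[2])}
    return {"kind": "exact", "date": parse_simple_date(value)}


for example in ("@#DGREGORIAN@ 3 JAN 2000", "BET 1904 AND 1915", "ABT 1920"):
    print(example, "->", parse_date_value(example))
```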
File Structure
Header Block
The Header Block is the mandatory initial segment of a GEDCOM file, beginning with the level 0 HEAD tag to delineate the start of the transmission and provide essential metadata for parsers to interpret the file correctly.[5] This block declares the GEDCOM version, source software, character encoding, submitter reference, and optional copyright information, ensuring compatibility across genealogical software systems.[5] By specifying these elements, the Header Block allows receiving applications to validate the file format and handle data appropriately before processing the subsequent body records. The header must include a reference to a submitter record in the body via the SUBM tag.[5]
The structure commences with 0 HEAD, followed by required level 1 substructures such as 1 GEDC containing 2 VERS 5.5.1 to indicate the GEDCOM specification version, and 1 CHAR UTF-8 (valid in 5.5.1 and later; ANSEL or ASCII in earlier versions) to define the character set for text rendering.[5] The source is identified via 1 SOUR <APPROVED_SYSTEM_ID>, often accompanied by 2 VERS <VERSION_NUMBER> for the producing software's version, while 1 SUBM @<XREF:SUBM>@ references the submitter record elsewhere in the file using a unique cross-reference identifier.[5] An optional 1 COPR <COPYRIGHT_GEDCOM_FILE> tag includes a copyright notice to protect the dataset.[5] In GEDCOM 7.0, these elements are retained but with UTF-8 as the exclusive encoding and stricter URI recommendations for the SOUR tag to enhance interoperability.[10]
A representative example of a Header Block in GEDCOM 5.5.1 format is:
```
0 HEAD
1 SOUR Family Historian
2 VERS 7.0.10
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
1 SUBM @S1@
1 COPR Copyright 2025 by Example User
```
This header is followed by the body block, which contains the core genealogical records, including a submitter record such as 0 @S1@ SUBM with details like name.[5]
Common errors in the Header Block include mismatched version declarations between GEDC VERS and the actual file structure, leading to import failures in parsers that enforce strict compliance.[5] Omitting required tags like CHAR or SUBM can also cause data corruption during transmission, as software may default to incompatible encodings or fail to associate the file with a submitter.[10] Proper adherence to these specifications mitigates such issues, promoting reliable exchange of genealogical data.[5]
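A simple pre-import check for the omissions described above might look like the following sketch. The `REQUIRED_551` tuple is a hypothetical configuration that lists only the header elements named in this section, and `header_problems` is an illustrative helper rather than part of any GEDCOM library.

```python
# Header paths treated as required here (an assumption based on the text above).
REQUIRED_551 = ("SOUR", "GEDC.VERS", "CHAR", "SUBM")


def header_problems(lines):
    """Return a list of human-readable problems found in the header block."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines or lines[0] != "0 HEAD":
        return ["file does not start with 0 HEAD"]
    found = {}
    stack = []                                  # tags from level 1 down to the current line
    for line in lines[1:]:
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        if level == 0:                          # next record: the header has ended
            break
        del stack[level - 1:]
        stack.append(tag)
        found[".".join(stack)] = parts[2] if len(parts) > 2 else ""
    return [f"missing {path}" for path in REQUIRED_551 if path not in found]


good = """\
0 HEAD
1 SOUR Family Historian
2 VERS 7.0.10
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
1 SUBM @S1@
0 @S1@ SUBM
0 TRLR
""".splitlines()

bad = [ln for ln in good if not ln.startswith("1 CHAR")]
print(header_problems(good))
print(header_problems(bad))
```

A real importer would also compare the declared GEDC VERS against the structures actually used in the body, which this sketch does not attempt.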
Body Block
The Body Block constitutes the core data payload of a GEDCOM file, immediately following the Header Block and encapsulating all genealogical records in a hierarchical, line-based format.[5] It comprises a series of logical records, each initiated by a level 0 line such as 0 @I1@ INDI for an individual or 0 @F1@ FAM for a family group, with subordinate lines detailing attributes and events.[5] These substructures include event records like 1 BIRT for birth details (potentially nested with 2 DATE for dates or 2 PLAC for places) and attribute records such as 1 SEX M for gender, allowing for multi-level nesting to represent complex relationships and facts.[5] In GEDCOM 7.0, this structure persists with similar leveled lines and substructures, though parsing simplifications like the elimination of line continuations via CONC (replaced by CONT) streamline handling of nested elements.[9]
Records within the Body Block are organized hierarchically by numeric levels (ranging from 0 to 99, without leading zeros), where each level indicates subordination to the nearest preceding line with a lower level, enabling a tree-like representation of data.[5] There is no mandated sequence for top-level records across the block, so submitters may arrange them by preference, while substructures within a given record adhere to a conventional order, such as events preceding attributes.[5] Cross-references facilitate interconnections between records through unique pointers (e.g., @<XREF:INDI>@): a family record lists its children via lines such as 1 CHIL @I2@, and each child's individual record points back to the family via 1 FAMC @F1@.[5] This pointer system ensures data cohesion without requiring physical adjacency, supporting bidirectional relationships in the genealogy.[9]
Indexing in the Body Block relies implicitly on these pointers rather than explicit indices, as parsers process the file line-by-line to construct a relational graph from the links.[5] Upon encountering a pointer, compliant software resolves it by scanning for the corresponding record elsewhere in the block, building an in-memory model of entities and their associations.[9] This approach accommodates dynamic data volumes but demands efficient parsing to handle potential forward references.[5]
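The resolution step described above, including forward references, can be sketched by collecting every defined identifier and every pointer first, and only then comparing the two. The function name `unresolved_pointers` and the regular expressions are illustrative assumptions; the sample deliberately leaves two pointers without matching records.

```python
import re

XREF_DEF = re.compile(r"^0 (@[^@\s]+@) \S+")             # e.g. 0 @I1@ INDI
POINTER_VALUE = re.compile(r"^[0-9]+ \S+ (@[^@\s]+@)$")  # e.g. 1 HUSB @I1@


def unresolved_pointers(lines):
    """Return (line_number, pointer) pairs that match no record in the file."""
    defined = set()
    used = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        definition = XREF_DEF.match(line)
        if definition:
            defined.add(definition.group(1))
            continue
        pointer = POINTER_VALUE.match(line)
        if pointer:
            used.append((number, pointer.group(1)))
    # Comparing only after the full pass lets forward references resolve.
    return [(n, x) for n, x in used if x not in defined and x != "@VOID@"]


sample = """\
0 HEAD
1 SUBM @S1@
0 @S1@ SUBM
1 NAME Example User
0 @I1@ INDI
1 NAME John /Smith/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
""".splitlines()

missing = unresolved_pointers(sample)
print(missing)
```

Here the header's SUBM pointer appears before the record it names, yet it resolves because the comparison is deferred until all lines have been read.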
Due to extensive nesting—particularly in notes (1 NOTE) and source citations (1 SOUR) that can embed further substructures—GEDCOM files in the Body Block phase can expand significantly, often reaching megabytes for large pedigrees.[5] To mitigate memory constraints during processing, GEDCOM 5.5.1 recommends constraining individual logical records to under 32 kilobytes, fitting typical buffers of the era.[5] GEDCOM 7.0 removes such explicit limits on nesting depth or line length (previously capped at 255 characters), permitting greater flexibility at the cost of increased computational demands for deeply nested datasets.[9]
Trailer Block
The Trailer Block serves as the simple closing segment of a GEDCOM file, consisting of a single mandatory line at level 0 formatted as 0 TRLR. This tag specifies the end of the GEDCOM transmission, with no associated value or subordinate structures permitted.[5]
Its primary role is to mark the completion of the data transmission, thereby preventing errors from partial file reads by informing parsers that no further content follows.[10] In some multi-disk or segmented transmissions, it appears only on the final segment to confirm overall completeness.[5] Strict parsers treat the absence of the trailer as an indication of an invalid or incomplete file, often triggering processing errors.[10]
Historically, the trailer evolved from simpler termination indicators in early GEDCOM drafts to a standardized, robust endpoint mechanism, ensuring reliable interchange in versions from 4.0 onward.[4] It directly follows the preceding body records to delineate the file's boundary.[5]
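The three-part layout of header, body, and trailer can be checked with a short sketch that refuses files lacking either boundary line. The function name `split_blocks` is hypothetical, and the strict behavior shown here (raising an error when 0 TRLR is absent) is one possible policy, not something every parser follows.

```python
def split_blocks(lines):
    """Return (header, body) line lists; raise if HEAD or TRLR is missing."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines or lines[0] != "0 HEAD":
        raise ValueError("missing 0 HEAD at start of file")
    if lines[-1] != "0 TRLR":
        raise ValueError("missing 0 TRLR: file may be truncated")
    # The header runs until the next level-0 line (at the latest, the trailer).
    body_start = next(i for i in range(1, len(lines)) if lines[i].startswith("0 "))
    return lines[:body_start], lines[body_start:-1]


header, body = split_blocks(
    ["0 HEAD", "1 CHAR UTF-8", "0 @I1@ INDI", "1 SEX M", "0 TRLR"]
)
print(header)
print(body)
```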
Sample File Excerpt
To illustrate the practical structure of a GEDCOM file, consider the following minimal example, which includes a header block, a basic body with one submitter record, one individual record, and one family record, and a trailer block. This example conforms to the GEDCOM 5.5 standard and demonstrates core syntax elements such as levels, tags, pointers, and values.[5]
```
0 HEAD
1 SOUR PAF
2 VERS 2.1
1 DATE 15 NOV 1995
1 FILE MYFILE.GED
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
1 SUBM @S1@
0 @S1@ SUBM
1 NAME Example User
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 BIRT
2 DATE 12 MAY 1960
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
```
This example can be broken down line by line to highlight key components:
- 0 HEAD: Initiates the header block at level 0, marking the start of the file. The level 0 indicates a top-level record.[5]
- 1 SOUR PAF: At level 1 (subordinate to HEAD), this tag identifies the software source ("PAF" for Personal Ancestral File) used to generate the file.[5]
- 2 VERS 2.1: At level 2 (further subordinate), the VERS tag specifies the version of the source software.[5]
- 1 DATE 15 NOV 1995: Level 1 under HEAD records the file creation date in a standardized format.[5]
- 1 FILE MYFILE.GED: Level 1 under HEAD names the transmission file.[5]
- 1 GEDC: Level 1 under HEAD begins the GEDCOM version details.[5]
- 2 VERS 5.5: Level 2 under GEDC specifies the GEDCOM standard version.[5]
- 2 FORM LINEAGE-LINKED: Level 2 under GEDC defines the file form, here the common lineage-linked structure for family trees.[5]
- 1 CHAR ANSEL: Level 1 under HEAD declares the character set (ANSEL, an older encoding; 5.5.1 and later files often use UTF-8).[5]
- 1 SUBM @S1@: Level 1 under HEAD references the submitter record via unique pointer @S1@.[5]
- 0 @S1@ SUBM: Level 0 starts the body block with the submitter record, identified by @S1@ and the SUBM tag.[5]
- 1 NAME Example User: Level 1 under SUBM provides the submitter's name.[5]
- 0 @I1@ INDI: Level 0 starts an individual record; @I1@ is a unique pointer (xref ID) for referencing, followed by the INDI tag for a person.[5]
- 1 NAME John /Smith/: Level 1 under INDI provides the name, with slashes delimiting surname.[5]
- 1 SEX M: Level 1 under INDI specifies gender (M for male).[5]
- 1 BIRT: Level 1 under INDI introduces a birth event.[5]
- 2 DATE 12 MAY 1960: Level 2 under BIRT gives the event date.[5]
- 0 @F1@ FAM: Level 0 starts a family record; @F1@ is its pointer, with FAM tag for family group.[5]
- 1 HUSB @I1@: Level 1 under FAM links the husband via pointer @I1@.[5]
- 1 WIFE @I2@: Level 1 under FAM links the wife (pointer @I2@ assumes another INDI record, omitted here for brevity).[5]
- 1 CHIL @I3@: Level 1 under FAM links a child (pointer @I3@ assumes another INDI).[5]
- 0 TRLR: Level 0 ends the file, marking the trailer block.[5]
Each line follows the same pattern: the level number, an optional cross-reference identifier in @...@ format, the tag, and the value; subordinate lines use incremented levels to denote hierarchy, while continuation of long values employs the CONT or CONC tags at the next level with a leading space.[5]
Versions
GEDCOM 5.5 and 5.5.1
GEDCOM 5.5, released on January 2, 1996, with errata on January 10, 1996, represented a major update to the standard by adopting the American National Standards Institute (ANSI) ANSEL character set, enabling better handling of diacritical marks and special characters common in international genealogical records.[11] This version introduced refined date formats supporting multiple calendars, including Gregorian, Julian, Hebrew, and French Revolutionary, along with qualifiers such as "about" (ABT), "estimated" (EST), and "calculated" (CAL) for imprecise dates.[5] The place (PLAC) structure was enhanced to include a hierarchical jurisdiction path, specified via a FORM substructure, allowing representations like "Springfield, Sangamon County, Illinois, United States" for greater locational precision.[5]
Key innovations in GEDCOM 5.5 included the Association (ASSO) tag, which links individuals through non-familial relationships like friends, neighbors, or witnesses, using a RELA subtag to describe the nature of the association.[5] It also added the Repository (REPO) record for cataloging sources, complete with call numbers and addresses, improving source management and citation traceability.[5] These features built on earlier versions while maintaining backward compatibility, with most implementations able to parse GEDCOM 5.5 files as a baseline for data exchange.[5]
GEDCOM 5.5.1, first circulated as a draft in 1999 and finalized on November 15, 2019, offered minor corrections and enhancements to address ambiguities in the prior version.[5] It formalized Unicode support, including UTF-8 encoding, to accommodate a broader range of international scripts and reduce reliance on ANSEL.[5] Multimedia integration via Object (OBJE) records was streamlined by eliminating embedded binary data (BLOB) in favor of external file references, with FORM and TYPE substructures specifying formats like JPEG or TIFF for images and audio.[5] Event structures received clarifications, such as refined <<EVENT_DETAIL>> components for attributes like religion (RELI), ensuring more consistent representation of life events.[5]
As of 2025, GEDCOM 5.5 and 5.5.1 continue to dominate genealogy software ecosystems due to their stability, extensive vendor support, and seamless interoperability with legacy datasets, serving as the de facto standard for file exchanges despite the availability of newer specifications.[12]
GEDCOM 7.0
FamilySearch released GEDCOM 7.0 on May 19, 2021, as the latest major revision of the standard for exchanging genealogical data, aiming to address limitations in earlier versions by incorporating modern data handling practices.[7] The specification has undergone minor updates, with version 7.0.16 issued on March 18, 2025, incorporating patches for improved clarity and implementation guidance without altering core data structures.[13] This version maintains the hierarchical line-based format while introducing semantic enhancements to support more precise and extensible data representation.
GEDCOM 7.0 introduces support for structured extensions using URI-defined schemas, enabling JSON-like flexibility for custom data types such as enumerated values and ages, which enhances interoperability across diverse software.[9] It improves semantic data handling, particularly for role-based relationships in events and family structures, allowing explicit definitions of participant roles (e.g., witness, informant) to better capture complex genealogical contexts beyond simple parent-child links.[9]
Key innovations include enhanced multimedia handling through the MULTIMEDIA_RECORD structure and GEDZIP packaging, which bundles external files like images and audio with the GEDCOM stream for seamless transfer.[14] The specification supports probabilistic and approximate dates via structures like DatePhrase for expressions of uncertainty (e.g., "about 1850" or ranges with calendars), multiple calendar systems (Gregorian, Julian, Hebrew, French Revolutionary), and period notations, reducing ambiguities in historical records.[9] Place data is augmented with coordinate support using the LATI and LONG tags under a MAP substructure for latitude and longitude, facilitating geospatial integration in mapping tools.[9] Internationalization is strengthened by mandating UTF-8 encoding throughout and introducing the LANG tag for language specification, ensuring global compatibility without legacy character set issues.[10]
Adoption of GEDCOM 7.0 has been integrated into FamilySearch's core tools for family tree management and export, with growing support in third-party software such as RootsMagic and Family Historian.[15] It includes mechanisms for backward compatibility, allowing import of GEDCOM 5.5 and 5.5.1 files while mapping legacy structures to new semantics, though some breaking changes require validation during conversion.[14] Since the initial 2021 release, updates have focused on patches for validation rules, expanded handling of research notes through versatile NOTE structures, and refined citation schemas in SOURCE records to better accommodate evidence evaluation and multi-source linking.[16] These revisions, tracked via semantic versioning on the official GitHub repository, emphasize stability and developer tools for conformance testing.[8]
Release Timeline
The development of GEDCOM began in 1984 when the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) released its first internal version, GEDCOM 1.0, as a proposed standard for exchanging genealogical data within their systems.[15] Subsequent internal iterations, such as version 2.0 in late 1985 and 2.1 in early 1987, were used in software like Personal Ancestral File (PAF) but remained non-public.[3]
The first public release occurred on October 9, 1987, with GEDCOM 3.0, which introduced the lineage-linked form for representing family relationships and was made available for broader adoption by genealogical software developers.[3] This was followed by version 4.0 on August 4, 1989, which refined the structure for wider compatibility.[17] Version 5.0 arrived on September 25, 1991, enhancing lineage-linked structures to better handle complex pedigrees.[17] Interim drafts appeared in the early 1990s, including 5.1 in September 1992 and 5.3 in November 1993, which experimented with features like Unicode support and multimedia but were never finalized.[3] The major milestone of version 5.5 was released on January 2, 1996 (with errata on January 10), incorporating structured addresses, additional name parts, and contributions from standards bodies like the National Genealogical Society, though it did not achieve formal ANSI ratification.[11] A minor update, GEDCOM 5.5.1, circulated as a draft from 1999 and was finalized on November 15, 2019, adding support for UTF-8 encoding, email addresses, URLs, and geographic coordinates while maintaining backward compatibility.[11]
No official version 6.0 was ever released; a beta draft proposing XML-based storage was circulated in December 2002 for developer feedback but was abandoned in favor of alternative formats like GEDCOM X.[18] After a long hiatus, FamilySearch released GEDCOM 7.0 on May 19, 2021, as the first major update in over two decades, introducing semantic versioning, improved multimedia handling via GEDZIP packaging, and resolutions to prior ambiguities.[11] This version has seen ongoing patches, with the latest being 7.0.16 on March 18, 2025, focusing on refinements and interoperability.[13]
| Version | Release Date | Status | Key Notes |
|---|---|---|---|
| 1.0 | 1984 | Internal/Proposed | Initial LDS Church development.[15] |
| 3.0 | 1987-10-09 | Public Standard | First public release; lineage-linked form.[3] |
| 4.0 | 1989-08-04 | Standard | Compatibility refinements.[17] |
| 5.0 | 1991-09-25 | Standard | Enhanced structures.[17] |
| 5.5 | 1996-01-02 | Standard | Address and name improvements (errata 1996-01-10).[11] |
| 5.5.1 | 2019-11-15 | Standard | Encoding and metadata additions.[11] |
| 7.0 | 2021-05-19 | Standard | Semantic versioning; GEDZIP support; latest patch 7.0.16 (2025-03-18).[10][13] |
Key Features
Multimedia Integration
GEDCOM supports the integration of multimedia elements, such as images, audio, and documents, primarily through the OBJE record type, which allows genealogical software to reference or embed media files associated with individuals, families, or events.[5] The OBJE record is defined at level 0 as 0 @O1@ OBJE, serving as a container for media details without storing the actual file data in earlier versions.[5] Subordinate tags within the OBJE record include 1 FILE photo.jpg to specify the file path or reference, followed by 2 FORM JPG to indicate the media format, ensuring compatibility across systems.[5]
Linking multimedia to core records occurs via a pointer tag, such as 1 OBJE @O1@ under an individual's (INDI) or family's (FAM) event structure, enabling direct association without duplicating file information.[5] In GEDCOM 5.5, optional embedding via binary large objects (BLOB) was supported, but this was deprecated in 5.5.1 and later versions, limiting integration to external file references to maintain file portability and simplicity.[5] Additional metadata, such as descriptive notes via 1 NOTE This is a family portrait from 1950, can accompany the OBJE to provide context like captions.[5]
GEDCOM 7.0 maintains external file references for multimedia but introduces GEDZIP, a ZIP archive format with .gdz extension, to bundle the GEDCOM file and associated media files using local paths (e.g., media/filename), enabling self-contained transmission particularly useful for web-based applications.[9] This version also expands metadata options, including NOTE for detailed captions and CROP subtags under MULTIMEDIA_LINK (e.g., 1 CROP 2 TOP 10 2 LEFT 20 2 HEIGHT 100 2 WIDTH 150) to specify image coordinates for cropping or zooming.[9] Legacy limitations persist in older implementations, where only references are supported, potentially complicating data transfer if files are not bundled separately.[9]
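The GEDZIP bundling described above can be approximated with Python's standard zipfile module. This is a sketch under assumptions: the entry name gedcom.ged for the dataset reflects a reading of the 7.0 specification that should be verified against it, `make_gedzip` is a hypothetical helper, and the media bytes are placeholder data rather than a real image.

```python
import io
import zipfile


def make_gedzip(gedcom_text: str, media: dict[str, bytes]) -> bytes:
    """Bundle a GEDCOM 7.0 dataset and its media files into one ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Assumed convention: the dataset sits at the archive root as gedcom.ged.
        archive.writestr("gedcom.ged", gedcom_text.encode("utf-8"))
        for local_path, data in media.items():
            # Local paths should match the FILE values used in the dataset.
            archive.writestr(local_path, data)
    return buffer.getvalue()


dataset = "0 HEAD\n1 GEDC\n2 VERS 7.0\n0 TRLR\n"
bundle = make_gedzip(dataset, {"media/photo.jpg": b"\xff\xd8\xff"})

with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
    names = archive.namelist()
    print(names)
```

Writing the result to disk with a .gdz extension would yield a file that keeps media paths relative to the archive, so the bundle can be moved without breaking references.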
Common use cases include attaching photographs to family (FAM) records to visualize group portraits or events, and linking audio files to individual (INDI) records for oral histories, such as digitized recordings of personal narratives.[19] For instance, a sound bite of an ancestor's story can be referenced alongside a scanned photo, enriching the genealogical context without altering the core text-based structure.[20]
