GEDCOM
from Wikipedia

Filename extension: .ged
Internet media type: application/vnd.familysearch.gedcom,[1] application/vnd.familysearch.gedcom+zip[1]
Developed by: LDS FHD
Initial release: 1984
Latest release: 7.0.16, 18 March 2025[2]
Type of format: Genealogy data exchange
Standard: De facto[3]
Open format?: Yes
Website: gedcom.io, github.com/familysearch/GEDCOM

FamilySearch GEDCOM, or simply GEDCOM (/ˈdʒɛdkɒm/ JED-kom, an acronym of Genealogical Data Communication), is an open file format and the de facto standard specification for storing genealogical data.[3] It was developed by the Church of Jesus Christ of Latter-day Saints (LDS Church), the operators of FamilySearch, to aid in the research and sharing of genealogical information.[4] A common usage is as a standard format for the backup and transfer of family tree data between different genealogy software and websites, most of which support importing from and exporting to GEDCOM format.[5]

A GEDCOM file is a plain text file, which must use UTF-8 encoding as of version 7.0. It contains genealogical information about individuals, such as names, events, and relationships, with cross-reference pointers linking these records together.

GEDCOM 7.0, released in 2021, is the most recent version of the GEDCOM specification as of July 2024.[6] However, its predecessor, GEDCOM 5.5.1, remains the industry's standard format for the exchange of genealogical data.[citation needed] First released as a draft standard in 1999, GEDCOM 5.5.1 received only minor updates in the following 20 years, leading up to the release of 5.5.1 final in 2019. To address its shortcomings, some genealogy programs introduced proprietary extensions to GEDCOM, such as GEDCOM 5.5 EL (Extended Locations), which are not always recognized by other programs.[7][8][9] Since the release of 7.0, efforts have been made to encourage its wider adoption. FamilySearch intended to become GEDCOM 7.0 compatible in the third quarter of 2022, and Ancestry.com is planning for 7.0 compatibility but has not yet specified an implementation date.[citation needed]

Data model

GEDCOM uses a lineage-linked data model based on the conceptual model of the nuclear family. The family (FAM) record type is therefore the only source of links between the individuals (INDI) in the file, assigning parents (as HUSB and WIFE) and children (as CHIL) by referring to individuals' unique ID numbers.[10] These historical origins are described in the 7.0 specification document: "The FAM record was originally structured to represent families where a male HUSB (husband or father) and female WIFE (wife or mother) produce CHIL (children)."[11]

Although the links in a GEDCOM family record still use the original naming indicating a husband and a wife, the specification now states that "sex, gender, titles, and roles of partners should not be inferred based on the partner that the HUSB or WIFE structure points to" and that these individuals within a family structure are collectively referred to as 'partners', 'parents' or 'spouses'. A FAM record can also be used for "cohabitation, fostering, adoption, and so on, regardless of the gender of the partners."[11]
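The lineage-linked model can be illustrated with a short Python sketch. It assumes the records have already been parsed. A hypothetical Family class holds the HUSB, WIFE, and CHIL pointers of one FAM record. The hypothetical helpers parents_of and children_of derive relationships only by scanning family records, because in this model the family record is what connects individuals.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Family:
    """One FAM record: partner pointers (HUSB, WIFE) and child pointers (CHIL)."""
    xref: str
    husb: Optional[str] = None
    wife: Optional[str] = None
    chil: list[str] = field(default_factory=list)


def parents_of(child: str, families: list[Family]) -> list[str]:
    """Collect the partners of every family that lists `child` under CHIL."""
    parents: list[str] = []
    for fam in families:
        if child in fam.chil:
            parents.extend(p for p in (fam.husb, fam.wife) if p is not None)
    return parents


def children_of(partner: str, families: list[Family]) -> list[str]:
    """Collect the children of every family in which `partner` is HUSB or WIFE."""
    kids: list[str] = []
    for fam in families:
        if partner in (fam.husb, fam.wife):
            kids.extend(fam.chil)
    return kids


families = [Family("@F1@", husb="@I1@", wife="@I2@", chil=["@I3@"])]
print(parents_of("@I3@", families))   # ['@I1@', '@I2@']
print(children_of("@I2@", families))  # ['@I3@']
```

Because no relationship is stored on the individuals, a person with several partnerships or sets of parents simply appears in several Family objects.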

File structure

A GEDCOM file consists of a header section, records, and a trailer section. The records represent people (INDI records), families (FAM records), sources of information (SOUR records), and other miscellaneous entities, including notes. Every line of a GEDCOM file begins with a level number. All top-level records (HEAD, TRLR, SUBN, and each INDI, FAM, OBJE, NOTE, REPO, SOUR, and SUBM) begin with a line at level 0, and the lines subordinate to them have positive level numbers.
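The level-number convention makes each line easy to split mechanically. The sketch below assumes the GEDCOM 5.5.1 line layout, in which a level, an optional cross-reference identifier, a tag, and an optional value are separated by single spaces. The regular expression and the parse_line name are illustrative assumptions, not taken from the specification. The pattern is also deliberately lenient: it does not enforce tag-length or identifier-character rules.

```python
import re
from typing import NamedTuple, Optional

# level, optional @XREF@, tag, optional value, each separated by one space
LINE_RE = re.compile(r"^(\d+) (?:(@[^@ ]+@) )?([A-Za-z0-9_]+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]   # present only on lines that open a record, e.g. "@I1@"
    tag: str              # e.g. "INDI", "NAME", "HUSB"
    value: Optional[str]  # the rest of the line after the tag, if any


def parse_line(raw: str) -> GedcomLine:
    """Split one GEDCOM line into its level, cross-reference ID, tag and value."""
    m = LINE_RE.match(raw.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {raw!r}")
    level, xref, tag, value = m.groups()
    return GedcomLine(int(level), xref, tag, value)


for raw in ["0 @I1@ INDI", "1 NAME John /Smith/", "1 HUSB @I1@", "0 TRLR"]:
    print(parse_line(raw))
```

In a line such as 1 HUSB @I1@, the pointer comes back as the value rather than as the xref field. Only a line that opens a record carries an identifier before its tag.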

Although it is possible to write a GEDCOM file by hand, the format was designed to be used with software and thus is not especially human-friendly. The PhpGedView project includes a GEDCOM validator[12] that can check the structure of a GEDCOM file, though it is not meant to be used as a standalone validator. For standalone validation, "The Windows GEDCOM Validator"[13] or the older, unmaintained Gedcheck[14] from the LDS Church can be used.
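The kind of structural checking these tools perform can be approximated in a few lines. The following is a hypothetical illustration and is far less thorough than the validators named above. It checks only four things: that the file starts with 0 HEAD, that it ends with 0 TRLR, that no line is more than one level deeper than the line before it, and that no cross-reference identifier is defined twice.

```python
import re

LINE_RE = re.compile(r"^(\d+) (?:(@[^@ ]+@) )?([A-Za-z0-9_]+)(?: (.*))?$")


def check_structure(lines: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means these basic checks passed."""
    problems: list[str] = []
    previous_level = -1          # so the first line must be level 0
    seen_xrefs: set[str] = set()
    for n, raw in enumerate(lines, start=1):
        m = LINE_RE.match(raw.rstrip("\r\n"))
        if m is None:
            problems.append(f"line {n}: cannot parse {raw!r}")
            continue
        level, xref = int(m.group(1)), m.group(2)
        # A line may be at most one level deeper than the line before it.
        if level > previous_level + 1:
            problems.append(f"line {n}: level jumps from {previous_level} to {level}")
        previous_level = level
        if xref is not None:
            if xref in seen_xrefs:
                problems.append(f"line {n}: duplicate cross-reference ID {xref}")
            seen_xrefs.add(xref)
    if not lines or lines[0].strip() != "0 HEAD":
        problems.append("file does not start with 0 HEAD")
    if not lines or lines[-1].strip() != "0 TRLR":
        problems.append("file does not end with 0 TRLR")
    return problems


bad = ["0 HEAD", "2 VERS 5.5.1", "0 @I1@ INDI", "0 @I1@ INDI"]
for problem in check_structure(bad):
    print(problem)
```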

During 2001, the GEDCOM TestBook Project used the Gedcheck program to evaluate how well four popular genealogy programs conformed to the GEDCOM 5.5 standard.[15] The findings showed that a number of problems existed and that "The most commonly found fault leading to data loss was the failure to read the NOTE tag at all the possible levels at which it may appear."[16] In 2005, Bill Mumford, who had participated in the original GEDCOM TestBook Project, carried out the Genealogical Software Report Card evaluation,[17] which included testing against the GEDCOM 5.5 standard using the Gedcheck program.[18]

To assist with adoption of GEDCOM 7.0, validation tools now exist for that standard as well.[19]

Example

The following is a sample GEDCOM file.

The header (HEAD) includes the source program and version (Personal Ancestral File, 5.0), the GEDCOM version (5.5), the character encoding (ANSEL), and a link to information about the submitter of the file.

The individual records (INDI) define John Smith (ID I1), Elizabeth Stansfield (ID I2), and James Smith (ID I3).

The family record (FAM) links the husband (HUSB), wife (WIFE), and child (CHIL) by their ID numbers.
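Records of the kind described above can be generated programmatically. The following is a hypothetical illustration, not the original sample file. It emits only NAME lines and family links. It also adds the FAMS (family as spouse) and FAMC (family as child) back-pointers that GEDCOM 5.5 individual records can carry. A complete file would also need the header, a submitter record, and the trailer.

```python
def indi(xref: str, given: str, surname: str, links: list[tuple[str, str]]) -> list[str]:
    """An INDI record: a NAME line (surname between slashes) plus FAMS/FAMC pointers."""
    lines = [f"0 {xref} INDI", f"1 NAME {given} /{surname}/"]
    lines += [f"1 {tag} {pointer}" for tag, pointer in links]
    return lines


def fam(xref: str, husb: str, wife: str, children: list[str]) -> list[str]:
    """A FAM record linking partners (HUSB, WIFE) and children (CHIL) by pointer."""
    lines = [f"0 {xref} FAM", f"1 HUSB {husb}", f"1 WIFE {wife}"]
    lines += [f"1 CHIL {child}" for child in children]
    return lines


body = (
    indi("@I1@", "John", "Smith", [("FAMS", "@F1@")])
    + indi("@I2@", "Elizabeth", "Stansfield", [("FAMS", "@F1@")])
    + indi("@I3@", "James", "Smith", [("FAMC", "@F1@")])
    + fam("@F1@", husb="@I1@", wife="@I2@", children=["@I3@"])
)
print("\n".join(body))
```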

Versions

The version of the specification currently in wide use is GEDCOM 5.5.1 final, released on 15 November 2019. Its predecessor, the GEDCOM 5.5.1 draft,[20] was issued in 1999. The draft introduced nine new attribute tags and added UTF-8 as an approved character encoding. It was never formally approved, but a number of genealogy programs adopted its provisions in part,[21][22][23] including FamilySearch.org.[20]

Lineage-linked GEDCOM is the deliberate de facto common denominator.[3] Despite version 5.5 of the GEDCOM standard first being published in 1996, many genealogical software suppliers have never fully supported the feature of multilingual Unicode text (instead of the ANSEL character set) introduced with that version of the specification. Uniform use of Unicode would allow for the usage of international character sets. An example is the storage of East Asian names in their original Chinese, Japanese and Korean (CJK) characters, without which they could be ambiguous and of little use for genealogical or historical research.[24] PAF 5.2 is an example of software that uses UTF-8 as its internal character set, and can output a UTF-8 GEDCOM.[24][25]

GEDCOM 7.0 requires UTF-8 encoding throughout[26] and resolves other long-standing issues with GEDCOM 5.5.1. It also adds multimedia support in the form of an associated .zip file called a GEDZip. Efforts are underway to see 7.0 embraced as the new exchange standard.[27] GEDCOM 7.0 allows a file to explicitly identify which standards other than GEDCOM apply to it. GEDCOM has always been extensible, but before 7.0 there was no standard way to identify such extensions. GEDCOM 7.0 also allows an event to be explicitly marked as nonexistent, for example to document that a particular individual never married.[28] GEDCOM 7.0 was the first version to use semantic versioning, and it is the most recent minor version of the specification.

As of July 2024, the next planned minor release is v7.1, which is under development.[29]

Release history

GEDCOM version | Status | Release date | Notes
1.0[30] | Unsupported | 1984[31] |
2.0[30] | Unsupported | December 1985[32] | PAF 2.0
2.1 | Unsupported | February 1987[32] | GEDCOM for PAF 2.1
2.3 Draft | Unsupported | 7 August 1985[33] | With PAF 2.0 GEDCOM implementation conventions
2.4 Draft | Unsupported | 13 December 1985[33] | With PAF 2.0 GEDCOM implementation conventions
3.0 Standard[30] | Unsupported | 9 October 1987[34] | PAF 2.0 and 2.1 implementation of 3.0
4.0 Standard | Unsupported | August 1989 | PAF 2.1–2.31
4.1 Draft[35] | Unsupported | |
4.2 Draft[36] | Unsupported | 25 January 1990[37] |
5.0 Draft[30] | Unsupported | 31 December 1991[33] | Lineage-linked structures were introduced.[38]
5.1 Draft | Unsupported | 18 September 1992[32] |
5.2 Draft | Unsupported | 22 January 1992[39] |
5.3 Draft | Unsupported | 4 November 1993[40] | The Unicode standard (ISO/IEC 10646) was introduced as an additional character set.
5.4 Draft | Unsupported | 21 August 1995[41] |
5.5 Standard | Unsupported | 11 December 1995[42] | PAF 3, 4 and 5
5.5 Standard | Supported | 2 January 1996[43][44] | PAF 3, 4 and 5 / 5.5 Standard[45]
GEDCOM (Future Direction) Draft[38][46] | Unsupported | 1 May 1998[47][48] | "it used an entirely new data model"[49]
5.5.1 Draft[50][51] | Unsupported | 2 October 1999[20] | Used by FamilySearch.org;[20] UTF-8 added as an approved character encoding
5.5.1 Release[52] | Supported | 15 November 2019 | Current standard; minor text modifications to the 5.5.1 Draft
5.6 Private Draft | Unsupported | -[53] | "Jed Allen sent those two files to a few people only for sort of 'private comments'"[54]
6.0 XML Draft | Unsupported | 28 December 2001[55] | Not a complete specification; not recommended as a basis for software implementations
7.0.0-rc1 Draft | Unsupported | February 2021[56] | Release candidate revealed for RootsTech 2021; all talks, specifications and the web site were then removed on 25 February 2021[57]
7.0[58] | Unsupported | 27 May 2021 | Modernized character encoding, clarified ambiguities in the 5.5.1 specification, introduced semantic versioning, improved multimedia handling
7.0.13[59] | Latest version | 4 August 2023 |

Limitations

Support for multi-person events and sources

A GEDCOM file can contain information on events such as births, deaths, census records, ship's records, marriages, etc.; a rule of thumb is that an event is something that took place at a specific time, at a specific place (even if time and place are not known). GEDCOM files can also contain attributes such as physical description, occupation, and total number of children; unlike events, attributes generally cannot be associated with a specific time or place.

The GEDCOM specification requires that each event or attribute be associated with exactly one individual or family.[60] This causes redundancy for events such as census records, where the actual census entry often contains information on multiple individuals. In the GEDCOM file, a separate census (CENS) event must be added for each individual referenced. Some genealogy programs, such as Gramps and The Master Genealogist, have elaborate database structures for sources that are used, among other things, to represent multi-person events. Because of this limitation, these structures cannot be represented when such databases are exported to GEDCOM. As a result, the event or source information, including all of the relevant citation details, must be duplicated in each place where it is used. This duplication makes it difficult for the user to maintain the information related to sources.

In the GEDCOM specification, events associated with a family, such as a marriage, are stored only once, as part of the family (FAM) record, and both spouses are linked to that single family record.[60]

Ambiguity in the specification

The GEDCOM specification was made purposefully flexible to support many ways of encoding data, particularly in the area of sources. This flexibility has led to a great deal of ambiguity, and has produced the side effect that some genealogy programs which import GEDCOM do not import all of the data from a file.[61]

Ordering of events that do not have dates

The GEDCOM specification does not offer explicit support for preserving a known order of events. In particular, the order of a person's relationships (FAMS) and the order of the children within a relationship (FAM) can be lost. In many cases the sequence of events can be derived from the associated dates, but dates are not always known, particularly when dealing with data from centuries ago. For example, a person may have had two relationships with unknown dates, while descriptions establish which of them came second. The order in which these FAMS links are recorded in the GEDCOM INDI record depends on the exporting program. In Aldfaer,[62] for instance, the sequence depends on how the user has ordered the data (alphabetical, chronological, reference, etc.). The proposed XML GEDCOM standard[55] does not address this issue either.

Lesser-known features

GEDCOM has many features that are not commonly used. Some software packages do not support all the features that the GEDCOM standard allows.

Multimedia

The GEDCOM standard supports the inclusion of multimedia objects (for example, photos of individuals).[63] Such multimedia objects can be either included in the GEDCOM file itself (called the "embedded form") or in an external file where the name of the external file is specified in the GEDCOM file (called the "linked form"). Embedding multimedia directly in the GEDCOM file makes transmission of data easier, in that all of the information (including the multimedia data) is in one file, but the resulting file can be enormous. Linking multimedia keeps the size of the GEDCOM file under control, but then when transmitting the file, the multimedia objects must either be transmitted separately or archived together with the GEDCOM into one larger file. Support for embedding media directly was dropped in the draft 5.5.1 standard.[64]

Conflicting information

The GEDCOM standard allows for the specification of multiple opinions or conflicting data, simply by specifying multiple records of the same type. For example, if an individual's birth date was recorded as 10 January 1800 on the birth certificate, but 11 January 1800 on the death certificate, two BIRT records for that individual would be included, the first with the 10 January 1800 date and giving the birth certificate as the source, and the second with the 11 January 1800 date and giving the death certificate as the source. The preferred record is usually listed first.

This example encoded in GEDCOM might look like this:

0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 10 JAN 1800
2 SOUR @S1@
3 DATA
4 TEXT Transcription from birth certificate would go here
3 NOTE This birth record is preferred because it comes from the birth certificate
3 QUAY 2
1 BIRT
2 DATE 11 JAN 1800
2 SOUR @S2@
3 DATA
4 TEXT Transcription from death certificate would go here
3 QUAY 2
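Software reading such a record has to decide which of the competing values to show. The sketch below relies on the convention described above that the preferred record is usually listed first. The birth_dates helper is hypothetical and assumes it receives the lines of a single INDI record. It collects the DATE of every level 1 BIRT in file order, so the first element is the preferred date. The sample input is an abbreviated copy of the record above.

```python
def birth_dates(record: list[str]) -> list[str]:
    """Return the DATE of each level-1 BIRT in file order; the first is the preferred one."""
    dates: list[str] = []
    in_birt = False
    for raw in record:
        parts = raw.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        if level <= 1:
            # A new level-0 or level-1 line ends any BIRT structure we were inside.
            in_birt = level == 1 and tag == "BIRT"
        elif level == 2 and in_birt and tag == "DATE":
            dates.append(parts[2] if len(parts) > 2 else "")
    return dates


record = [
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 BIRT",
    "2 DATE 10 JAN 1800",
    "2 SOUR @S1@",
    "3 QUAY 2",
    "1 BIRT",
    "2 DATE 11 JAN 1800",
    "2 SOUR @S2@",
    "3 QUAY 2",
]
dates = birth_dates(record)
print(dates[0])  # 10 JAN 1800
```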

Conflicting data may also be the result of user errors. The standard does not specify in any way that the contents must be consistent. A birth date like "10 APR 1819" might mistakenly have been recorded as "10 APR 1918" long after the person's death. The only way to reveal such inconsistencies is by rigorous validation of the content data.
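One simple content check of the kind mentioned above compares the years of an individual's birth and death. This hypothetical sketch assumes plain date values that end in a three- or four-digit year. It ignores calendars, ranges, and dual dates, and it treats an unknown year as no evidence of a problem. The death date in the usage lines is invented for illustration.

```python
import re
from typing import Optional


def year_of(date_value: str) -> Optional[int]:
    """Return the trailing 3- or 4-digit year of a date value such as '10 APR 1819'."""
    m = re.search(r"(\d{3,4})\s*$", date_value)
    return int(m.group(1)) if m else None


def born_after_death(birth: str, death: str) -> bool:
    """True when both years are known and the birth year is later than the death year."""
    b, d = year_of(birth), year_of(death)
    return b is not None and d is not None and b > d


# Hypothetical death date of 5 JUN 1870 for the person in the text above.
print(born_after_death("10 APR 1918", "5 JUN 1870"))  # True: likely a typo for 1819
print(born_after_death("10 APR 1819", "5 JUN 1870"))  # False
```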

Internationalization

The GEDCOM standard supports internationalization in several ways. First, newer versions of the standard allow data to be stored in Unicode (or, more recently, UTF-8), so text in any language can be stored.[65] Secondly, in the same way that one can have multiple events on a person, GEDCOM allows one to have multiple names for a person,[66] so names can be stored in multiple languages, although there is no standardized way to indicate which instance is in which language. Finally, in version 5.5.1, the NAME field also supports a phonetic variation (FONE) and a romanized variation (ROMN) of the name.[67]

GEDCOM X

In February 2012, at the RootsTech 2012 conference, FamilySearch outlined a major new project around genealogical standards called GEDCOM X and invited collaboration.[68] The project includes software released under the Apache open source license, data formats that facilitate basing family trees on sources and records (both physical and digital artifacts), support for sharing and linking data online, and an API.[68][69][70]

In August 2012 FamilySearch employee and GEDCOM X project leader Ryan Heaton dropped the claim that GEDCOM X is the new industry standard, and repositioned GEDCOM X as another FamilySearch open source project.[71]

After the release of GEDCOM 7, FamilySearch positioned GEDCOM X as useful for interoperation with its FamilySearch Family Tree software.[72]

Alternatives

Commsoft, the authors of the Roots[73] series of genealogy software and Ultimate Family Tree, defined a version called Event-Oriented GEDCOM (also known as "Event GEDCOM" and originally called InterGED[74]),[75] which included events as first class (zero-level) items. Although it is event based, it is still a model built on assumed reality rather than evidence. Event GEDCOM was more flexible, as it allowed some separation between believed events and the participants. However, Event GEDCOM was not widely adopted by other developers due to its semantic differences.[citation needed] With Roots and Ultimate Family Tree no longer available, very few people today are using Event GEDCOM.[76]

Gramps XML is an XML-based open format created by the open source genealogy project Gramps and used also by PhpGedView.

The Family History Information Standards Organisation was established in 2012 with the aim of developing international standards for family history and genealogical information.[77] One of the standards the organization proposed was Extended Legacy Format (ELF), compatible with GEDCOM 5.5(.1), but including an extensibility mechanism. The organization requested public comment on the proposed standard in 2017.[78] It withdrew the proposal because release 7.0 of GEDCOM addressed many of the organization's concerns.[28]

from Grokipedia

History and Development

Origins and Initial Creation

The development of GEDCOM began within the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) in 1984. It was part of broader efforts to computerize family history research and to make it easier for Church members and their software tools to exchange genealogical data. The Church's doctrinal emphasis on temple ordinances, including baptisms and endowments for deceased ancestors, was a strong motivation. These ordinances required accurate tracking and sharing of lineage-linked information to support ordinance reservations and avoid duplication.

The initial version, GEDCOM 1.0, was released in 1984 as a straightforward, human-readable, text-based format designed primarily for the systems used with the Church's Ancestral File database. The format used lines carrying level indicators and tags to represent hierarchical structures, enabling the transfer of pedigree and family group data between programs. Key contributors included members of the LDS Family History Department.

Early adoption of GEDCOM was largely confined to LDS-specific applications such as Personal Ancestral File (PAF), software the Church released to help members compile and submit family data for temple work. PAF integrated GEDCOM export capabilities starting with PAF 2.0 in 1985. Users could then submit digital files directly to Church systems for ordinance tracking and integration into centralized databases. This limited scope reflected GEDCOM's initial focus on internal Church needs before the broader genealogical community became involved.

Standardization and Evolution

The GEDCOM specification emerged from collaborative efforts within the Family History Department of The Church of Jesus Christ of Latter-day Saints. GEDCOM 4.0, released in August 1989, was a key standardized version that built on earlier drafts to define a uniform format for genealogical data exchange. This release marked a shift toward broader industry adoption, moving beyond the format's origins within the LDS Church to encourage participation from external developers and software producers. The standard was prepared by the Projects and Planning Division under Data Administration and emphasized flexibility and compatibility to support the growing ecosystem of genealogical tools.

The evolution of GEDCOM was driven primarily by the need for interoperability among diverse software applications. Commercial vendors were therefore invited to register their products and to incorporate the Lineage-Linked GEDCOM Form for seamless data sharing. Notable adopters included Broderbund and Leister Productions, maker of Reunion. Their software integrated GEDCOM support so that users could transfer family history data across platforms without losing its structure. This vendor involvement helped establish GEDCOM as an industry standard, fostering a wide range of interoperable products while maintaining backward compatibility with prior versions.

In the post-2010 era, FamilySearch, as the steward of the specification, has played a central role in its ongoing maintenance and enhancement. This work culminated in the release of GEDCOM 7.0 in 2021, and minor updates addressing modern needs have continued as of 2025. Collaborative development accelerated through initiatives such as the RootsTech 2020 effort, which involved industry stakeholders in updating the standard from its GEDCOM 5.5.1 base. FamilySearch has also encouraged open contributions by publishing the specification at gedcom.io and in a public repository (github.com/familysearch/GEDCOM), where developers can review it and suggest improvements.

Data Model

Hierarchical Records and Levels

GEDCOM employs a tree-like hierarchical structure to organize genealogical data, representing information as nested records and substructures. Numeric levels denote parent-child relationships, with level 0 reserved for the top-level records that serve as the primary entities in a file. Each line is subordinate to the nearest preceding line with a lower level number. This creates a logical nesting that mirrors familial and event-based connections without requiring a schema.

The core record types at level 0 include Individual (INDI) for personal details, Family (FAM) for marital or parental units, and Source (SOUR) for bibliographic references, among others such as Repository (REPO) and Note (NOTE). Each record begins with a level 0 line carrying a unique cross-reference identifier (XREF), such as 0 @I1@ INDI, which other lines use as a pointer to link to the record. Substructures under these records appear at level 1 or higher and hold attributes, events, and multimedia references. For instance, an individual's birth event might nest as 1 BIRT, with the date at level 2 (2 DATE 15 NOV 1950). This nesting ties data such as names, occupations, or residences to the record they describe.

Relationships between records are expressed through pointers rather than duplicated data, which keeps files consistent and compact. For example, a family record links to individual records through lines such as 1 HUSB @I1@ for the husband and 1 CHIL @I2@ for a child. This allows navigation in both directions without repeating personal details. The same pointer system records an individual's membership in a family as a child via 1 FAMC @F1@. Pointers thus support complex pedigrees, while the hierarchical nesting still organizes elements within a record, such as events and their details.

Unlike flat-file or tabular databases, GEDCOM uses nesting to group temporally or thematically related data, such as sequencing life events under an individual or embedding citations within sources. This approach suits irregular, narrative-driven genealogical information, where substructures can vary in depth and number to accommodate diverse family histories.
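The nesting rule maps naturally onto a stack, because each line belongs to the nearest preceding line with a lower level number. The following Python sketch turns a list of lines into a tree. The Node class and build_tree function are hypothetical names. The code assumes well-formed input with single-space separators and performs no validation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    level: int
    tag: str
    value: str = ""
    xref: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def build_tree(lines: list[str]) -> list[Node]:
    """Nest each line under the nearest preceding line with a lower level number."""
    root = Node(level=-1, tag="ROOT")
    stack = [root]
    for raw in lines:
        parts = raw.split(" ", 2)
        xref = None
        if len(parts) > 2 and parts[1].startswith("@"):
            # Record-opening line such as "0 @I1@ INDI": move the identifier aside.
            xref = parts[1]
            parts = [parts[0]] + parts[2].split(" ", 1)
        node = Node(int(parts[0]), parts[1], parts[2] if len(parts) > 2 else "", xref)
        # Pop lines at the same or a deeper level until the top is this node's parent.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root.children


tree = build_tree(["0 @I1@ INDI", "1 NAME John /Smith/", "1 BIRT", "2 DATE 15 NOV 1950", "0 TRLR"])
indi = tree[0]
print(indi.xref, [c.tag for c in indi.children])  # @I1@ ['NAME', 'BIRT']
print(indi.children[1].children[0].value)         # 15 NOV 1950
```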

Tags, Values, and Pointers

In GEDCOM, tags are short mnemonic codes, usually three or four letters long, that identify the type of data on a line and give it meaning within the hierarchy. Standard tags are uppercase abbreviations such as NAME for a person's name, BIRT for a birth event, or DEAT for a death. The specification's appendix defines the standard tags approved for universal use. User-defined extension tags begin with an underscore (e.g., _MYTAG), which allows customization without conflicting with core elements. Some substructures are mandatory within a record, such as NAME in a submitter (SUBM) record. Others, like SOUR (source citation), are optional but recommended for verifiability. In GEDCOM 7.0, each structure type is also identified by a URI for semantic interoperability (e.g., g7:NAME), improving machine readability while keeping the familiar tag names.

Values follow the tag on each line, separated by a single space, and hold the data associated with that tag. In GEDCOM 5.5.1 they are text strings, and a line is limited to 255 characters. Longer values are split across subordinate CONC (concatenation without a newline) or CONT (continuation with a newline) lines. For example, a name value might appear as John /Doe/, where slashes delimit the surname, and a place as Cove, Cache, Utah, USA. A literal @ in a value is written by doubling it (@@). Escape sequences of the form @#...@ mark special handling, most commonly the calendar escapes used in dates. GEDCOM 7.0 removes the CONC tag and the line-length limit. It requires UTF-8 encoding and keeps CONT for multi-line text such as notes.
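A GEDCOM 5.5.1 reader therefore has to reassemble a long value from its continuation lines. This minimal sketch uses a hypothetical assemble_value function. It assumes the caller has already collected, in order, the tag and value of each line directly subordinate to the value's line. CONC text is appended directly, and CONT text starts a new line. Under GEDCOM 7.0, only the CONT branch would apply.

```python
def assemble_value(first: str, subordinates: list[tuple[str, str]]) -> str:
    """Rebuild a long value from the CONC/CONT lines that directly follow it."""
    text = first
    for tag, value in subordinates:
        if tag == "CONC":
            text += value          # join with no separator
        elif tag == "CONT":
            text += "\n" + value   # start a new line
        else:
            break                  # any other tag ends the continuation run
    return text


note = assemble_value(
    "This note is too long for one li",
    [("CONC", "ne, so it is split."), ("CONT", "A second paragraph.")],
)
print(note)
```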
Pointers are built from cross-reference identifiers (XREFs) enclosed in at signs, @<identifier>@, where the identifier is an alphanumeric string of up to 22 characters (e.g., @I123@ for an individual). An identifier appears after the level number on the line that opens a record, as in 0 @I123@ INDI, and must be unique within the file. Other lines then use it as a pointer value. For example, 1 CHIL @I123@ in a family record links that child to the family. The @...@ delimiters distinguish pointers from ordinary values, and pointers are used only for referencing, never for storing data. GEDCOM 7.0 adds a null pointer value (@VOID@) for links with no target record and integrates pointers with URI-based structure types for extended semantics.

GEDCOM defines specific data types for values so that common genealogical elements are standardized and can be parsed line by line. Dates use a structured format such as <calendar> <day> <month> <year>. In 5.5.1 the calendar is given by an escape sequence; for example, @#DGREGORIAN@ 3 JAN 2000 marks a date in the Gregorian calendar, which is the default. Dates also support ranges (BET 1904 AND 1915) and approximations (ABT 1920). Places are free-form but conventionally hierarchical (e.g., City, County, State, Country), often paired with a FORM tag that names the jurisdictional levels. Notes hold unstructured text for annotations, continued across lines with CONT, so research context can be recorded without altering the hierarchy. In GEDCOM 7.0, dates can carry a PHRASE substructure for forms the date grammar cannot express (e.g., old versus new style dates). Its data types are also aligned with primitives such as xsd:string for broader compatibility. Each line consists of a level, an optional cross-reference identifier, a tag, and an optional value. This syntax is simple to parse and supports the format's emphasis on portability across systems.
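The date forms described above can be classified with a few regular expressions. This simplified, hypothetical sketch recognizes a leading 5.5.1 calendar escape, the BET/AND and FROM/TO ranges, and the ABT, CAL, EST, BEF, and AFT qualifiers. It ignores interpreted dates (INT), date phrases in parentheses, and dual years. It also does not check whether the component dates are valid.

```python
import re


def parse_date_value(value: str) -> dict:
    """Classify a GEDCOM 5.5.1-style date value into calendar, kind and component dates."""
    calendar = "GREGORIAN"  # default when no @#D...@ escape is given
    m = re.match(r"@#D([A-Z ]+)@ ", value)
    if m:
        calendar = m.group(1)
        value = value[m.end():]
    # Two-date ranges first, so "FROM x TO y" is not mistaken for a plain FROM.
    for pattern, kind in ((r"BET (.+) AND (.+)", "BET"), (r"FROM (.+) TO (.+)", "FROM-TO")):
        m = re.fullmatch(pattern, value)
        if m:
            return {"calendar": calendar, "kind": kind, "dates": [m.group(1), m.group(2)]}
    head, _, rest = value.partition(" ")
    if head in ("ABT", "CAL", "EST", "BEF", "AFT", "FROM", "TO") and rest:
        return {"calendar": calendar, "kind": head, "dates": [rest]}
    return {"calendar": calendar, "kind": "EXACT", "dates": [value]}


print(parse_date_value("BET 1904 AND 1915"))
# {'calendar': 'GREGORIAN', 'kind': 'BET', 'dates': ['1904', '1915']}
```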

File Structure

Header Block

The Header Block is the mandatory first segment of a GEDCOM file. It begins with the level 0 HEAD tag, which marks the start of the transmission. The header supplies the metadata a parser needs to interpret the file correctly: the GEDCOM version, the source software, the character set, a reference to the submitter, and optional copyright information. These elements let a receiving application check that it can handle the file before processing the body records, which promotes compatibility across genealogical software systems. In GEDCOM 5.5.1, the header must include a pointer to a submitter record in the body via the SUBM tag.

The structure begins with 0 HEAD, followed by required level 1 substructures. These include 1 GEDC containing 2 VERS 5.5.1, which indicates the GEDCOM specification version, and 1 CHAR UTF-8, which declares the character set. UTF-8 became a permitted value in 5.5.1; earlier versions used values such as ANSEL or ASCII. The producing software is identified via 1 SOUR <APPROVED_SYSTEM_ID>, often accompanied by 2 VERS <VERSION_NUMBER> for that software's version. The line 1 SUBM @<XREF:SUBM>@ references the submitter record elsewhere in the file by its unique cross-reference identifier. An optional 1 COPR <COPYRIGHT_GEDCOM_FILE> line carries a copyright notice for the dataset. GEDCOM 7.0 retains most of these elements but drops the CHAR line, because UTF-8 is the only permitted encoding. A representative example of a Header Block in GEDCOM 5.5.1 format is:
0 HEAD
1 SOUR Family Historian
2 VERS 7.0.10
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
1 SUBM @S1@
1 COPR Copyright 2025 by Example User
The header is followed by the body block, which contains the core genealogical records, including a submitter record such as 0 @S1@ SUBM with details like the submitter's name. A common Header Block error is a GEDC VERS declaration that does not match the file's actual structure, which causes import failures in parsers that enforce strict compliance. Omitting required tags such as CHAR or SUBM can also corrupt data in transfer. Software may fall back to an incompatible encoding or fail to associate the file with a submitter. Following the specification closely avoids these issues and makes the exchange of genealogical data more reliable.
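A receiving program typically reads these header values before anything else. The sketch below uses a hypothetical helper, header_info. It extracts the GEDCOM version, character set, source system, submitter pointer, and copyright notice from a 5.5.1 header, stopping at the first level 0 line after HEAD. It assumes single-space separators and ignores every other header substructure.

```python
def header_info(lines: list[str]) -> dict[str, str]:
    """Extract key HEAD values, e.g. {'GEDC.VERS': '5.5.1', 'CHAR': 'UTF-8', ...}."""
    info: dict[str, str] = {}
    in_head = in_gedc = False
    for raw in lines:
        parts = raw.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            if in_head:
                break              # the next level-0 record ends the header
            in_head = tag == "HEAD"
            continue
        if not in_head:
            continue
        if level == 1:
            in_gedc = tag == "GEDC"
            if tag in ("SOUR", "CHAR", "SUBM", "COPR"):
                info[tag] = value
        elif level == 2 and in_gedc and tag == "VERS":
            info["GEDC.VERS"] = value   # spec version, not the software's SOUR/VERS
    return info


header = [
    "0 HEAD",
    "1 SOUR Family Historian",
    "2 VERS 7.0.10",
    "1 GEDC",
    "2 VERS 5.5.1",
    "2 FORM LINEAGE-LINKED",
    "1 CHAR UTF-8",
    "1 SUBM @S1@",
    "1 COPR Copyright 2025 by Example User",
    "0 @S1@ SUBM",
]
info = header_info(header)
print(info["GEDC.VERS"], info["CHAR"])  # 5.5.1 UTF-8
```

A caller could use this to reject a file whose GEDC.VERS it does not support, or to warn when CHAR or SUBM is missing.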

Body Block

The Body Block is the core data payload of a GEDCOM file. It immediately follows the Header Block and holds all genealogical records in a hierarchical, line-based format. It is a series of logical records, each opened by a level 0 line such as 0 @I1@ INDI for an individual or 0 @F1@ FAM for a family group. Subordinate lines detail the record's events and attributes. Event structures such as 1 BIRT carry birth details, with a nested 2 DATE for the date or 2 PLAC for the place. Attribute structures such as 1 SEX M record sex. Nesting can continue over several levels to represent complex relationships and facts. GEDCOM 7.0 keeps this leveled structure but simplifies parsing by eliminating CONC and leaving CONT as the only continuation mechanism.

Lines are organized by level numbers (ranging from 0 to 99, without leading zeros). Each level indicates subordination to the nearest preceding line at a lower level, giving a tree-like representation of the data. There is no mandated order for top-level records, so submitters may arrange them as they prefer. Substructures within a record, however, conventionally follow a customary order, such as events preceding attributes. Cross-references connect records through unique pointers (e.g., @<XREF:INDI>@). A family record lists its children with lines such as 1 CHIL @I3@, and each child's individual record points back to that family with 1 FAMC @F1@. Because linked records need not be adjacent, this pointer system keeps the data cohesive and supports bidirectional relationships. Indexing relies implicitly on these pointers rather than on explicit indices, as parsers read the file line by line and build a relational graph from the links.

When compliant software encounters a pointer, it resolves it to the corresponding record elsewhere in the block, building an in-memory model of entities and their associations. This approach accommodates files of varying size but requires efficient lookup to handle forward references, that is, pointers to records that appear later in the file. Notes (1 NOTE) and source citations (1 SOUR) can embed further substructures, so body blocks can grow large, often reaching megabytes for large pedigrees. To limit memory use during parsing, GEDCOM 5.5.1 recommends keeping logical records under 32 kilobytes, a size that fitted the typical processing buffers of the time. GEDCOM 7.0 removes such explicit limits on nesting depth or line length (previously capped at 255 characters). This permits greater flexibility at the cost of higher computational demands for deeply nested datasets.
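Because of forward references, a reader usually cannot resolve pointers in a single streaming pass. The sketch below uses a hypothetical helper, dangling_pointers. It first records every identifier defined on a level 0 line and only then checks each pointer value against that set, reporting any pointer with no matching record. It assumes single-space separators and treats a value as a pointer only when the entire value has the form @...@.

```python
import re

# A pointer value: @...@ whose first inner character is not '@' or '#' (excludes @@ and escapes).
POINTER_RE = re.compile(r"@[^@#][^@]*@")


def dangling_pointers(lines: list[str]) -> list[tuple[int, str]]:
    """Resolve pointers only after reading every record, so forward references succeed."""
    defined: set[str] = set()
    used: list[tuple[int, str]] = []
    for n, raw in enumerate(lines, start=1):
        parts = raw.split(" ", 2)
        if parts[0] == "0" and len(parts) > 1 and parts[1].startswith("@"):
            defined.add(parts[1])                 # record-opening line, e.g. 0 @I1@ INDI
        elif len(parts) == 3 and POINTER_RE.fullmatch(parts[2]):
            used.append((n, parts[2]))            # pointer value, e.g. 1 HUSB @I1@
    # Check the collected pointers; @VOID@ is GEDCOM 7.0's null pointer.
    return [(n, p) for n, p in used if p not in defined and p != "@VOID@"]


lines = [
    "0 HEAD",
    "1 SUBM @S1@",          # forward reference: @S1@ is defined on the next line
    "0 @S1@ SUBM",
    "1 NAME Example User",
    "0 @F1@ FAM",
    "1 HUSB @I1@",          # forward reference to a record defined below
    "1 WIFE @I2@",          # no @I2@ record exists
    "0 @I1@ INDI",
    "1 FAMS @F1@",
    "0 TRLR",
]
print(dangling_pointers(lines))  # [(7, '@I2@')]
```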

Trailer Block

The Trailer Block is the simple closing segment of a GEDCOM file, consisting of a single mandatory level 0 line, 0 TRLR. This tag marks the end of the GEDCOM transmission and permits no value or subordinate structures. Its primary role is to signal that the transmission is complete, preventing errors from partial file reads by telling parsers that no further content follows. In some multi-disk or segmented transmissions, it appears only on the final segment to confirm overall completeness. Strict parsers treat the absence of the trailer as an indication of an invalid or incomplete file, often triggering processing errors. Historically, the trailer evolved from simpler termination indicators in early GEDCOM drafts into a standardized endpoint, ensuring reliable interchange in versions from 4.0 onward. It directly follows the last body record to delineate the file's boundary.
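The completeness check a strict parser performs can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the text has already been decoded and that the trailer is written exactly as 0 TRLR with no value.

```python
def has_valid_trailer(text: str) -> bool:
    """True if the last non-blank line is exactly '0 TRLR' and it occurs nowhere earlier."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return bool(lines) and lines[-1] == "0 TRLR" and "0 TRLR" not in lines[:-1]


def require_trailer(text: str) -> None:
    """Strict-parser behavior: reject data that looks truncated or terminated too early."""
    if not has_valid_trailer(text):
        raise ValueError("GEDCOM data does not end with a single 0 TRLR line")
```

A lenient importer might instead call `has_valid_trailer` and only warn, which matches the looser behavior some programs show toward truncated files.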

Sample File Excerpt

To illustrate the practical structure of a GEDCOM file, consider the following minimal example, which includes a header block, a basic body with one submitter record, one individual record, and one family record, and a trailer block. This example follows GEDCOM 5.5 syntax and demonstrates core elements such as levels, tags, pointers, and values, although the pointers @I2@ and @I3@ refer to individual records omitted for brevity.
0 HEAD
1 SOUR PAF
2 VERS 2.1
1 DATE 15 NOV 1995
1 FILE MYFILE.GED
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
1 SUBM @S1@
0 @S1@ SUBM
1 NAME Example User
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 BIRT
2 DATE 12 MAY 1960
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
This example can be broken down line by line to highlight key components:
  • 0 HEAD: Initiates the header block at level 0, marking the start of the file. The level 0 indicates a top-level record.
  • 1 SOUR PAF: At level 1 (subordinate to HEAD), this tag identifies the software source ("PAF" for Personal Ancestral File) used to generate the file.
  • 2 VERS 2.1: At level 2 (further subordinate), the VERS tag specifies the version of the source software.
  • 1 DATE 15 NOV 1995: Level 1 under HEAD records the file creation date in a standardized format.
  • 1 FILE MYFILE.GED: Level 1 under HEAD names the transmission file.
  • 1 GEDC: Level 1 under HEAD begins the GEDCOM version details.
  • 2 VERS 5.5: Level 2 under GEDC specifies the GEDCOM standard version.
  • 2 FORM LINEAGE-LINKED: Level 2 under GEDC defines the file form, here the common lineage-linked structure for trees.
  • 1 CHAR ANSEL: Level 1 under HEAD declares the character set (ANSEL, an older 8-bit encoding; GEDCOM 5.5.1 also permits UTF-8, and 7.0 requires it).
  • 1 SUBM @S1@: Level 1 under HEAD references the submitter record via unique pointer @S1@.
  • 0 @S1@ SUBM: Level 0 starts the submitter record with pointer @S1@ and SUBM tag.
  • 1 NAME Example User: Level 1 under SUBM provides the submitter's name.
  • 0 @I1@ INDI: Level 0 starts an individual record; @I1@ is a unique pointer (xref ID) for referencing it, followed by the INDI tag for an individual.
  • 1 NAME John /Smith/: Level 1 under INDI provides the name, with slashes delimiting surname.
  • 1 SEX M: Level 1 under INDI specifies sex (M for male).
  • 1 BIRT: Level 1 under INDI introduces a birth event.
  • 2 DATE 12 MAY 1960: Level 2 under BIRT gives the event date.
  • 0 @F1@ FAM: Level 0 starts a family record; @F1@ is its pointer, with the FAM tag for a family group.
  • 1 HUSB @I1@: Level 1 under FAM links the husband via pointer @I1@.
  • 1 WIFE @I2@: Level 1 under FAM links the wife (pointer @I2@ assumes another INDI record, omitted here for brevity).
  • 1 CHIL @I3@: Level 1 under FAM links a child (pointer @I3@ assumes another INDI).
  • 0 TRLR: Level 0 ends the file, marking the trailer block.
When writing GEDCOM 5.5 or 5.5.1 files, note that each line must not exceed 255 characters, including the level number, cross-reference ID, tag, value, delimiters, and line terminator, to ensure compatibility across systems. Whitespace rules are strict: writers should begin each line with the level number (no leading spaces), followed by an optional xref ID in @...@ format, the tag, and the value, each separated by a single space; subordinate lines use incremented levels to denote subordination. Long values are continued on lines one level below the line being continued, using CONT to insert a line break or CONC to append text without one.
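The length and continuation rules can be illustrated with a pair of Python helpers. This is a sketch under stated assumptions: it targets the 5.5/5.5.1 limit of 255 characters (counting one character for the line terminator), it avoids splitting a CONC line next to a space because readers may trim that space, and the names `split_value` and `join_lines` are hypothetical.

```python
MAX_LINE = 255  # GEDCOM 5.5/5.5.1 limit for a whole line, terminator included


def split_value(level: int, tag: str, value: str, max_len: int = MAX_LINE) -> list:
    """Encode a possibly multi-line value as a tag line plus CONT/CONC lines.

    CONT starts a new line of text; CONC appends text with no separator.
    """
    lines = []
    for i, text in enumerate(value.split("\n")):
        head = f"{level} {tag}" if i == 0 else f"{level + 1} CONT"
        while True:
            room = max_len - len(head) - 2  # one space before the value, one terminator
            if room < 1:
                raise ValueError("max_len is too small for this tag")
            cut = min(room, len(text))
            # Avoid splitting next to a space: readers may trim it and lose it.
            while 0 < cut < len(text) and (text[cut - 1] == " " or text[cut] == " "):
                cut -= 1
            if cut == 0 and text:
                cut = min(room, len(text))  # no safe split point; hard cut instead
            chunk, text = text[:cut], text[cut:]
            lines.append(f"{head} {chunk}" if chunk else head)
            if not text:
                break
            head = f"{level + 1} CONC"
    return lines


def join_lines(lines: list) -> str:
    """Decode a tag line and its CONT/CONC lines back into one value."""
    def value_of(line):
        parts = line.split(" ", 2)  # level, tag, value (value may be absent)
        return parts[2] if len(parts) > 2 else ""

    text = value_of(lines[0])
    for line in lines[1:]:
        tag = line.split(" ", 2)[1]
        if tag == "CONT":
            text += "\n" + value_of(line)
        elif tag == "CONC":
            text += value_of(line)
        else:
            raise ValueError(f"unexpected tag {tag!r} in a continuation")
    return text
```

A round trip through both helpers should return the original value unchanged; mismatches between programs usually come from readers that trim spaces at CONC boundaries, which is why the writer avoids splitting there.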

Versions

GEDCOM 5.5 and 5.5.1

GEDCOM 5.5, released on January 2, 1996, with errata on January 10, 1996, represented a major update to the standard by adopting the American National Standards Institute (ANSI) ANSEL character set, enabling better handling of diacritical marks and special characters common in international genealogical records. This version introduced refined date formats supporting multiple calendars, including Gregorian, Julian, Hebrew, and French Revolutionary, along with qualifiers such as "about" (ABT), "estimated" (EST), and "calculated" (CAL) for imprecise dates. The place (PLAC) structure was enhanced to include a hierarchical jurisdiction path, specified via a FORM substructure, so that a place can be written as a comma-separated list of jurisdictions (for example, a town followed by its county and state) for greater locational precision. Key innovations in GEDCOM 5.5 included the association (ASSO) tag, which links individuals through non-familial relationships like friends, neighbors, or witnesses, using a RELA subtag to describe the nature of the association. It also added the Repository (REPO) record for cataloging sources, complete with call numbers and addresses, improving source and citation management. These features built on earlier versions while maintaining backward compatibility, and most implementations can parse GEDCOM 5.5 files as a baseline for data exchange. GEDCOM 5.5.1, released on November 15, 2019, offered minor corrections and enhancements to address ambiguities in the prior version. It formalized Unicode support, including UTF-8 encoding, to accommodate a broader range of international scripts and reduce reliance on ANSEL. Multimedia integration via Object (OBJE) records was streamlined by eliminating embedded binary large objects (BLOBs) in favor of external file references, with FORM and TYPE substructures specifying formats such as jpg for images or wav for audio. Event structures received clarifications, such as refined <<EVENT_DETAIL>> components including religious affiliation (RELI), ensuring more consistent representation of life events.
As of 2025, GEDCOM 5.5 and 5.5.1 continue to dominate genealogy software ecosystems due to their stability, extensive vendor support, and seamless interoperability with legacy datasets, serving as the de facto standard for file exchanges despite the availability of newer specifications.
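The date qualifiers above can be illustrated with a small Python parser for Gregorian 5.5-style DATE values. It is a sketch with assumptions: it ignores calendar escapes, B.C. years, and dual years such as 1749/50, and the function names and the "FROM-TO" and "EXACT" labels are hypothetical. Note that 5.5 spells "calculated" as CAL, so a value like CALC 1875 is rejected.

```python
import re

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
SIMPLE = re.compile(r"(?:(\d{1,2}) )?(?:(" + "|".join(MONTHS) + r") )?(\d{1,4})")
SINGLE_QUALIFIERS = {"ABT", "CAL", "EST", "BEF", "AFT", "FROM", "TO"}


def simple_date(text: str) -> tuple:
    """'12 MAY 1960' -> (1960, 5, 12); missing parts become None."""
    m = SIMPLE.fullmatch(text)
    if m is None or (m.group(1) and not m.group(2)):
        raise ValueError(f"unsupported date: {text!r}")
    day, month, year = m.groups()
    return (int(year), MONTHS.index(month) + 1 if month else None, int(day) if day else None)


def parse_date(value: str) -> tuple:
    """Return (qualifier, dates) for a Gregorian 5.5-style DATE value."""
    v = " ".join(value.upper().split())
    for pattern, kind in ((r"BET (.+) AND (.+)", "BET"), (r"FROM (.+) TO (.+)", "FROM-TO")):
        m = re.fullmatch(pattern, v)
        if m:
            return kind, (simple_date(m.group(1)), simple_date(m.group(2)))
    qualifier, _, rest = v.partition(" ")
    if qualifier in SINGLE_QUALIFIERS and rest:
        return qualifier, (simple_date(rest),)
    return "EXACT", (simple_date(v),)
```

Keeping the qualifier separate from the parsed parts leaves the decision of how wide "ABT" should be to the application, which is exactly the point the specification leaves open.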

GEDCOM 7.0

FamilySearch released GEDCOM 7.0 on May 19, 2021, as the latest major revision of the standard for exchanging genealogical data, aiming to address limitations in earlier versions by incorporating modern data-handling practices. The specification has since received minor updates, with version 7.0.16 issued on March 18, 2025, incorporating patches for improved clarity and implementation guidance without altering core data structures. This version maintains the hierarchical line-based format while introducing semantic enhancements that support more precise and extensible data representation. GEDCOM 7.0 formalizes extensions: extension tags can be mapped to URIs through the header's schema (HEAD.SCHMA) structure, and the specification defines typed payloads such as enumerated values and ages, which improves interoperability across diverse software. It improves semantic handling of role-based relationships in events and family structures, allowing explicit participant roles (e.g., WITN for a witness or GODP for a godparent) to capture genealogical contexts beyond simple parent-child links. Key innovations include enhanced multimedia handling through the MULTIMEDIA_RECORD structure and GEDZIP packaging, which bundles external files such as images and audio together with the GEDCOM data for transfer. The specification supports approximate and uncertain dates, including free-text date phrases (a PHRASE substructure under DATE) alongside approximate and range values such as ABT 1850, multiple calendar systems (Gregorian, Julian, Hebrew, French Revolutionary), and period notations, reducing ambiguity in historical records. Place data can carry coordinates through the MAP structure and its LATI and LONG substructures for latitude and longitude, facilitating geospatial integration in mapping tools. Internationalization is strengthened by mandating UTF-8 encoding throughout and by extending the LANG tag to BCP 47 language tags, ensuring global compatibility without legacy character-set issues.
GEDCOM 7.0 has been integrated into FamilySearch's core tools for family tree management and export, with growing support in third-party software such as RootsMagic and Family Historian. Software can import GEDCOM 5.5 and 5.5.1 files by mapping legacy structures to the new semantics, although 7.0 is not fully backward compatible and some breaking changes require validation during conversion. Since the initial 2021 release, updates have focused on patches for validation rules, expanded handling of research notes through versatile NOTE structures, and refined citation schemas in SOURCE records to better accommodate evidence evaluation and multi-source linking. These revisions, tracked via semantic versioning on the official repository, emphasize stability and developer tools for conformance testing.
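As an example of the coordinate support, the LATI and LONG payloads under MAP are, as we understand them, a hemisphere letter followed by decimal degrees (for instance N18.150944). The Python sketch below converts them to signed degrees for a mapping tool; the function names are hypothetical and the range check is an assumption of this sketch.

```python
HEMISPHERES = {"LATI": ({"N": 1, "S": -1}, 90.0), "LONG": ({"E": 1, "W": -1}, 180.0)}


def parse_coordinate(value: str, tag: str) -> float:
    """Convert a LATI/LONG payload such as 'N18.150944' into signed decimal degrees."""
    signs, limit = HEMISPHERES[tag]
    value = value.strip()
    sign = signs.get(value[:1].upper())
    if sign is None:
        raise ValueError(f"{tag} value must start with one of {sorted(signs)}: {value!r}")
    degrees = float(value[1:])
    if not 0.0 <= degrees <= limit:
        raise ValueError(f"{tag} value out of range: {value!r}")
    return sign * degrees


def map_point(lati: str, long_: str) -> tuple:
    """(latitude, longitude) from the LATI and LONG lines of one MAP structure."""
    return parse_coordinate(lati, "LATI"), parse_coordinate(long_, "LONG")
```

Southern latitudes and western longitudes come out negative, which is the convention most mapping libraries expect.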

Release Timeline

The development of GEDCOM began in 1984 when the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) released its first internal version, GEDCOM 1.0, as a proposed standard for exchanging genealogical data within its systems. Subsequent internal iterations, such as version 2.0 in late 1985 and 2.1 in early 1987, were used in software like Personal Ancestral File (PAF) but remained non-public. The first public release occurred on October 9, 1987, with GEDCOM 3.0, which introduced the lineage-linked form for representing family relationships and was made available for broader adoption by genealogical software developers. This was followed by version 4.0 on August 4, 1989, which refined the structure for wider compatibility. Version 5.0 arrived on September 25, 1991, enhancing lineage-linked structures to better handle complex pedigrees. Interim drafts appeared in the early 1990s, including 5.1 in September 1992 and 5.3 in November 1993, which experimented with new features, including multimedia support, but were never finalized. The major milestone of version 5.5 was released on January 2, 1996 (with errata on January 10), incorporating structured addresses, additional name parts, and contributions from bodies like the National Genealogical Society, though it did not achieve formal ANSI ratification. A minor update, GEDCOM 5.5.1, followed on November 15, 2019, adding support for UTF-8 encoding, email addresses, web URLs, and geographic coordinates while maintaining backward compatibility. No official version 6.0 was ever released; a beta draft proposing XML-based storage was circulated in 2002 for developer feedback but was abandoned in favor of alternative formats like GEDCOM X. After a long hiatus, FamilySearch released GEDCOM 7.0 on May 19, 2021, as the first major update in over two decades, introducing semantic versioning, improved multimedia handling via GEDZIP packaging, and resolutions to prior ambiguities.
This version has seen ongoing patches, the latest being 7.0.16 on March 18, 2025, focusing on clarifications and minor refinements.
| Version | Release date | Status | Key notes |
| --- | --- | --- | --- |
| 1.0 | 1984 | Internal/proposed | Initial LDS Church development. |
| 3.0 | 1987-10-09 | Public standard | First public release; lineage-linked form. |
| 4.0 | 1989-08-04 | Standard | Compatibility refinements. |
| 5.0 | 1991-09-25 | Standard | Enhanced structures. |
| 5.5 | 1996-01-02 | Standard | Address and name improvements (errata 1996-01-10). |
| 5.5.1 | 2019-11-15 | Standard | Encoding and metadata additions. |
| 7.0 | 2021-05-19 | Standard | Semantic versioning; GEDZIP support; latest patch 7.0.16 (2025-03-18). |

Key Features

Multimedia Integration

GEDCOM supports the integration of multimedia elements, such as images, audio, and documents, primarily through the OBJE record type, which allows genealogical software to reference or embed media files associated with individuals, families, or events. The OBJE record is defined at level 0 as 0 @O1@ OBJE, serving as a container for media details rather than for the media file itself. Subordinate tags within the OBJE record include 1 FILE photo.jpg to specify the file path or URL, followed by 2 FORM JPG to indicate the media format, ensuring compatibility across systems. Linking to core records occurs via a pointer tag, such as 1 OBJE @O1@ in an individual (INDI) or family (FAM) record, or one level deeper under an event, enabling direct association without duplicating file information. In GEDCOM 5.5, optional embedding via binary large objects (BLOB) was supported, but this was removed in 5.5.1 and later versions, limiting integration to external file references to maintain portability and simplicity. Additional metadata, such as descriptive notes via 1 NOTE This is a family portrait from 1950, can accompany the OBJE to provide context like captions. GEDCOM 7.0 keeps external file references for media but introduces GEDZIP, a ZIP archive format with the .gdz extension, which bundles the GEDCOM file and its media files referenced by local paths (e.g., media/filename), enabling self-contained transmission that is particularly useful for web-based applications. This version also expands metadata options, including NOTE for detailed captions and a CROP substructure under a multimedia link (e.g., 2 CROP beneath 1 OBJE @O1@, with 3 TOP 10, 3 LEFT 20, 3 HEIGHT 100, and 3 WIDTH 150) to specify the image region to display. Legacy limitations persist in older implementations, where only references are supported, which can complicate data transfer if media files are not bundled separately.
Common use cases include attaching photographs to family (FAM) records to visualize group portraits or events, and linking audio files to individual (INDI) records for oral histories, such as digitized recordings of personal narratives. For instance, a sound bite of an ancestor's story can be referenced alongside a scanned photo, enriching the genealogical context without altering the core text-based structure.
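GEDZIP packaging can be sketched with Python's standard `zipfile` module. The sketch assumes, based on our reading of the 7.0 specification, that the dataset is stored as gedcom.ged at the archive root and that each local FILE value is the path of a media file inside the archive; the function name `write_gedzip` and the `media` mapping are hypothetical. It also checks that every non-URL FILE reference has a file to bundle.

```python
import re
import zipfile
from pathlib import PurePosixPath

FILE_RE = re.compile(r"^\s*\d+ FILE (\S.*)$", re.MULTILINE)


def write_gedzip(archive_path, gedcom_text: str, media: dict) -> None:
    """Bundle GEDCOM text and media into a .gdz-style ZIP archive.

    media maps the archive-relative path written in each FILE line
    (e.g. 'media/photo.jpg') to the local file stored under that path.
    """
    for ref in FILE_RE.findall(gedcom_text):
        ref = ref.strip()
        if "://" not in ref and ref not in media:
            raise ValueError(f"FILE {ref!r} has no local media file to bundle")
    for arcname in media:
        if arcname.startswith("/") or ".." in PurePosixPath(arcname).parts:
            raise ValueError(f"archive path must be relative: {arcname!r}")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Assumed GEDZIP layout: the dataset sits at the archive root as gedcom.ged.
        zf.writestr("gedcom.ged", gedcom_text.encode("utf-8"))
        for arcname, local_path in media.items():
            zf.write(local_path, arcname)
```

Validating references before the archive is opened means a missing photo fails the export outright instead of producing a bundle with a dangling FILE path, the transfer problem described above for older reference-only files.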

Source Citations and Conflicting Data

In GEDCOM, source citations are primarily handled through the SOUR tag, which allows users to attribute specific pieces of genealogical data to their evidentiary origins. A SOUR citation appears as a substructure of a record, event, or fact (e.g., 2 SOUR @S1@ under a 1 BIRT event), pointing to a separate SOURCE_RECORD via a cross-reference identifier. This SOURCE_RECORD contains detailed metadata, such as the source's title (TITL tag), author (AUTH tag), publication details (PUBL tag), and repository information (REPO tag), enabling comprehensive documentation without redundancy. To enhance citation precision, substructures under SOUR include the PAGE tag for specifying the exact location within the source (e.g., 3 PAGE 45) and, under a DATA substructure, the TEXT tag for excerpting relevant verbatim content (e.g., 4 TEXT Birth date as 12 May 1920). Multiple SOUR tags can be attached to a single fact, accommodating variant interpretations from different sources, such as conflicting birth dates taken from two different records. The QUAY tag further assesses citation reliability on a scale from 0 (unreliable evidence or estimated data) to 3 (direct and primary evidence), aiding in evaluating evidential weight. For conflicting data, GEDCOM recommends representing discrepancies, such as variant event dates or places, in separate event structures, each with its own source citations, to preserve evidential context without merging or overwriting information. The ASSO tag links associated individuals to the cited data (e.g., 1 ASSO @I2@ with 2 RELA Witness), while the NOTE tag provides explanatory commentary on discrepancies (e.g., 2 NOTE Conflicting date likely due to transcription error). In GEDCOM 7.0, these mechanisms remain core, with mandatory UTF-8 encoding improving multilingual notes in international contexts, though no dedicated CONFL tag is introduced for explicit conflicts. Pointers via cross-references (@XREF@) link these elements across records.
Best practices emphasize retaining all sourced variants to avoid data loss during file exchanges, using the widely supported _UID extension tag (standardized as UID in GEDCOM 7.0) to track individual records and changes across software implementations. This approach preserves research provenance, which overwriting conflicting data would obscure, while structured citations promote interoperability among applications.
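The citation layout and QUAY scale can be sketched in Python. The level offsets follow the 5.5.1 citation structure described above (PAGE, DATA, and QUAY one level below SOUR, TEXT under DATA); the QUAY labels paraphrase the specification, and the helper names plus the selection policy in `preferred_variant` are hypothetical choices for display only, so every variant stays in the file.

```python
QUAY_MEANING = {
    0: "unreliable evidence or estimated data",
    1: "questionable reliability",
    2: "secondary evidence",
    3: "direct and primary evidence",
}


def citation_lines(level: int, source_xref: str, page=None, text=None, quay=None) -> list:
    """Emit a 5.5.1-style citation: SOUR pointer, then PAGE, DATA/TEXT and QUAY below it."""
    lines = [f"{level} SOUR {source_xref}"]
    if page:
        lines.append(f"{level + 1} PAGE {page}")
    if text:
        lines += [f"{level + 1} DATA", f"{level + 2} TEXT {text}"]
    if quay is not None:
        if quay not in QUAY_MEANING:
            raise ValueError("QUAY must be 0, 1, 2 or 3")
        lines.append(f"{level + 1} QUAY {quay}")
    return lines


def preferred_variant(variants: list):
    """Pick the variant whose best citation has the highest QUAY.

    variants: list of (value, [quay, ...]). max() keeps the first of tied items,
    matching the convention that the first listed fact is preferred.
    """
    return max(variants, key=lambda v: max(v[1], default=-1))[0]
```

Because the chooser only ranks variants, a program can show one preferred birth date while still exporting every conflicting BIRT structure with its own citations.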

Internationalization Support

GEDCOM provides mechanisms for handling international data, enabling the representation of genealogical information across diverse languages, scripts, and cultural contexts. Early versions, such as 5.5, primarily relied on the ANSEL character set, an 8-bit extension of ASCII designed for Latin-based languages with diacritics, declared with the header's CHAR tag (e.g., "1 CHAR ANSEL"). This encoding supported most Western European characters but had limitations for non-Latin scripts. In contrast, GEDCOM 7.0 mandates UTF-8 encoding exclusively, aligning with ISO/IEC 10646:2020 and RFC 3629, to accommodate a broader range of characters, including those from Asian, African, and Middle Eastern languages. Files in this version use the .ged extension and recommend a U+FEFF byte-order mark for compatibility. The LANG tag facilitates language tagging, allowing users to specify the human language of textual content. In GEDCOM 5.5, this is implemented with language names in the header (e.g., "1 LANG English"), indicating the primary language for reading or writing the data, with values including English, French, and Hebrew. GEDCOM 7.0 replaces these names with BCP 47 language tags (e.g., "en", "es", "he"), applied in structures like SUBM.LANG or NOTE.LANG to denote the language of specific elements and support multilingual documents. This tag aids processing and display, particularly for locale-specific formatting. Place names are structured hierarchically using the PLAC tag, listing jurisdictions from smallest to largest (e.g., "2 PLAC City, County, State, Country"), with no abbreviations permitted to ensure clarity. In GEDCOM 5.5.1, phonetic and romanized variations are supported via FONE and ROMN substructures, each with a TYPE tag naming the phonetization or romanization method.
GEDCOM 7.0 extends this with TRAN substructures under PLAC, enabling translations or transliterations tied to specific LANG tags (e.g., a Japanese "2 PLAC 千代田, 東京, 日本" accompanied by a romanized TRAN), fully leveraging UTF-8 for non-Latin scripts such as Cyrillic, Arabic, or Chinese characters. Date handling accommodates cultural calendars through calendar escapes. GEDCOM 5.5 uses DATE_CALENDAR_ESCAPE delimiters such as @#DGREGORIAN@ for Gregorian, @#DJULIAN@ for Julian, and @#DHEBREW@ for Hebrew (e.g., "@#DHEBREW@ 1 TSH 5700"), supporting events dated in non-Gregorian systems like the Hebrew calendar. French Revolutionary dates use @#DFRENCH R@. The standard calendars do not cover other lunisolar systems, so dates in such calendars are typically recorded through custom extensions or as free-text date phrases with Unicode-encoded month names. In GEDCOM 7.0, calendars are written as keywords (e.g., HEBREW 1 TSH 5700) and formalized with URIs (e.g., g7:cal-GREGORIAN, g7:cal-HEBREW), and dual dates (e.g., Gregorian/Hebrew) use separate structures for precision, with Hebrew months like תִּשְׁרֵי rendered in original script. GEDCOM 7.0 handles right-to-left (RTL) text, such as Hebrew or Arabic place names and dates, by relying on Unicode and bidirectional-aware rendering libraries rather than proprietary extensions. Locale-specific sorting is guided by LANG tags, enabling applications to apply appropriate rules (e.g., ignoring diacritics in French or handling gematria in Hebrew), though implementations must use standard algorithms for consistency. These features support GEDCOM's viability for global genealogy, from European diacritics to East Asian ideographs.
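Migrating dates between versions is largely a matter of rewriting the calendar escapes. The Python sketch below maps the four 5.5 escapes listed above to the 7.0 calendar keywords GREGORIAN, JULIAN, HEBREW, and FRENCH_R; the function name is hypothetical, and treating other escapes (such as @#DROMAN@ or @#DUNKNOWN@) as errors is a choice of this sketch rather than a rule from the specification.

```python
import re

# 5.5/5.5.1 calendar escapes and the 7.0 calendar keywords they correspond to.
ESCAPE_TO_V7 = {
    "@#DGREGORIAN@": "GREGORIAN",
    "@#DJULIAN@": "JULIAN",
    "@#DHEBREW@": "HEBREW",
    "@#DFRENCH R@": "FRENCH_R",
}
ESCAPE_RE = re.compile(r"@#D[A-Z ]+@")


def escapes_to_v7(date_value: str) -> str:
    """Rewrite every calendar escape in a 5.5-style DATE value as a 7.0 keyword."""
    def replace(match):
        escape = match.group(0)
        if escape not in ESCAPE_TO_V7:
            # e.g. @#DROMAN@ or @#DUNKNOWN@: this sketch refuses to guess a mapping
            raise ValueError(f"no 7.0 calendar keyword for {escape}")
        return ESCAPE_TO_V7[escape]

    return ESCAPE_RE.sub(replace, date_value)
```

Values without an escape pass through unchanged, since both versions treat an unmarked date as Gregorian.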

Limitations

Multi-Person Events and Relationships

In GEDCOM, events involving multiple individuals, such as births (BIRT) and marriages (MARR), are structured to link participants through specific tags and pointers. The BIRT event, typically recorded under an individual's (INDI) record, can reference the family of origin via a FAMC (family as child) pointer, which connects the child to a family (FAM) record containing the parents as husband (HUSB) and wife (WIFE). Similarly, the MARR event is placed under a FAM record whose HUSB and WIFE pointers identify the spouses, with optional ROLE tags to specify roles like "Bride" or "Groom" (e.g., 1 ROLE Wife). These structures allow events to associate multiple people without embedding full details of all participants in a single event block. However, GEDCOM's design primarily emphasizes nuclear family units, limiting direct support for more complex multi-person scenarios. Family records (FAM) are constrained to one husband and one wife, requiring multiple FAM records for polygamous or serial relationships, which can fragment data. Adoptions and step-relations are handled indirectly: an adoption uses an ADOP event whose FAMC substructure carries an ADOP value of HUSB, WIFE, or BOTH to indicate which parent adopted, while step-relationships often rely on ASSO (association) tags with RELA (relationship) descriptors or custom underscore-prefixed tags for non-standard ties. This approach, while flexible, often results in incomplete or vendor-specific representations of extended kinship. GEDCOM 7.0 introduces enhancements to better accommodate multi-person events and non-linear relationships. Semantic roles are expanded through the ROLE tag under ASSO, allowing explicit designations such as GODP (godparent) or WITN (witness) for participants beyond primary family members, enabling more precise graphing of interactions.
The specification also improves support for complex topologies by combining multiple FAMS pointers with ASSO links and unique identifiers (UID), facilitating bidirectional connections in non-nuclear structures like blended families without excessive duplication. These changes aim to model relationships as a more interconnected graph rather than isolated hierarchies. However, as of November 2025, adoption of GEDCOM 7.0 remains limited, with version 5.5.1 continuing as the de facto standard in most genealogy software, meaning these enhancements are not yet widely realized. A common challenge in GEDCOM files arises from events shared by multiple individuals, which leads to redundancy. For instance, a census or residence event covering a whole household must be repeated in each member's INDI record, potentially creating inconsistent details if not managed carefully. This duplication can inflate file sizes and increase error risks during imports, though unique identifiers in later versions help mitigate some inconsistencies.
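Because each FAM record holds at most one husband and one wife, software has to merge several families to see a person's full set of partners and half-siblings. The Python sketch below does that merge; it assumes FAM records have already been reduced to a hypothetical dictionary of HUSB/WIFE/CHIL pointer lists, and counting anyone who shares at least one parent as a sibling is one possible policy, not a GEDCOM rule.

```python
from collections import defaultdict


def build_links(families: dict):
    """families: FAM xref -> {'HUSB': [...], 'WIFE': [...], 'CHIL': [...]} of INDI xrefs.

    Returns (parents_of, spouses_of). A person may appear in several FAM records
    (serial or plural partnerships), so both maps merge across families.
    """
    parents_of, spouses_of = defaultdict(set), defaultdict(set)
    for fam in families.values():
        partners = fam.get("HUSB", []) + fam.get("WIFE", [])
        for person in partners:
            spouses_of[person].update(p for p in partners if p != person)
        for child in fam.get("CHIL", []):
            parents_of[child].update(partners)
    return parents_of, spouses_of


def siblings(person: str, parents_of: dict) -> set:
    """Everyone sharing at least one parent, so half-siblings across FAM records are included."""
    mine = parents_of.get(person, set())
    return {other for other, theirs in parents_of.items() if other != person and mine & theirs}
```

With @I1@ appearing as HUSB in two FAM records, the merged maps reveal both spouses and treat the children of each marriage as half-siblings, a relationship that no single FAM record states directly.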

Specification Ambiguities

The GEDCOM standard, particularly in versions 5.5 and 5.5.1, contains several vague definitions that hinder consistent implementation across software applications. For instance, optional tags such as ADDR (address structure) and PLAC (place structure) within event details lack precise guidance on their hierarchical placement relative to other substructures, leading to variations in how location data is encoded and interpreted during file exchanges. Additionally, the specification does not enforce uniqueness for cross-reference identifiers (XREFs) beyond basic recommendations, allowing duplicate or conflicting pointers that can cause data loss or mislinking when importing files into different genealogy programs. Version drift exacerbates these issues through the widespread use of private tags, which begin with an underscore (_) to denote non-standard extensions. While the standard permits these for software-specific features, it provides no mechanism for registering or documenting them, so data stored in them becomes inaccessible or corrupted in incompatible systems. This proliferation of undocumented private tags has caused significant interoperability problems, as applications often ignore or mishandle unrecognized elements in the absence of clear fallback rules. Parsing variances further complicate data exchange, particularly with line continuations using the CONC (concatenate) and CONT (continue) tags. The semantics specify that CONC joins text without inserting a line break or altering spacing, while CONT inserts a line break to preserve paragraph breaks, but implementations vary in handling edge cases such as spaces at a CONC split or multi-line payloads, often resulting in garbled notes or addresses. Similarly, date approximations such as ABT (about) are defined as indicating inexact timing, but the lack of quantitative guidance (e.g., whether "ABT 1900" implies a range of years or months) leads to inconsistent sorting, searching, and validation across tools. GEDCOM 7.0 addresses many of these ambiguities through stricter schemas and explicit validation guidelines.
It mandates UTF-8 encoding exclusively, removes the CONC tag so that CONT is used only to encode line breaks, and enforces unique XREFs document-wide to prevent duplication errors. Extensions are now formalized through URI-mapped schemas and a public registry, reducing version drift by encouraging documented, shared extension tags, while detailed rules for date parsing, including clearer handling of approximations, improve overall compliance and testing via open-source validators. However, as of November 2025, adoption of GEDCOM 7.0 remains limited, with version 5.5.1 continuing as the de facto standard in most genealogy software, meaning these enhancements are not yet widely realized.
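Two of the problems above, duplicate XREF definitions and undocumented underscore tags, are easy to surface before an import. The Python sketch below is a hypothetical pre-import audit: it counts only xrefs that define a record (the token right after the level number), not pointer values, and it lists underscore tags so an importer can decide how to preserve them.

```python
import re
from collections import Counter

# level, optional defining xref, tag; pointer values later on the line are ignored.
LINE_RE = re.compile(r"^\s*\d+ (?:(@[^@ ]+@) )?(\S+)")


def audit(text: str) -> tuple:
    """Return (sorted duplicate xref definitions, counts of underscore extension tags)."""
    defined, extensions = Counter(), Counter()
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            continue  # not a GEDCOM line; a full validator would report it
        xref, tag = m.groups()
        if xref:
            defined[xref] += 1
        if tag.startswith("_"):
            extensions[tag] += 1
    duplicates = sorted(x for x, n in defined.items() if n > 1)
    return duplicates, dict(extensions)
```

An importer could refuse files with duplicates and keep a report of extension tags, rather than silently dropping data the receiving program does not understand.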

Undated Event Ordering

In GEDCOM files, events within individual (INDI) or family (FAM) records are assumed to be listed in chronological order according to the submitter's intent, which serves as the default sequence; the specification provides no dedicated sort keys or explicit chronological enforcement. This convention relies on the order of equal-level tags to reflect preference, with the first occurrence typically deemed most important. Undated events, such as a residence or occupation lacking a specific year, pose significant challenges to this assumed timeline, as they lack date values for automated sorting and may appear in arbitrary positions depending on the importing software's reordering logic. Different genealogy programs handle these events variably (some move them to the end of lists, others keep them in entry order), potentially disrupting the intended sequence and leading to inconsistent interpretations across tools. Common workarounds include assigning approximate date ranges such as "BET 1900 AND 1910" or incorporating contextual details via NOTE structures to guide manual review. GEDCOM 7.0 introduces the SDATE substructure under EVENT_DETAIL, which supplies a sort date used only for ordering, not as a historical claim, so that undated or ambiguously dated events keep their intended position without compromising the primary DATE value. However, as of November 2025, adoption of GEDCOM 7.0 remains limited, with version 5.5.1 continuing as the de facto standard in most genealogy software, meaning these enhancements are not yet widely realized. These ordering issues affect the reliability of timeline visualizations and narrative generation in genealogy software, where precise event sequences are essential for constructing coherent life stories and avoiding logical inconsistencies in reports.
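The effect of a sort date can be shown with a short Python sketch. Events are represented here as hypothetical dictionaries whose optional "date" and "sdate" values are (year[, month[, day]]) tuples; the sketch sorts by SDATE when present and otherwise by DATE, and it places undated events after dated ones in their original file order, which is one of the policies described above rather than a rule from the specification.

```python
def _key_date(date):
    """Pad a (year[, month[, day]]) tuple to three ints; missing parts sort first."""
    year, month, day = (tuple(date) + (None, None))[:3]
    return (year, month or 0, day or 0)


def sort_events(events: list) -> list:
    """Order events by SDATE when present, else DATE; undated events keep file order at the end."""
    def key(item):
        index, event = item
        when = event.get("sdate") or event.get("date")
        if when:
            return (0, _key_date(when), index)
        return (1, (0, 0, 0), index)

    return [event for _, event in sorted(enumerate(events), key=key)]
```

Including the original index in the key keeps the ordering deterministic, so two programs using this policy would produce the same timeline from the same file.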

Extensions and Alternatives

GEDCOM X

GEDCOM X is an open specification developed by FamilySearch in 2012 for exchanging the components of genealogical data essential to the genealogical research process. It defines a conceptual model and serialization formats in XML and JSON to represent structured information about persons, relationships, sources, and conclusions. Unlike the line-based format of traditional GEDCOM, GEDCOM X incorporates RDF semantics to enable more flexible modeling of relationships and their supporting evidence. Key differences from GEDCOM 5.5 include its modular architecture, which allows extensions for specific elements such as places and conclusions, enabling more precise and extensible data representation without altering the core model. It maintains backward compatibility through dedicated converters that transform GEDCOM 5.5 files into GEDCOM X format, facilitating integration with legacy systems. This approach addresses the rigidity of traditional GEDCOM by providing a more adaptable framework for modern data exchange. Notable features include versioned resources via standardized headers for metadata like timestamps and revisions, as well as a web-oriented design supporting APIs for integration in networked environments. These elements make GEDCOM X suitable for API-driven applications while preserving the integrity of genealogical narratives. GEDCOM X has been adopted primarily within FamilySearch's developer platform, where it underpins APIs for data interchange and management, though it serves as a complementary format rather than a complete replacement for GEDCOM.
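To contrast the two models, the Python sketch below maps a minimal GEDCOM individual to a GEDCOM X-style JSON document. It is an illustration under assumptions: the field names (persons, gender.type, names, nameForms, fullText) and the gender type URIs are given as we understand the GEDCOM X JSON vocabulary and should be checked against the specification, and `indi_to_gedcomx` is a hypothetical helper, not part of any FamilySearch API.

```python
import json

# GEDCOM X gender type URIs (as we understand the GEDCOM X vocabulary).
GENDER_TYPES = {"M": "http://gedcomx.org/Male", "F": "http://gedcomx.org/Female"}


def indi_to_gedcomx(xref: str, name: str, sex=None) -> dict:
    """Map a minimal GEDCOM INDI (xref, NAME, SEX) to a GEDCOM X-style JSON document."""
    full_text = " ".join(name.replace("/", " ").split())  # drop GEDCOM surname slashes
    person = {
        "id": xref.strip("@"),
        "gender": {"type": GENDER_TYPES.get(sex, "http://gedcomx.org/Unknown")},
        "names": [{"nameForms": [{"fullText": full_text}]}],
    }
    return {"persons": [person]}


print(json.dumps(indi_to_gedcomx("@I1@", "John /Smith/", "M"), indent=2))
```

The nested object replaces GEDCOM's level numbers, and type URIs take the place of fixed tags such as SEX M, which is the kind of vocabulary-driven extensibility the paragraph above describes.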

Other Genealogical Data Formats

While GEDCOM remains the de facto standard for genealogical data interchange, several alternative formats have emerged to address its limitations in handling complex structures and modern data needs. These alternatives often use XML or simpler formats like CSV and JSON, offering greater flexibility for specific use cases such as integration with other tools or web-based applications. The Gramps XML format, developed for the open-source Gramps genealogy program, provides a comprehensive XML-based structure for storing and exchanging genealogical data. It supports advanced features like detailed event relationships, multimedia objects, and custom attributes that GEDCOM struggles to represent without loss of information. As the native format for Gramps, it enables lossless backups and imports, making it ideal for users prioritizing data fidelity in complex family trees. GenXML serves as another XML-centric alternative, designed specifically for data exchange between genealogy programs as a more extensible option than GEDCOM 5.5. Originating from European development efforts, it emphasizes structured schemas for persons, families, and sources, and is used in niche European software. Simpler open formats such as CSV and JSON have gained traction in web-based tools for their ease of use in data exports and imports, facilitating quick analysis or integration with spreadsheets and other tools, though they lack GEDCOM's hierarchical tagging for relationships and events. CSV exports, supported by tools like Gramps, enable bulk spreadsheet handling but require manual reconstruction of linkages, suiting lightweight web apps better than full database migrations. In comparison, GEDCOM's plain-text simplicity promotes broad compatibility across legacy software, but XML-based formats like Gramps XML and GenXML offer richer schemas for events and relationships, reducing data loss during transfers. As of 2025, no single alternative has displaced GEDCOM's dominance, with adoption varying by software ecosystem: XML for robust desktop tools and CSV/JSON for agile web integrations.
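The cost of flattening a tree into CSV can be seen in a short Python sketch using the standard `csv` module. The column names, including a single "famc" column holding the family a person is a child of, are hypothetical; relationships survive only as id columns, and the importer has to rebuild them, as `children_by_family` does here.

```python
import csv
import io

FIELDS = ["id", "name", "sex", "birth_date", "famc"]


def individuals_to_csv(people: list) -> str:
    """Flatten individual dicts to CSV; relationships survive only as id columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for person in people:
        writer.writerow({f: person.get(f, "") for f in FIELDS})
    return buf.getvalue()


def children_by_family(csv_text: str) -> dict:
    """Rebuild family -> child ids from the famc column, the step CSV leaves to the importer."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["famc"]:
            groups.setdefault(row["famc"], []).append(row["id"])
    return groups
```

Everything GEDCOM would express through nesting (event sources, notes, multiple parents) must either be squeezed into extra columns or dropped, which is why CSV suits quick analysis rather than full migrations.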

References
