Flat-file database

A flat-file database is a tabular flat file in which each record is semantically independent: it can be meaningfully interpreted and manipulated independently of the other records in the table. The term flat loosely refers to data that is record-based and sequential but lacks more complicated features such as nesting, relationships and metadata (with the exception of column headers). Relationships can be inferred from the data, but the format itself provides no special support for them.

A flat-file database may be stored as plain text or as binary data (not character encoded). When stored as plain text, it is typically formatted with one record per line, with fields either delimiter-separated or fixed-width.

In delimiter-separated values files, fields are separated by a character or string called the delimiter. Common variants are comma-separated values (CSV), where the delimiter is a comma; tab-separated values (TSV), where the delimiter is the tab character; space-separated values; and vertical-bar-separated values, where the delimiter is |. If the delimiter may appear inside a field, there must be a way to distinguish delimiter characters or strings that are meant literally from those that separate fields. For example, consider the sentence "If I have to, I'll do it myself." To encode it in CSV, something must prevent the comma from splitting the field. Several strategies exist to avoid such delimiter collisions, such as enclosing fields in quotation marks or escaping the delimiter.
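As a minimal sketch of one collision-avoidance strategy, the example below uses Python's standard `csv` module, which by default quotes only those fields that contain the delimiter, the quote character or a line break. The sample records are illustrative only.

```python
import csv
import io

# Two records; the second field of the first record contains a comma,
# which would collide with the CSV delimiter if written naively.
records = [
    ["quote", "If I have to, I'll do it myself."],
    ["fruit", "apple"],
]

# With the default QUOTE_MINIMAL setting, csv.writer wraps a field in
# double quotes only when it contains the delimiter, the quote character
# or a line break. lineterminator="\n" keeps the output Unix-style.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(records)
encoded = buffer.getvalue()
print(encoded, end="")

# Reading the text back recovers the original fields, comma included.
decoded = list(csv.reader(io.StringIO(encoded)))
print(decoded == records)
```

Only the first record's sentence is quoted in the output. The `fruit,apple` line is left untouched, so files stay compact when collisions are rare. Passing `delimiter="\t"` to both `csv.writer` and `csv.reader` gives the TSV variant.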

With fixed-width formats, each field has a fixed length, and shorter values are padded with spaces. The field lengths can be predefined (that is, stated in the format's specification) or parsed from a header. With predefined lengths, each field has a maximum length, and the need for longer fields may arise only after the format has been defined. Possible workarounds include abbreviating phrases, replacing values with links (for example, a URI pointing to the value), and splitting a file into multiple files. With delimiter-separated formats, finding the field boundaries requires scanning for delimiters, which incurs some computational overhead. Fixed-width formats avoid this because each field starts at a known offset. However, fixed-width formats can produce unnecessarily large files when values tend to be shorter than the lengths reserved for them.
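The sketch below shows fixed-width records with predefined lengths. The layout (`name`, `city`, `age` with widths 10, 12 and 3) and the helper names `encode_record` and `decode_record` are hypothetical. Decoding slices each line at known offsets instead of searching for delimiters. Encoding rejects values that exceed their reserved width, which reflects the hard maximum that predefined lengths impose.

```python
# Hypothetical layout: (field name, width in characters). In a real format
# these widths would come from the specification or from a header.
LAYOUT = [("name", 10), ("city", 12), ("age", 3)]


def encode_record(values: dict) -> str:
    """Pad each field with spaces to its fixed width."""
    parts = []
    for name, width in LAYOUT:
        value = values[name]
        if len(value) > width:
            # Predefined widths impose a maximum field length.
            raise ValueError(f"{name!r} exceeds {width} characters: {value!r}")
        parts.append(value.ljust(width))
    return "".join(parts)


def decode_record(line: str) -> dict:
    """Slice the line at known offsets; no delimiter search is needed."""
    fields = {}
    offset = 0
    for name, width in LAYOUT:
        # Assumption: trailing spaces are padding, never meaningful data.
        fields[name] = line[offset:offset + width].rstrip()
        offset += width
    return fields


line = encode_record({"name": "Ada", "city": "London", "age": "36"})
print(repr(line))
print(decode_record(line))
```

Every record is 25 characters long regardless of its content. This makes random access easy, but it also shows the wasted space when values such as `Ada` are much shorter than their reserved width.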

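Delimiters can be combined with a notation that states the length of each field. For example, 5apple|9pineapple gives the length (5 and 9) of each field. This is called declarative notation. It has low overhead and trivially avoids delimiter collisions, because a reader takes exactly the stated number of characters regardless of what they contain. However, it is brittle when edited by hand, since changing a value without updating its length corrupts the record.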
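A minimal sketch of declarative notation in the 5apple|9pineapple style follows. The functions `encode` and `decode` are hypothetical. Because the length prefix is written directly before the value with nothing in between, this sketch assumes that no field starts with a digit. Without that assumption, a prefix such as `43abc` would be ambiguous. The encoder enforces the assumption.

```python
DIGITS = "0123456789"


def encode(fields: list) -> str:
    """Prefix each field with its length and join fields with '|'."""
    for field in fields:
        # Assumption: a field may not start with a digit, or the length
        # prefix and the value would run together ambiguously.
        if field and field[0] in DIGITS:
            raise ValueError(f"field may not start with a digit: {field!r}")
    return "|".join(f"{len(field)}{field}" for field in fields)


def decode(text: str) -> list:
    """Read length-prefixed fields; a '|' inside a value is treated as data."""
    fields = []
    pos = 0
    while pos < len(text):
        # Read the decimal length prefix.
        start = pos
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        if start == pos:
            raise ValueError(f"expected a length at offset {start}")
        length = int(text[start:pos])
        # Take exactly `length` characters, whatever they contain.
        value = text[pos:pos + length]
        if len(value) != length:
            raise ValueError("record truncated")
        fields.append(value)
        pos += length
        # Between fields there must be exactly one '|' separator.
        if pos < len(text):
            if text[pos] != "|" or pos + 1 == len(text):
                raise ValueError(f"malformed separator at offset {pos}")
            pos += 1
    return fields


print(encode(["apple", "pineapple"]))
print(decode("10pine|apple|4kiwi"))

# A careless manual edit ("apple" -> "apples" without fixing the 5)
# leaves an unexpected character where a separator should be.
try:
    decode("5apples|9pineapple")
except ValueError as err:
    print("corrupted record:", err)
```

The `pine|apple` value passes through unescaped, which illustrates how the notation avoids delimiter collisions. The last call shows the brittleness: one hand-edited character breaks the rest of the record.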

Herman Hollerith's work for the US Census Bureau, first used in the 1890 United States census, tabulated data recorded as holes punched in paper cards. It is sometimes considered the first computerized flat-file database, because no card indexed other cards or otherwise related individual cards to one another, except through their group membership.[citation needed]

In the 1980s, configurable flat-file database applications were popular on the IBM PC and the Macintosh. These programs were designed to make it easy for individuals to design and use their own databases, and they were almost as popular as word processors and spreadsheets.[citation needed] Examples of flat-file database software include early versions of FileMaker, the shareware PC-File, and the popular dBase.

Flat-file databases are ubiquitous because they are easy to write and edit, and they suit many purposes in an uncomplicated way.