Ghost word
A ghost word is a word published in a dictionary or similarly authoritative reference work even though it had not previously had any meaning or been used intentionally. A ghost word generally originates from readers interpreting a typographical or linguistic error as a word they are not familiar with, and then publishing that word elsewhere under the misconception that it is an established part of the language.
Once authoritatively published, a ghost word occasionally may be copied widely and enter legitimate usage, or it may eventually be discovered and removed from dictionaries.
Origin
The term was coined by Professor Walter William Skeat in his annual address as president of the Philological Society in 1886:[1]
Of all the work which the Society has at various times undertaken, none has ever had so much interest for us, collectively, as the New English Dictionary. Dr Murray, as you will remember, wrote on one occasion a most able article, in order to justify himself in omitting from the Dictionary the word abacot, defined by Webster as "the cap of state formerly used by English kings, wrought into the figure of two crowns". It was rightly and wisely rejected by our Editor on the ground that there is no such word, the alleged form being due to a complete mistake ... due to the blunders of printers or scribes, or to the perfervid imaginations of ignorant or blundering editors. ...
I propose, therefore, to bring under your notice a few more words of the abacot type; words which will come under our Editor's notice in course of time, and which I have little doubt that he will reject. As it is convenient to have a short name for words of this character, I shall take leave to call them "ghost-words." ... I only allow the title of ghost-words to such words, or rather forms, as have no meaning whatever.
... I can adduce at least two that are somewhat startling. The first is kime ... The original ... appeared in the Edinburgh Review for 1808. "The Hindoos ... have some very savage customs ... Some swing on hooks, some run kimes through their hands ..."
It turned out that "kimes" was a misprint for "knives", but the word gained currency for some time. Skeat continued with a more drastic example:[2]
A similar instance occurs in a misprint of a passage of one of Walter Scott's novels, but here there is the further amusing circumstance that the etymology of the false word was settled to the satisfaction of some of the readers. In the majority of editions of The Monastery, we read: ... dost thou so soon morse thoughts of slaughter? This word is nothing but a misprint of nurse; but in Notes and Queries two independent correspondents accounted for the word morse etymologically. One explained it as to prime, as when one primes a musket, from O. Fr. amorce, powder for the touchhole (Cotgrave), and the other by to bite (Lat. mordere), hence "to indulge in biting, stinging or gnawing thoughts of slaughter". The latter writes: "That the word as a misprint should have been printed and read by millions for fifty years without being challenged and altered exceeds the bounds of probability." Yet when the original manuscript of Sir Walter Scott was consulted, it was found that the word was there plainly written nurse.
One edition of The Monastery containing the misprint was published by the Edinburgh University Press in 1820.[3]
More examples
In his address, Skeat exhibited about 100 more specimens that he had collected.
Other examples include:
- The supposed Homeric Greek word στήτη (stētē) = 'woman', which arose thus: In Iliad Book 1 line 6 is the phrase διαστήτην ἐρίσαντε (diastētēn erisante) = "two (i.e. Achilles and Agamemnon) stood apart making strife". However someone unfamiliar with dual number verb inflections read it as διά στήτην ἐρίσαντε (dia stētēn erisante) = "two making strife because of a stētē", and they guessed that stētē meant the woman Briseis who was the subject of the strife, influenced by the fact that nouns ending with eta are usually feminine.[4]
- The placename Sarum, which arose by misunderstanding of the abbreviation Sar~ used in a medieval manuscript to mean some early form such as "Sarisberie" (= Salisbury).[5]
- As an example of an editing mistake, dord was defined as a noun meaning density (mass per unit volume). When the second edition of Webster's New International Dictionary was being prepared, an index card that read "D or d" with reference to the word density was misfiled as a word instead of an abbreviation. The entry existed in more than one printing from 1934 to 1947.[6][7]
- A Concise Dictionary of Pronunciation (ISBN 978-0-19-863156-9) accidentally included the nonexistent word testentry (evidently a feature of work-in-progress), with spurious British and US pronunciations as though it rhymed with pedantry.[citation needed][importance of example(s)?]
- The OED explains the ghost word phantomnation as "Appearance of a phantom; illusion. Error for phantom nation".[8] Alexander Pope's 1725 translation of the Odyssey originally said, "The Phantome-nations of the dead". Richard Paul Jodrell's Philology of the English Language (1820), which omitted hyphens from compounds, entered it as one word, "Phantomnation, a multitude of spectres". Lexicographers copied this error into various dictionaries, such as, "Phantomnation, illusion. Pope." (Worcester, 1860, A Dictionary of the English Language), and "Phantomnation, appearance as of a phantom; illusion. (Obs. and rare.) Pope." (Webster, 1864, An American Dictionary of the English Language).[9]
- The Japanese word kusege (癖毛, compounding kuse 癖 'habit'; 'vice' and 毛 ke 'hair', 'frizzy hair') was mistranslated as "vicious hair" in the authoritative Kenkyūsha's New Japanese-English Dictionary from the first edition (1918) to the fourth (1974), and corrected in the fifth edition (2003) "twisted [kinky, frizzy] hair; hair that stands up".[10] This ghost word was not merely an unnoticed lexicographical error; generations of dictionary users copied the mistake. For example, a Tokyo hospital of cosmetic surgery had a long-running display advertisement in the Asian edition of Newsweek that read, "Kinky or vicious hair may be changed to a lovely, glossy hair" [sic].[11] This hair-straightening ad was jokingly used in the "Kinky Vicious" title of a 2011 Hong Kong iPhoneography photo exhibition.[12]
- The JIS X 0208 standard, the most widespread system for handling the Japanese language on computers since 1978, has entries for 12 kanji that have no known use and were probably included by mistake (for example 彁). They are called yūrei moji ("ghost characters") and are still supported by most computer systems.[citation needed]
- Hsigo, an apparently erroneous output from optical character recognition software for "hsiao", a creature from Chinese mythology. The typographical error appeared in several limited-audience publications but spread around the World Wide Web after the creation of a Wikipedia article about the term (which has since been corrected), due to its numerous mirrors and forks.[citation needed]
- In his book Beyond Language: Adventures in Word and Thought, Dmitri Borgmann shows how feamyng, a purported collective noun for ferrets which appeared in several dictionaries, is actually the result of a centuries-long chain of typographical or misread-handwriting errors (from Busyness to Besyness to Fesynes to Fesnyng to Feamyng).[13][14]
- In the Irish language, the word cigire ('inspector') was invented by the scholar Tadhg Ua Neachtain, who misread cighim (pronounced [ˈciːmʲ], like modern cím) in Edward Lhuyd's Archaeologia Britannica as cigim [ˈcɪɟɪmʲ], and so constructed the verbal forms cigire, cigireacht, cigirim etc. from it.[15][16]
- In Estonian, the verb tuvastama ('to ascertain, identify') originates from a typographical error of the transcription of turvastama (from the root turvaline, 'secure', 'certain').[17]
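The JIS X 0208 ghost characters mentioned in the list above can still be encoded in practice. As a minimal sketch, the following Python snippet round-trips 彁 (U+5F41) through three encodings built on JIS X 0208, using codec names from Python's standard library. The helper name `jis_roundtrip` is hypothetical and chosen only for illustration.

```python
# Illustrative sketch: the sample is 彁 (U+5F41), the ghost character named in the text.
GHOST_SAMPLE = "\u5f41"


def jis_roundtrip(ch: str, codec: str = "euc_jp") -> bytes:
    """Encode one character with a JIS X 0208-based codec and check it decodes back.

    `jis_roundtrip` is a hypothetical helper name; the codec names are
    standard-library codecs ("euc_jp", "shift_jis", "iso2022_jp").
    """
    encoded = ch.encode(codec)  # raises UnicodeEncodeError if the codec lacks the character
    if encoded.decode(codec) != ch:
        raise ValueError(f"{codec} did not round-trip U+{ord(ch):04X}")
    return encoded


if __name__ == "__main__":
    for codec in ("euc_jp", "shift_jis", "iso2022_jp"):
        data = jis_roundtrip(GHOST_SAMPLE, codec)
        # Print the code point, the codec, and the encoded bytes in hexadecimal.
        print(f"U+{ord(GHOST_SAMPLE):04X} {codec}: {data.hex()}")
```

A successful round trip shows only that the character has a slot in the encoding; it says nothing about whether the character was ever used in real text.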
Speculative examples
Many neologisms, including those that eventually develop into established usages, are of obscure origin, and some might well have originated as ghost words through illiteracy, such as the term okay. However, establishing the true origin often is not possible, partly for lack of documentation, and sometimes through obstructive efforts on the part of pranksters. The most popular etymology of the word pumpernickel (that Napoleon dismissed the bread as "C'est pain pour Nicole!", fit only for his horse) is thought to be a deliberate hoax. Quiz also has been associated with apparently deliberate false etymology. All these words and many more have remained in common usage, but they may well have been ghost words in origin.[18]
Distinguished from back-formation
A recent, incorrect use of the term "ghost word" refers to coining a new word inferred from a real word by falsely applying an etymological rule. The correct term for such a derivation is back-formation, a word that has been established since the late 19th century.[1] An example is "beforemath", derived from "aftermath", which has an understandable meaning but is not a commonly accepted word. A back-formation cannot become a ghost word; as a rule it would clash with Skeat's precise definition, which requires that the word forms have "no meaning".[1]
See also
- Corruption (linguistics)
- Fictitious entry
- Funistrada, a fictional food name, created as a control item in a survey
- Folk etymology
- Trap street – a fictitious street inserted into maps for copyright protection
References
- Skeat, Walter William; Presidential address on 'Ghost-Words' in: 'Transactions of the Philological Society, 1885-7, pages 343–374'; Published for the society by Trübner & Co., Ludgate Hill, London, 1887. May be downloaded at: https://archive.org/details/transact188500philuoft
- Wheatley, Henry Benjamin; Literary Blunders; A Chapter in the "History of Human Error"; Publisher: Elliot Stock, London 1893
- Scott, Walter. The Monastery. Chapter 10, page 156. Published by Edinburgh University Press. 1820. https://archive.org/details/monasteryaroman00scotgoog
- Homer; Fagles, Robert; Knox, Bernard MacGregor Walker (10 September 1990). "The Iliad". New York, N.Y., U.S.A. : Viking. p. 4 – via Internet Archive.
- David Mills (20 October 2011). A Dictionary of British Place-Names. Oxford University Press. pp. 526–. ISBN 978-0-19-960908-6.
- Emily Brewster. "Ghost Word". part of the "Ask the Editor" series at Merriam-Webster.com.
- "dord". Dictionary.com, LLC. Retrieved February 21, 2012. Quote: In sorting out and separating abbreviations from words in preparing the dictionary's second edition, a card marked "D or d" meaning "density" somehow migrated from the "abbreviations" stack to the "words" stack.
- Oxford English Dictionary Second Edition on CD-ROM. Version 4.0, Oxford University Press. 2009.
- William Shepard Walsh; Henry Collins Walsh; William H. Garrison; Samuel R. Harris (1890). American Notes and Queries. Westminster Publishing Company. p. 93.
- Watanabe Toshirō (渡邊敏郎); Edmund R. Skrzypczak; Paul Snowden, eds. (2003). Kenkyusha's New Japanese-English Dictionary (新和英大辞典) (5th ed.). Kenkyusha. p. 790.
- Michael Carr (1983). "A Lexical Ghost Story: *Vicious hair" (PDF). Jinbun Kenkyū (人文研究). 66: 29–44. Archived from the original (PDF) on 2014-09-11. Carr (p. 40) suggests "vicious hair" for kusege (癖毛) originated through false analogy from Kenkyusha's waraguse (悪癖 "bad/vicious habit; vice") entries.
- Dan Pordes (20 September 2011). "iPhone photos like you've never seen". CNN Travel.
- Borgmann, Dmitri A. (1967). Beyond Language: Adventures in Word and Thought. New York: Charles Scribner's Sons. pp. 79–80, 146, 251–254. OCLC 655067975.
- Eckler, Jr., A. Ross (November 2005). "The Borgmann Apocrypha". Word Ways: The Journal of Recreational Linguistics. 38 (4): 258–260.
- O'Reilly, Edward (10 September 2018). "An Irish-English Dictionary". J. Barlow – via Google Books.
- "Ár dtéarmaí féin".
- "[ETY] Eesti etümoloogiasõnaraamat". eki.ee. Retrieved 2024-02-19.
- Wendell Herbruck (November 2008). Word Histories - A Glossary of Unusual Word Origins. Read Books. ISBN 978-1-4437-3186-7.
Definition and Terminology
Core Definition
A ghost word is a spurious entry in a dictionary or similar reference work that arises from a misprint, misreading, or transcription error, without any prior real-world usage or etymological foundation in the language.[5] These erroneous terms lack inherent meaning and enter authoritative sources solely due to human or mechanical mistakes during compilation or printing.[6] The term was coined by philologist Walter W. Skeat in his 1886 presidential address to the Philological Society.[5]
Key characteristics of ghost words include their deceptive legitimacy conferred by inclusion in respected publications, despite tracing back to an isolated error, and their tendency to endure uncorrected for years or even decades until rigorous verification uncovers the flaw.[7] For illustration, "dord" appeared in the 1934 edition of Webster's New International Dictionary, Second Edition, as a synonym for "density," but it originated from a shorthand note misinterpreted as a new entry.[8]
Historically, ghost words proliferated in printed materials such as books and early dictionaries, where manual typesetting, limited proofreading, and the replication of errors across editions facilitated their unintended dissemination over centuries.[9]
Etymology of the Term
The term "ghost word" was coined by the English philologist Walter William Skeat during his presidential address to the Philological Society on May 21, 1886, titled "Fourteenth Annual Address of the President to the Philological Society."[10] Delivered as part of the society's anniversary meeting, the address focused on errors in etymological scholarship and dictionary compilation, highlighting how inadvertent mistakes could perpetuate nonexistent words in authoritative references.[11]
Skeat introduced the phrase to characterize these spurious entries as spectral entities that "haunt" dictionaries without any basis in actual language use, stating: "As it is convenient to have a short name for words of this character, I shall take leave to call them 'ghost-words.'"[12] He justified the metaphor by emphasizing their illusory nature, arising from misreadings, typographical errors, or unfounded assumptions by scholars, and provided initial illustrations such as abacot (a fabricated term for a head covering stemming from a misprint of "a bycoket") and kime (an erroneous form derived from a printer's blunder in rendering "knives").[13]
Following its introduction in the Transactions of the Philological Society (1885–1887, pp. 350–373), the term gained rapid adoption among lexicographers and linguists as a precise descriptor for dictionary phantoms.[10] It appeared in early 20th-century etymological works and supplements to major dictionaries, such as the Oxford English Dictionary, where it helped standardize discussions of editorial pitfalls.[14] Today, "ghost word" endures as a foundational concept in lexicography and historical linguistics, invoked in analyses of textual transmission and reference work integrity.[11]
Historical Development
Coinage by Walter Skeat
On May 21, 1886, Walter William Skeat, the president of the Philological Society, delivered his annual presidential address to the society, in which he coined the term "ghost word" to refer to spurious entries in dictionaries arising from scribal, printing, or editorial errors rather than genuine linguistic usage.[10] Titled "Fourteenth Address of the President, to the Philological Society, Delivered at the Anniversary Meeting, Friday, 21st May, 1886," the speech emphasized the need for rigorous verification in lexicography to distinguish real words from these phantoms, drawing on his expertise as a philologist and etymologist.[11] The address was subsequently published in the Transactions of the Philological Society (1885–1887, pp. 350–374), where Skeat systematically cataloged numerous such errors to illustrate their propagation across scholarly works.[15]
Skeat illustrated his concept with specific examples from historical sources, including "abacot," which he identified as a ghost word originating from a 16th-century misprint in the second edition of Holinshed's Chronicles (1587), where the legitimate term "bycoket," denoting a type of medieval cap or hat, was corrupted into "abacot" through a compositor's error ("a bycoket" becoming "an abycocket," then simplified).[11] This error was uncritically copied into later dictionaries, such as Bailey's (1721) and Johnson's (1755), perpetuating the fabrication despite no evidence of independent usage.[11] Similarly, Skeat highlighted "kime," a 19th-century printing error in the Edinburgh Review (1808), where the intended word "knives" was misprinted as "kimes" in the passage "some run kimes through their hands" in a review by Sydney Smith, leading to its erroneous inclusion in glossaries and dictionaries.[11] These cases, traced to printed editions and earlier manuscript traditions, demonstrated how minor transcription flaws could haunt lexicographical records for centuries if not scrutinized.[11]
The immediate scholarly reception of Skeat's address was positive, with contemporaries praising its illumination of dictionary-making vulnerabilities and its call for evidence-based etymology.[15] This led to practical reforms, including heightened caution in ongoing projects like the Oxford English Dictionary (OED), where editor James A. H. Murray, Skeat's colleague and friend, incorporated the term "ghost word" into OED usage by 1899 and implemented quotation-based verification to excise or flag similar entries in subsequent fascicles and revisions.[15] Skeat's intervention thus marked a pivotal moment in philological rigor, influencing the correction of ghost words in major dictionaries and underscoring the society's role in advancing accurate linguistic scholarship.[11]
Pre-20th Century Instances
Ghost words trace their origins to the pre-modern era, particularly through errors in manuscript transcription and early printing processes that predated standardized philological scrutiny. These instances often arose from scribal misreadings of abbreviations, ligatures, or unfamiliar scripts in medieval and Renaissance texts, leading to fabricated lexical entries that persisted in scholarly works. Such errors highlight the challenges of transmitting knowledge before the advent of modern textual criticism, where a single misinterpretation could propagate across generations of copies and editions.[16]
A prominent medieval example is the place name "Sarum," which emerged from a misunderstanding of the abbreviated Latin form "Sar." or "Sarisb." for "Sarisburiensis," the genitive of Salisbury in ecclesiastical documents. This contraction, common in medieval manuscripts to save space, was reinterpreted as a standalone term, resulting in "Sarum" being adopted as an alternative name for the site of Old Sarum and later the liturgical Use of Sarum. The error solidified in historical and liturgical texts by the 16th century, despite its lack of basis in original nomenclature.[17]
In the realm of Old English glossaries, the term "drisne" exemplifies scribal confusion in bilingual Latin-Old English word lists from the 11th century or earlier. Recorded in the Antwerp Glossary and subsequent transcripts as a gloss for "capillamenta" (meaning "hair" or "filaments"), "drisne" was likely a misreading of a form related to "perruque" or a corrupted Latin entry for false hair or wig; it entered dictionaries as a supposed Old English word for "wig" or "false hair" but has no independent attestation in authentic texts. Philological analysis later identified it as a ghost word stemming from transcriptional inaccuracies in monastic copying practices.[16]
Print-era compilation introduced further errors, as seen in "phantomnation," which traces to Alexander Pope's 1725 translation of Homer's Odyssey. Pope wrote "The Phantome-nations of the dead," and Richard Paul Jodrell's Philology of the English Language (1820), which omitted hyphens from compounds, entered the phrase as a single word. This fabrication was subsequently listed in 19th-century dictionaries like Webster's (1864) as a rare term meaning "appearance as of a phantom; illusion," perpetuating the error until exposed by lexicographers.[18]
These pre-20th century cases reveal recurring patterns: manuscript scribes often expanded ambiguous abbreviations or ligatures (e.g., tildes or suspensions) into novel forms to fit glosses, while early printers, working under tight deadlines with movable type, compounded ambiguities through spacing or justification errors. Such mistakes proliferated in scholarly editions and glossaries, where authoritative imitation reinforced their validity, underscoring the vulnerability of language preservation in the absence of rigorous verification methods. Walter Skeat's later coinage of "ghost word" drew directly from these historical precedents to critique such lexical phantoms.[16]
Established Examples
Manuscript and Print Errors
Ghost words frequently emerged from errors in manuscript transcription and the mechanical printing processes of the 19th and 20th centuries, when advances in typesetting and proofreading did not eliminate human fallibility. These errors encompassed typographical swaps, such as the reversal or transposition of letters during composition; ink blots or smudges in manuscripts that were misinterpreted as distinct characters by copyists or typesetters; and editorial interventions where obvious mistakes were glossed over or rationalized as archaic variants, thereby embedding them in subsequent publications. Such mechanisms allowed spurious terms to circulate in printed books and reference works, often evading detection across multiple editions until scholarly scrutiny intervened.[18]
A notable instance is abacot, which Webster's dictionary defined as "the cap of state formerly used by English kings, wrought into the figure of two crowns." The word stemmed from a 16th-century typographical error in Edward Hall's The Union of the Two Noble and Illustre Families of Lancastre & Yorke (1548), where "bycoket," a legitimate Middle English term for a cap, was misprinted as "abococket." This fabrication persisted into the 19th century until James Murray rejected it from the New English Dictionary (later the OED), a decision that philologist Walter W. Skeat endorsed in his 1886 address to the Philological Society.[18]
20th-century print errors similarly generated ghost words through production mishaps in authoritative texts. "Dord" is a well-known example of such an error (see below). Another case, "testentry," surfaced in the 1977 A Concise Dictionary of Pronunciation (Oxford University Press), where a placeholder term intended for internal testing during page proofs was accidentally retained in the printed volume, including a fabricated pronunciation guide, before being corrected in reprints. These examples highlight how even rigorous printing workflows in the mechanical era could perpetuate manuscript-like errors into final publications.[8]
Dictionary-Specific Errors
One of the most notorious examples of a dictionary-specific ghost word is "dord," which appeared in the 1934 edition of Webster's New International Dictionary, Second Edition. The term originated from a 1931 handwritten note by Austin M. Patterson, a chemical supervisor at Merck & Co., instructing editors to add "density" as an abbreviation under the entry for "D or d" (with "cont/" denoting "continued"). An assistant misinterpreted the notation as a new entry for a noun defined as "density" in the fields of physics and chemistry, leading to its inclusion without etymology or usage examples. The error persisted for five years until 1939, when an alert editor, Phillip T. McCready, noticed the anomaly during proofreading and had it removed; the proofreader's marginal note declared it "&! A ghost word," marking the first documented use of that phrase in this context.[8][19]
Another historical instance is "abacot," which entered several 17th- and 18th-century English dictionaries, including early editions of Nathan Bailey's Dictionarium Britannicum (1730), as a term for an ancient type of headwear resembling a cap with ear flaps. In reality, it derived from the 1548 typographical error in Edward Hall's chronicle (as detailed above), where the phrase "a bycoket" (referring to a pointed medieval hat) was misprinted as "abococket." This fabrication was perpetuated through unchecked citations in subsequent lexicographic works, such as those by Elisha Coles and Benjamin Martin, until philologists in the 19th century traced its spurious origin.[18][2]
In the realm of non-alphabetic scripts, the Japanese Industrial Standards committee's JIS X 0208 encoding (established in 1978 and foundational to Shift JIS and other systems) incorporated at least 12 "ghost kanji" or yūrei kanji (characters like 彁 (U+5F41), 妛 (U+599B), and 墸 (U+58B8)) that lack any verifiable historical usage, etymology, or meaning in classical or modern Japanese texts. These arose during the standard's compilation from incomplete or erroneous submissions by font designers and scholars, filling reserved slots in the 6,879-character set without rigorous attestation; for instance, 彁 was later identified as a possible corrupted variant of a rare seal-script form, but most remain unexplained phantoms in digital typography. Despite their inclusion, these characters have never entered common dictionaries like Daijirin or Kōjien as valid words, serving instead as cautionary artifacts in encoding history.[20][21]
Such dictionary-specific errors often stem from systemic pitfalls in lexicographic compilation, including overreliance on unverified secondary sources, where editors propagate prior mistakes without consulting primary texts. Editorial oversights, such as failing to cross-check abbreviations or handwritten notes, exacerbate the issue, as seen in the rushed production of massive works like Webster's Second. Additionally, the pressure to complete entries under tight deadlines can lead to unchecked inclusions from contributor glossaries, perpetuating ghosts across editions until systematic revisions expose them. These vulnerabilities highlight the human element in dictionary-making, where even authoritative references like the New Oxford American Dictionary have admitted fabricated entries, such as "esquivalience" (a hoax term for duty-shirking inserted in 2001 and quietly removed by 2012 after its exposure).[22][23]
Modern and Digital Examples
Autocorrect and OCR Errors
In the digital age, autocorrect features on smartphones and computers have occasionally produced erroneous terms that mimic real words, potentially seeding ghost words in informal or user-generated contexts. For instance, the term "cdesign proponentsists" emerged from an incomplete find-and-replace operation in a 1987 draft of the book Of Pandas and People, where "creationists" was only partially overwritten with "design proponents," leaving a nonsensical hybrid that was later recognized as an editing artifact.[24] This digital editing mishap, akin to autocorrect overreach, illustrates how automated text processing can fabricate words that enter public discourse without genuine linguistic basis.[25]
Optical character recognition (OCR) technology, used to digitize historical texts, frequently introduces errors that generate spurious words, particularly in large-scale archives like Google Books. These misreads can create "ghost" attestations: fictitious occurrences that mislead researchers about a word's historical usage or existence. A study simulating OCR errors in information retrieval found that such misspellings significantly reduce search accuracy, with notable degradation in recall at error rates of 5% or higher, and substantial impacts at around 20% character error rates typical in historical documents, as algorithms treat garbled terms like real variants (e.g., "senatoradmits" from "senator admits").[26] In digitized corpora, common OCR artifacts include joined words or character substitutions from poor scan quality, such as faded ink or unusual fonts in old books, leading to invented forms that propagate through uncorrected digital editions.[27]
One notable case involves suspected OCR-derived entries in crowdsourced resources like Wiktionary, where unverified digital scans contribute to ghost word nominations. For example, "texter" was proposed as an Australian slang term for a marker pen but flagged as likely a misspelling of "Texta" (a brand name) due to lack of verifiable usage.[28] Similarly, "benthoses" appeared as a potential plural of "benthos" (the organisms living on or in the sea floor) but was deemed an error due to limited and questionable attestations, possibly from non-native English usage.[29] These instances highlight how OCR flaws in digital archives can amplify errors, creating pseudo-attestations that challenge lexicographers to distinguish genuine usage from artifacts.
Digital propagation exacerbates the issue, as algorithms in search engines and AI-generated content recycle OCR or autocorrect errors without scrutiny. In crowdsourced dictionaries, unverified entries from faulty digitized sources can gain traction; Wiktionary's verification process has removed numerous such candidates, preventing their establishment as accepted terms.[28] Meanwhile, AI tools trained on noisy digital corpora may reproduce these ghosts, further embedding them in generated text, though rigorous post-processing mitigates this in reputable outputs. Unlike historical dictionary errors from manual transcription, these modern variants arise from algorithmic limitations post-2000, underscoring the need for advanced error-correction in digital linguistics.[30]
Online and Social Media Instances
In the digital landscape, ghost words frequently emerge from typographical errors or autocorrect mishaps shared on social media and online forums, where rapid sharing can elevate them to apparent legitimacy before their erroneous nature is identified. Unlike traditional print errors, these instances often gain traction through user-generated content platforms like Urban Dictionary, where community votes can propel a misspelling into slang status. For example, "pwned," a corruption of "owned" meaning to dominate or defeat an opponent, is popularly said to have originated as a typing error by a Warcraft map designer, with "p" typed in place of the adjacent "o"; it quickly spread across gaming communities and entered the broader internet lexicon as leetspeak.[31][32] Similarly, "teh," a misspelling of "the," arose from common keyboard slips in early online chats and forums around the late 1990s, evolving into ironic or emphatic slang used for humorous effect in memes and posts.[33][34]
A prominent social media case is "covfefe," which appeared in a 2017 tweet by then-U.S. President Donald Trump as an apparent incomplete thought or autocorrect fail for "coverage," sparking widespread speculation and memes across Twitter (now X) and other platforms. The term rapidly amassed definitions on Urban Dictionary, where users voted it as a noun for the typo itself or a verb for posting unfinished messages, amassing thousands of upvotes and exemplifying how viral errors can mimic real words.[35][36] On forums like Reddit, similar viral typos, such as inadvertent neologisms in subreddit threads, have led to temporary dictionary-like entries via user submissions, akin to the historical "dord" but accelerated by upvotes and shares, though many are later debunked as nonstandard.[37]
The modern impact of these online ghost words stems from social media algorithms, which prioritize engaging content to boost visibility, enabling erroneous terms to disseminate globally within hours and complicating etymological verification. Linguists note that platforms like TikTok and Twitter amplify slang variants through recommendation systems, fostering "algospeak" where coded or altered words evade moderation while spreading organically, often outpacing fact-checks.[38][39] This velocity poses challenges for linguists and lexicographers, as unverified origins persist in memes and user dictionaries until authoritative sources intervene with corrections, highlighting the tension between community-driven language evolution and rigorous documentation.[40]
Speculative Cases
Debated Word Origins
The word "okay" has been the subject of extensive etymological debate, with the leading theory attributing its origin to a jocular misspelling in 1830s Boston newspapers. In 1839, the Boston Morning Post printed "o.k." as an ironic abbreviation for "oll korrect," a deliberate phonetic rendering of "all correct" amid a fad for abbreviated misspellings among the city's literati.[41] The usage gained traction during the 1840 presidential campaign of Martin Van Buren, nicknamed "Old Kinderhook"; some early proponents took the nickname to be the source of the initials O.K., though the connection is now viewed as coincidental reinforcement rather than the origin.[42] Alternative theories have persisted, including derivations from Choctaw "okeh" (meaning "it is so") or Scottish "och aye," but these lack primary evidence from the period and are largely dismissed as folk etymologies. Linguist Allen Walker Read's seminal series of articles in American Speech (1963–1964) marshaled newspaper archives and contemporary accounts to establish the "oll korrect" explanation, countering earlier speculative claims by tracing the term's rapid spread through print media.
Similarly, the origins of "pumpernickel," denoting a dense Westphalian rye bread, are clouded by folk etymologies that suggest transcription or interpretive errors in early German-English lexicography. A persistent folk etymology from the Napoleonic era holds that the name arose from a French officer's disdainful remark, "bon pour Nicol" or "pain pour Nicole," implying the bread was fit only for his horse Nicol; this narrative postdates the word's first attestations by over a century and matches no verifiable historical record.[43] Scholarly analysis favors a genuine Low German dialectal compound: "pumpern" (to break wind, evoking the bread's reputed digestive effects) combined with "Nickel" (a diminutive of Nikolaus, connoting a goblin or rascal, akin to "Old Nick" for the devil).[44] Seventeenth-century German sources, such as those compiled by folklorist Kurt Ranke, document "Pumpernickel" first as a term of abuse for a clumsy or flatulent person before its application to bread around the Thirty Years' War.[44] Counterarguments highlighting phonetic variants (e.g., "pompernickel") in early dictionaries have been rebutted as dialectal adaptations rather than errors, and the Oxford English Dictionary treats the word as a borrowing from German, attested in English from 1738, without endorsing the French tale.[45] Current consensus, as reflected in OED updates, holds the compound to be authentic, though the folk etymology endures in popular discourse.[45]
Potential Ghost Words in Etymology
Ghost words exert a subtle yet significant influence on etymological research by infiltrating historical records and mimicking authentic linguistic developments, thereby complicating the reconstruction of word origins across language families. In broader patterns observed in medieval and early modern texts, scribal or printing errors can produce forms that appear to support spurious connections in Indo-European etymologies; for instance, the English word "abacot," entered in dictionaries as a royal headdress term, stemmed from a 16th-century misreading of "a bycoket" (a type of cap), which falsely suggested a distinct etymological path for headwear vocabulary unrelated to French "bicocquet." Similarly, in Celtic lexicography, ghost words like Irish cigire ("inspector"), arising from a misinterpretation of cíghim in older manuscripts, have perpetuated incorrect derivations, potentially echoing into wider Indo-European reconstructions by imitating expected phonetic shifts or semantic evolutions. These cases illustrate how isolated errors can propagate through copied sources, distorting the comparative method central to historical linguistics.[11][46]
The methodological implications of undetected ghost words are profound, as they introduce unreliable data points that can skew analyses of sound laws, borrowing patterns, and semantic histories in historical linguistics. For example, a spurious form might be interpreted as evidence for a non-existent Proto-Indo-European root, leading to flawed cognate sets or overestimation of language contact. To counter this, post-2000 advancements in corpus linguistics provide essential tools for detection, including frequency profiling, collocation analysis, and anomaly detection across digitized historical texts, allowing researchers to flag low-attestation words lacking contextual support.
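The frequency-profiling idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the cited studies: the function names (`tokenize`, `attestation_profile`, `flag_low_attestation`), the `min_sources` threshold, and the toy corpus are all hypothetical. The sketch counts how many independent sources attest each word and flags forms that appear in fewer sources than the threshold.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def attestation_profile(corpus):
    """Return (frequency counter, word -> set of source ids) for a corpus.

    `corpus` maps a hypothetical source id to that source's raw text.
    """
    freq = Counter()
    sources = defaultdict(set)
    for source_id, text in corpus.items():
        for word in tokenize(text):
            freq[word] += 1
            sources[word].add(source_id)
    return freq, sources


def flag_low_attestation(corpus, min_sources=2):
    """List words attested in fewer than `min_sources` distinct sources."""
    freq, sources = attestation_profile(corpus)
    return sorted(w for w in freq if len(sources[w]) < min_sources)


# Toy data, invented purely for illustration.
toy_corpus = {
    "dictionary_a": "density is mass per unit volume dord",
    "treatise_b": "the density of water is mass per unit volume",
    "handbook_c": "density is the mass per unit volume of water",
}

candidates = flag_low_attestation(toy_corpus)
print(candidates)
```

On this toy data the only flagged form is "dord," because every other word occurs in at least two sources. In a real historical corpus most flagged forms would be genuine rare words, so a flag is only a prompt to check the primary manuscripts, not evidence of a ghost word in itself.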
The Oxford English Dictionary's 1933 "List of Spurious Words," comprising around 250 entries, has been refined through such digital verification in subsequent revisions, emphasizing the need for rigorous cross-referencing with primary manuscripts to avoid perpetuating errors.[11]
Contemporary research in the 21st century leverages digital humanities to retroactively identify potential ghost words within vast old corpora, enhancing etymological accuracy through computational means. Studies such as the 2019 taxonomic analysis of Celtic ghost words employ digitized lexical databases and algorithmic pattern recognition to trace error origins, distinguishing inert "true ghosts" from reused "poltergeists" and deliberate fabrications. Projects like the Middle English Dictionary's online corpus facilitate similar retroactive scrutiny by enabling searchable access to medieval variants, uncovering misreads that once influenced broader linguistic reconstructions. These efforts underscore a shift toward interdisciplinary tools, integrating machine learning with philological expertise to refine etymological frameworks.[46]
Distinctions from Similar Phenomena
Versus Back-Formation
Back-formation is a linguistic process in which a new word is created by removing an actual or supposed affix from an existing word, often resulting in a term with a clear semantic role, such as a verb derived from a noun.[47] For instance, the verb "edit" was derived from the noun "editor" by dropping the suffix "-or," yielding a productive form that names the action associated with the original term.[47]
Ghost words arise from errors such as misprints or misunderstandings and lack any genuine semantic foundation or prior usage; back-formations, in contrast, are deliberate creations that fill a perceived gap in expression and often gain widespread acceptance.[47] Ghost words, by definition, enter dictionaries accidentally without meaningful etymology or application, whereas back-formations are intentional and semantically motivated, contributing to the evolution of vocabulary through analogy.[6] An example of this productivity is "beforemath," formed on the model of "aftermath" by replacing "after-" with "before-" to denote preceding events, illustrating how such formations can logically extend existing words despite initial novelty.[47]
A direct comparison highlights these boundaries. The ghost word "dord" was defined erroneously as a synonym for density in a 1934 dictionary because a slip reading "D or d," the abbreviations for density, was mistaken for a single word; it had no semantic basis or usage and was later removed.[8] Conversely, "televise," back-formed from "television" in the early 20th century by dropping the suffix "-ion," quickly became a standard verb meaning to broadcast via television, demonstrating intentional productivity and integration into everyday language.[47][48]
Versus Other Linguistic Artifacts
Ghost words are distinguished from neologisms primarily by their accidental origins and lack of intended meaning, whereas neologisms represent deliberate linguistic innovations that evolve into accepted vocabulary. Neologisms often emerge from creative or scientific contexts to fill lexical gaps, such as the term "quark," which James Joyce invented as a playful, nonsensical utterance in his 1939 novel Finnegans Wake ("Three quarks for Muster Mark!") and which physicist Murray Gell-Mann intentionally repurposed in 1964 to denote subatomic particles, thereby entering the standard scientific lexicon.[49] In contrast, ghost words like "dord," a nonexistent word printed in Webster's 1934 edition after a 1931 editorial error turned the abbreviation entry "D or d" for density into a headword, carry no such purposeful semantic load and persist only through oversight until corrected.[2]
Unlike folk etymologies, which arise from communal reinterpretations of real words through plausible but erroneous historical narratives, ghost words originate solely from isolated source errors without broader cultural adaptation. Folk etymology typically reshapes an existing term to align with familiar associations, as seen in the persistent myth about "red herring": the phrase, derived from the reddish hue of smoked herring and used metaphorically in 19th-century writing to denote a distraction, was falsely reimagined as originating from dragging the fish to mislead hunting dogs, a tale popularized despite lacking historical evidence.[50] Ghost words evade this process, remaining inert errors in reference works rather than evolving through popular usage; lexicographers like Walter Skeat, who coined "ghost words" in 1886, emphasized their disconnection from genuine linguistic development.[23]
Eggcorns differ from ghost words in that they are speaker-driven substitutions within familiar expressions, often gaining vernacular traction due to their intuitive logic, while ghost words remain unadopted artifacts of editorial mistakes. An eggcorn occurs when a phrase is reinterpreted phonetically and semantically in a way that seems reasonable, such as "old-timer's disease" for "Alzheimer's disease," which evokes the condition's association with aging and appears in everyday speech without correction.[51] Ghost words, however, do not achieve this level of acceptance; they are typically confined to dictionaries as anomalies, like "abacot," a 16th-century misreading of "a bycoket" that was later carried into Webster's, and are removed upon verification without influencing spoken language.[3] This distinction underscores a proposed taxonomic separation in lexicography, where ghost words are classified as pure fabrication errors apart from adaptive phenomena like eggcorns or folk etymologies.[46]
References
- https://rationalwiki.org/wiki/Cdesign_proponentsists
- https://en.wiktionary.org/wiki/Wiktionary:Requests_for_verification/English
- https://en.wiktionary.org/wiki/benthoses
- https://en.wiktionary.org/wiki/teh
