Metaphone
Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation.[1] It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names which sound similar. As with Soundex, similar-sounding words should share the same keys. Metaphone is available as a built-in operator in a number of systems.
Philips later produced a new version of the algorithm, which he named Double Metaphone. Unlike the original algorithm, whose application is limited to English, this version takes into account spelling peculiarities of a number of other languages. In 2009 Philips released a third version, called Metaphone 3, which was developed according to modern engineering standards against a test harness of prepared correct encodings and achieves an accuracy of approximately 99% for English words, non-English words familiar to Americans, and first names and family names commonly found in the United States.
Procedure
Original Metaphone codes use the 16 consonant symbols 0BFHJKLMNPRSTWXY.[2] The '0' represents "th" (as an ASCII approximation of Θ), 'X' represents "sh" or "ch", and the others represent their usual English pronunciations. The vowels AEIOU are also used, but only at the beginning of the code.[3] This table summarizes most of the rules in the original implementation:
- Drop duplicate adjacent letters, except for C.
- If the word begins with 'KN', 'GN', 'PN', 'AE', 'WR', drop the first letter.
- Drop 'B' if after 'M' at the end of the word.
- 'C' transforms to 'X' if followed by 'IA' or 'H' (unless in latter case, it is part of '-SCH-', in which case it transforms to 'K'). 'C' transforms to 'S' if followed by 'I', 'E', or 'Y'. Otherwise, 'C' transforms to 'K'.
- 'D' transforms to 'J' if followed by 'GE', 'GY', or 'GI'. Otherwise, 'D' transforms to 'T'.
- Drop 'G' if followed by 'H', unless the 'H' is at the end of the word or before a vowel. Drop 'G' if it is followed by a final 'N' or 'NED'.
- 'G' transforms to 'J' if before 'I', 'E', or 'Y', and it is not in 'GG'. Otherwise, 'G' transforms to 'K'.
- Drop 'H' if it follows a vowel and is not followed by a vowel.
- 'CK' transforms to 'K'.
- 'PH' transforms to 'F'.
- 'Q' transforms to 'K'.
- 'S' transforms to 'X' if followed by 'H', 'IO', or 'IA'.
- 'T' transforms to 'X' if followed by 'IA' or 'IO'. 'TH' transforms to '0'. Drop 'T' if followed by 'CH'.
- 'V' transforms to 'F'.
- 'WH' transforms to 'W' if at the beginning. Drop 'W' if not followed by a vowel.
- 'X' transforms to 'S' if at the beginning. Otherwise, 'X' transforms to 'KS'.
- Drop 'Y' if not followed by a vowel.
- 'Z' transforms to 'S'.
- Drop all vowels except at the beginning of the word.
This table does not constitute a complete description of the original Metaphone algorithm, and the algorithm cannot be coded correctly from it. Original Metaphone contained many errors and was superseded by Double Metaphone, and in turn Double Metaphone and original Metaphone were superseded by Metaphone 3, which corrects thousands of miscodings that will be produced by the first two versions.
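As a rough illustration, the Python sketch below encodes words using only the rules in the list above. It is not a correct Metaphone implementation, for exactly the reason given in the previous paragraph. Where the list is silent, it makes two labeled assumptions: an 'H' after 'C', 'G', 'P', 'S', or 'T' is treated as part of a digraph and dropped, and the 'G' of "-DGE-", "-DGI-", or "-DGY-" is consumed by the 'J'. The function name `simple_metaphone` and the `max_len` cut-off of 4 characters are illustrative choices, not part of any published implementation.

```python
VOWELS = frozenset("AEIOU")
SIMPLE = {"Q": "K", "V": "F", "Z": "S"}  # context-free substitutions from the list


def simple_metaphone(word: str, max_len: int = 4) -> str:
    """Rough sketch of original Metaphone using only the simplified rule list.

    Not faithful to Philips' code: the list is incomplete, so some words differ.
    """
    # Normalize: uppercase, keep ASCII letters only.
    w = "".join(ch for ch in word.upper() if "A" <= ch <= "Z")
    if not w:
        return ""
    # Drop duplicate adjacent letters, except C.
    kept = [w[0]]
    for ch in w[1:]:
        if ch != kept[-1] or ch == "C":
            kept.append(ch)
    w = "".join(kept)
    # Initial KN, GN, PN, AE, WR lose their first letter; initial WH becomes W.
    if w[:2] in ("KN", "GN", "PN", "AE", "WR"):
        w = w[1:]
    elif w.startswith("WH"):
        w = "W" + w[2:]
    n = len(w)

    def at(j: int) -> str:
        return w[j] if 0 <= j < n else ""  # "" outside the word

    key, i = "", 0
    while i < n and len(key) < max_len:
        c, prev, nxt, nxt2 = w[i], at(i - 1), at(i + 1), at(i + 2)
        step = 1
        if c in VOWELS:
            code = c if i == 0 else ""  # vowels survive only at the start
        elif c == "B":
            code = "" if prev == "M" and i == n - 1 else "B"
        elif c == "C":
            if nxt == "H":
                code = "K" if prev == "S" else "X"  # -SCH- -> K
            elif nxt == "I" and nxt2 == "A":
                code = "X"
            elif nxt in ("I", "E", "Y"):
                code = "S"
            else:
                code = "K"
        elif c == "D":
            if nxt == "G" and nxt2 in ("E", "Y", "I"):
                code, step = "J", 2  # assumption: the G of -DGE- is consumed
            else:
                code = "T"
        elif c == "G":
            if nxt == "H" and nxt2 != "" and nxt2 not in VOWELS:
                code = ""  # as in NIGHT
            elif w[i + 1:] in ("N", "NED"):
                code = ""  # as in SIGN, SIGNED
            elif nxt in ("I", "E", "Y"):
                code = "J"  # GG cannot occur: duplicates were already dropped
            else:
                code = "K"
        elif c == "H":
            # Assumption: H after C, G, P, S or T is part of a digraph and silent.
            after_vowel = prev in VOWELS and nxt not in VOWELS
            code = "" if after_vowel or prev in ("C", "G", "P", "S", "T") else "H"
        elif c == "K":
            code = "" if prev == "C" else "K"  # CK -> K
        elif c == "P":
            code = "F" if nxt == "H" else "P"
        elif c == "S":
            code = "X" if nxt == "H" or (nxt == "I" and nxt2 in ("O", "A")) else "S"
        elif c == "T":
            if nxt == "I" and nxt2 in ("O", "A"):
                code = "X"
            elif nxt == "H":
                code = "0"
            elif nxt == "C" and nxt2 == "H":
                code = ""  # TCH: the T is dropped
            else:
                code = "T"
        elif c == "W":
            code = "W" if nxt in VOWELS else ""
        elif c == "X":
            code = "S" if i == 0 else "KS"
        elif c == "Y":
            code = "Y" if nxt in VOWELS else ""
        elif c in SIMPLE:
            code = SIMPLE[c]
        else:  # F, J, L, M, N, R encode as themselves
            code = c
        key += code
        i += step
    return key[:max_len]
```

For example, `simple_metaphone("Smith")` and `simple_metaphone("Smyth")` both return `SM0`, and `simple_metaphone("Knight")` returns `NT`.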
To implement Metaphone without purchasing a (source code) copy of Metaphone 3, the reference implementation of Double Metaphone can be used.[4] Alternatively, version 2.1.3 of Metaphone 3, an earlier 2009 version without a number of encoding corrections made in the current version, version 2.5.4, has been made available under the terms of the BSD License via the OpenRefine project.[5]
Double Metaphone
The Double Metaphone phonetic encoding algorithm is the second generation of this algorithm. Its implementation was described in the June 2000 issue of C/C++ Users Journal.[6] It makes a number of fundamental design improvements over the original Metaphone algorithm.
It is called "Double" because it can return both a primary and a secondary code for a string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. For example, encoding the name "Smith" yields a primary code of SM0 and a secondary code of XMT, while the name "Schmidt" yields a primary code of XMT and a secondary code of SMT—both have XMT in common.
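The practical effect of returning two keys can be sketched in a few lines of Python. The keys below are copied from the Smith/Schmidt example rather than computed, and the matching rule (two names are candidates if any key of one equals any key of the other) is an assumption about how such keys are typically compared. `DMKeys`, `shared_keys`, and `may_match` are hypothetical names.

```python
from typing import NamedTuple, Set


class DMKeys(NamedTuple):
    primary: str
    secondary: str


# Keys copied from the Smith/Schmidt example; not computed here.
SMITH = DMKeys(primary="SM0", secondary="XMT")
SCHMIDT = DMKeys(primary="XMT", secondary="SMT")


def shared_keys(a: DMKeys, b: DMKeys) -> Set[str]:
    """Return every key value the two names have in common."""
    return {a.primary, a.secondary} & {b.primary, b.secondary}


def may_match(a: DMKeys, b: DMKeys) -> bool:
    """Assumed matching rule: names are candidates if any of their keys coincide."""
    return bool(shared_keys(a, b))
```

Here `shared_keys(SMITH, SCHMIDT)` is `{"XMT"}`, so the two surnames are treated as a possible match even though neither primary key equals the other.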
Double Metaphone tries to account for myriad irregularities in English of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origins. Thus it uses a much more complex ruleset for coding than its predecessor; for example, it tests for approximately 100 different contexts of the use of the letter C alone.
Metaphone 3
A professional version was released in October 2009, developed by the same author, Lawrence Philips. It is a commercial product sold as source code. Metaphone 3 further improves phonetic encoding of words in the English language, non-English words familiar to Americans, and first names and family names commonly found in the United States. It improves encoding for proper names in particular to a considerable extent.[7] The author claims that in general it improves accuracy for all words from the approximately 89% of Double Metaphone to 98%. Developers can also now set switches in code to cause the algorithm to encode Metaphone keys 1) taking non-initial vowels into account, as well as 2) encoding voiced and unvoiced consonants differently. This allows the result set to be more closely focused if the developer finds that the search results include too many words that don't resemble the search term closely enough.[8] Metaphone 3 is sold as C++, Java, C#, PHP, Perl, and PL/SQL source, Ruby and Python wrappers accessing a Java jar, and also Metaphone 3 for Spanish and German pronunciation available as Java and C# source.[9] The latest revision of the Metaphone 3 algorithm is v2.5.4, released March 2015. The Metaphone3 Java source code for an earlier version, 2.1.3, lacking a large number of encoding corrections made in the current version, version 2.5.4, was included as part of the OpenRefine project and is publicly viewable.[10]
Common misconceptions
There are some misconceptions about the Metaphone algorithms that should be addressed. The following statements are true:
- All of them are designed to address regular, "dictionary" words, not just names, and
- Metaphone algorithms do not produce exact phonetic transcriptions of the input words and names; rather, the output is an intentionally approximate phonetic representation, following these conventions:
- words that start with a vowel sound will have an 'A', representing any vowel, as the first character of the encoding (in Double Metaphone and Metaphone 3 - original Metaphone just preserves the actual vowel),
- vowels after an initial vowel sound will be disregarded and not encoded, and
- voiced/unvoiced consonant pairs will be mapped to the same encoding. (Examples of voiced/unvoiced consonant pairs are D/T, B/P, Z/S, G/K, etc.).
This approximate encoding is necessary to account for the way English speakers vary their pronunciations and misspell or otherwise vary words and names they are trying to spell. Vowels, of course, are notoriously highly variable. British speakers often complain that Americans seem to pronounce 'T' the same as 'D'. Consider also that English speakers commonly pronounce 'Z' where 'S' is spelled, almost always when a noun ending in a voiced consonant or a liquid is pluralized, for example "seasons", "beams", "examples", etc. Not encoding vowels after an initial vowel sound helps to group words in which a vowel and a consonant are transposed in a misspelling or alternative pronunciation.
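To make the three conventions concrete, the sketch below applies them directly to a word's spelling. This is an illustration only, assuming letter-level processing; it is not how any Metaphone version computes its keys. The `keep_vowels` and `keep_voicing` flags are hypothetical, loosely modeled on the Metaphone 3 switches for encoding non-initial vowels and distinguishing voiced from unvoiced consonants.

```python
VOWELS = frozenset("AEIOU")
# Voiced consonant -> unvoiced partner (the pairs listed above).
DEVOICE = str.maketrans({"D": "T", "B": "P", "Z": "S", "G": "K", "V": "F"})


def approximate(word: str, keep_vowels: bool = False, keep_voicing: bool = False) -> str:
    """Apply the three approximation conventions to a word's spelling.

    Illustration only: works on letters, not on Metaphone's phonetic codes.
    keep_vowels / keep_voicing are hypothetical switches that turn conventions off.
    """
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    out = []
    for i, ch in enumerate(letters):
        if ch in VOWELS:
            if i == 0:
                out.append("A")  # any initial vowel is written as 'A'
            elif keep_vowels:
                out.append(ch)  # otherwise later vowels are dropped
            continue
        out.append(ch)
    result = "".join(out)
    return result if keep_voicing else result.translate(DEVOICE)
```

Here "Edward" and the misspelling "Etwart" both reduce to `ATWRT`, and "seasons" and "seazons" both reduce to `SSNS`; turning on `keep_voicing` separates both pairs again.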
Metaphone of other languages
Metaphone is useful for English variants and other languages, and has been preferred to Soundex in several Indo-European languages. On the other hand, rough phonetic encoding introduces language dependency (or, within a language variant, dependency on the average speaker's pronunciation), mainly for non-English variants.
Perhaps the first example of stable adaptation of non-English metaphone was Brazilian Portuguese: it originated in ~2008 as a database solution in Várzea Paulista municipality of Brazil, and it evolved to the current metaphone-ptbr algorithm.
References
1. Philips, Lawrence (December 1990). "Hanging on the Metaphone". Computer Language, Vol. 7, No. 12.
2. "Alternative to Soundex". www.sound-ex.com. Archived from the original on 6 March 2014. Retrieved 16 May 2018.
3. "Morfoedro - Technology". www.morfoedro.it. Retrieved 16 May 2018.
4. Philips, Lawrence (1999) [1998]. "Double Metaphone" (CPP). GNU Aspell. Retrieved 23 February 2024.
5. "OpenRefine". GitHub. 19 May 2022.
6. Philips, Lawrence (June 2000). "The double metaphone search algorithm". C/C++ Users Journal. 18 (6): 38–43.
7. Guy, Ido; Ur, Sigalit; Ronen, Inbal; Weber, Sara; Oral, Tolga (2012). "Best Faces Forward: A Large-scale Study of People Search in the Enterprise" (PDF). Archived from the original (PDF) on 1 December 2023. Retrieved 23 February 2024.
8. Atkinson, Kevin. "Lawrence Philips' Metaphone Algorithm". aspell.net. Retrieved 16 May 2018.
9. "Anthropomorphic Software". www.amorphics.com. Retrieved 16 May 2018.
10. "OpenRefine source for Metaphone3". github.com. Retrieved 2 November 2020.
External links
- The Double Metaphone Search Algorithm, by Lawrence Philips, June 1, 2000, Dr. Dobb's (original article)
Metaphone algorithms for other languages
- Brazilian Portuguese in C: Metaphone for Brazilian Portuguese, in C with PHP and PostgreSQL port.
- Brazilian Portuguese in Java: Metaphone for Brazilian Portuguese, in Java.
- Spanish Metaphone in Python
- Double Metaphone algorithm for Bangla
- Double Metaphone algorithm for Amharic
- Russian Metaphone in Ruby.
- Double Metaphone and Metaphone in JavaScript
Overview and History
Development and Origins
The Metaphone algorithm was invented by Lawrence Philips in 1990 as an enhancement to the Soundex system for phonetic encoding, specifically aimed at improving database indexing by better approximating English pronunciation patterns.[1] Designed to address shortcomings in Soundex, such as its failure to account for vowel sounds and silent letters, which often led to imprecise matches for variant spellings of names, Metaphone introduced rules that grouped consonants into 16 phonetic classes while preserving more nuanced sound representations.[7] Philips first detailed the algorithm in his article "Hanging on the Metaphone," published in the December 1990 issue of Computer Language magazine.[1] This publication marked the algorithm's introduction to the computing community, emphasizing its utility for applications requiring robust phonetic similarity detection, including information retrieval systems.[8] Early adoption of Metaphone occurred in genealogy software, where its improved accuracy for matching similar-sounding surnames proved valuable; for instance, tools like Ancestry Family Tree integrated it alongside Soundex to facilitate searches across variant name forms in historical records.[9]
Purpose and Applications
The primary goal of the Metaphone algorithm is to produce a phonetic key, generally limited to four characters, that approximates the pronunciation of an English word, thereby enabling fuzzy matching between terms that sound similar but differ in spelling because of errors, variations, or transliterations.[1] This approach addresses limitations of exact string matching by grouping phonetically equivalent words under the same key, which supports more robust searches in noisy or inconsistent datasets.[3] For instance, "Smith" and "Smyth" are mapped to the same key (SM0), illustrating how the algorithm accommodates common orthographic inconsistencies without requiring precise character alignment.[3] Metaphone is used in spell-checking features within word processors and text editors, where it helps identify and suggest corrections for misspellings based on phonetic similarity rather than literal matches.[10] In genealogy databases, it enhances name matching by linking variant surnames or first names across historical records, facilitating the construction of family trees from diverse sources such as census data or immigration logs.[3] Search engines use it for approximate string matching, improving retrieval accuracy for queries with phonetic variations, such as voice-to-text input or multilingual transliterations.[11] In customer relationship management (CRM) systems, Metaphone aids data deduplication by flagging entries with identical phonetic keys as likely duplicates, reducing redundancy in contact lists and improving data quality.[12][13] Compared with exact matching, Metaphone better handles regional dialects, non-standard transliterations from other languages into English, and frequent spelling errors, leading to higher recall in applications like information retrieval.[1] Real-world implementations include PHP's built-in metaphone() function, introduced in PHP 4.0 in 2000, which computes keys for database queries and user validation.[3] In Python, the jellyfish package, first released in 2012, includes Metaphone for fuzzy matching in data processing pipelines, supporting tasks from text analysis to record linkage.[14][15]
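The deduplication use case amounts to bucketing records by phonetic key. The sketch below assumes nothing about the encoder: `key_fn` can be any function from a name to a key (for instance `jellyfish.metaphone` from the package mentioned above, if it is installed), and `group_by_key` and `duplicate_candidates` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_key(names: Iterable[str], key_fn: Callable[[str], str]) -> Dict[str, List[str]]:
    """Bucket names by phonetic key; names in one bucket sound alike under key_fn."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[key_fn(name)].append(name)
    return dict(groups)


def duplicate_candidates(names: Iterable[str], key_fn: Callable[[str], str]) -> List[List[str]]:
    """Return only the buckets holding more than one name: candidates for review."""
    return [group for group in group_by_key(names, key_fn).values() if len(group) > 1]
```

Names that land in the same bucket are only candidates. Because phonetic keys are deliberately coarse, a deduplication pipeline would typically confirm a match with an additional check before merging records.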
Original Metaphone Algorithm
Core Procedure
The core procedure of the original Metaphone algorithm transforms an input string into a phonetic key by preprocessing the text and then applying a series of conditional rules that encode English pronunciation patterns, producing a key of up to 4 characters that groups similar-sounding words. Developed by Lawrence Philips, this process prioritizes consonant sounds while suppressing vowels and silent letters based on contextual rules.[1] Preprocessing begins by converting the entire input string to uppercase to standardize letter case, then removing all non-alphabetic characters so that only letters relevant to pronunciation remain. Initial letter combinations that represent silent prefixes in English receive special handling: if the string starts with "PN", "KN", "GN", "AE", or "WR", the first letter is dropped (e.g., "pneumonia" becomes "NEUMONIA" and "knight" becomes "NIGHT"). Additionally, an initial "X" is replaced with "S" (in "xylophone" the initial X is pronounced like Z, which Metaphone also encodes as S), and an initial "WH" is simplified to "W". These steps ensure the string enters the main transformation phase in a normalized form suitable for rule application.[16] The main processing loop then iterates sequentially through the characters of the preprocessed string, applying approximately 28 primary conditional rules that map letter sequences to the 16 phonetic codes representing core consonant sounds (B, F, K, J, L, M, N, P, R, S, T, TH as "0", CH/SH as "X", etc.). Vowels (A, E, I, O, U) are ignored unless one appears at the start of the string after preprocessing, in which case that first vowel is retained to capture the initial vowel sound. Rules are evaluated in a fixed order, checking the current character along with preceding and following characters for context; for example, "B" is encoded as "B" unless it follows "M" at the end of the string (as in "dumb"), in which case it is silent and skipped, while "PH" is always transformed to "F" (as in "phone").
Duplicate adjacent letters (other than C) are skipped to condense the output, so a doubled consonant such as the "SS" in "session" contributes only one code. The loop continues until the output key reaches 4 characters or the end of the string is reached.[16] The output is a phonetic key of at most 4 characters, which serves as an index for matching words with similar pronunciations; if the resulting code is shorter than 4 characters, it is used as is, though implementations may pad it for consistency. This fixed maximum length balances detail and efficiency for applications like database indexing. The core procedure laid the foundation for extensions such as Double Metaphone, which refines handling of ambiguous cases.[1]
Here is a simplified pseudocode representation of the core procedure:
function OriginalMetaphone(input):
    // Preprocessing
    input = toUpperCase(input)
    input = removeNonAlphabetic(input)
    if input starts with "PN", "KN", "GN", "AE", or "WR":
        input = input.substring(1)
    else if input starts with "X":
        input = "S" + input.substring(1)
    else if input starts with "WH":
        input = "W" + input.substring(2)
    key = ""
    i = 0
    n = input.length
    while key.length < 4 and i < n:
        current = input.charAt(i)
        prevChar = if i > 0 then input.charAt(i-1) else ""
        nextChar = if i+1 < n then input.charAt(i+1) else ""
        afterNext = if i+2 < n then input.charAt(i+2) else ""
        // Skip duplicate adjacent letters, except C
        if current == prevChar and current != "C":
            i += 1
            continue
        phoneme = ""
        if isVowel(current):
            // Keep a vowel only at the start of the word
            if i == 0: phoneme = current
        else if current == "B":
            // Silent in a final "MB" (as in "dumb")
            if not (prevChar == "M" and i == n-1): phoneme = "B"
        else if current == "C":
            if nextChar == "H": phoneme = (prevChar == "S") ? "K" : "X"
            else if nextChar == "I" and afterNext == "A": phoneme = "X"
            else if nextChar in ["E", "I", "Y"]: phoneme = "S"
            else: phoneme = "K"
        else if current == "D":
            if nextChar == "G" and afterNext in ["E", "I", "Y"]:
                phoneme = "J"
                i += 1 // the G is absorbed into the J sound
            else: phoneme = "T"
        else if current == "P" and nextChar == "H":
            phoneme = "F"
            i += 1 // skip the H
        // ... remaining rules (F, G, H, J, K, L, M, N, plain P, Q, R, S, T, V, W, X, Y, Z)
        key += phoneme
        i += 1
    return key.substring(0, 4)
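The preprocessing stage can be tested in isolation. The Python function below is a minimal sketch of the steps described above (uppercase, strip non-letters, drop the first letter of "PN", "KN", "GN", "AE", or "WR", initial "X" to "S", initial "WH" to "W"); the name `preprocess` is hypothetical.

```python
import re

# Initial pairs whose first letter is silent.
SILENT_FIRST = ("PN", "KN", "GN", "AE", "WR")


def preprocess(word: str) -> str:
    """Normalize a word before the main Metaphone loop (sketch)."""
    s = re.sub(r"[^A-Z]", "", word.upper())  # uppercase, letters only
    if s.startswith(SILENT_FIRST):
        s = s[1:]  # "KNIGHT" -> "NIGHT"
    elif s.startswith("X"):
        s = "S" + s[1:]  # "XYLOPHONE" -> "SYLOPHONE"
    elif s.startswith("WH"):
        s = "W" + s[2:]  # "WHITE" -> "WITE"
    return s
```

Because only one prefix rule can apply to a given word, the checks are written as a single if/elif chain.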
Key Rules and Transformations
The original Metaphone algorithm defines a set of phonetic transformation rules that convert English words into keys built from 16 consonant symbols: B, F, H, J, K, L, M, N, P, R, S, T, W, X (for SH and CH), Y, and 0 (for TH). These rules, numbering approximately 28 in total, are applied sequentially after the input is converted to uppercase and stripped of non-alphabetic characters; they focus on consonants while handling vowels and special digraphs based on position and neighboring letters.[17][18] Vowels (A, E, I, O, U, and sometimes Y) are ignored throughout the word to emphasize phonetic similarity through consonants, except when the word begins with a vowel, in which case that first vowel is retained to preserve the initial sound cue. As a result, "apple" and the misspelling "aple" both yield the key APL.[17] The rules are categorized primarily by the target letter or digraph, with conditions determined by position (initial, medial, final), preceding or following characters, and exceptions for silent letters or alternate pronunciations. Below is a categorized overview of the key transformations:
- B: Retained as 'B' unless at the end of the word following 'M' (as in "dumb"), where it is silent and omitted.[18]
- C:
- Maps to 'S' if followed by 'E', 'I', or 'Y' (soft C, as in "city").
- Maps to 'X' if followed by 'H' (as in "church") or 'IA' (as in "special"); in "SCH" it maps to 'K' instead.
- Silent in "SCI", "SCE", or "SCY" (as in "science").
- Maps to 'K' otherwise (hard C, as in "cat").[17][18]
- D: Maps to 'T' generally, but to 'J' if followed by 'GE', 'GI', or 'GY' (as in "judge").[18]
- F: Retained as 'F', with 'PH' also mapping to 'F' (as in "phone").[17]
- G:
- Silent if followed by an 'H' that is neither at the end of the word nor before a vowel (as in "night"), or if followed by a final "N" or "NED" (as in "sign", "signed").
- Maps to 'J' if followed by 'E', 'I', or 'Y' and not immediately after 'G' (as in "magic").
- Retained as 'K' otherwise, unless in initial "GN" (silent G).[18]
- H: Retained unless it follows a vowel and is not followed by one (as in "ah"); after 'C', 'S', 'P', 'T', or 'G' it is absorbed into the digraph (as in "ghost"). Initial "WH" simplifies to 'W'.[17][18]
- J: Retained as 'J'.[17]
- K: Retained as 'K', but silent if immediately after 'C' (as in "acknowledge").[18]
- L: Retained as 'L', with doubled 'L' reduced by dropping the second instance.[17]
- M: Retained as 'M', with doubled 'M' reduced.[18]
- N: Retained as 'N', with doubled 'N' reduced; in initial "GN" (as in "gnaw") it is the 'G', not the 'N', that is silent.[17]
- P: Retained as 'P', but maps to 'F' if followed by 'H' (as in "philosophy"); silent in initial "PN" (as in "pneumonia").[18]
- Q: Always maps to 'K' (as in "queen").[17]
- R: Retained as 'R', with doubled 'R' reduced.[18]
- S: Retained as 'S', but maps to 'X' if followed by 'H', 'IA', or 'IO' (as in "session").[17]
- T:
- Retained as 'T' generally, but maps to '0' (TH sound) if followed by 'H' (as in "thin").
- Maps to 'X' if followed by 'IA' or 'IO' (as in "nation").
- Silent in "TCH" (as in "watch").[18]
- V: Maps to 'F' (as in "victory").[17]
- W: Retained as 'W' only if followed by a vowel (as in "water"); otherwise dropped. Initial "WR" simplifies to 'R'.[18]
- X: Maps to 'KS' (as in "exit"), but an initial 'X' maps to 'S' (as in "xylophone").[17]
- Y: Retained as 'Y' only if followed by a vowel (as in "yet"); otherwise treated as a vowel and dropped.[18]
- Z: Maps to 'S' (as in "zoo").[17]
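The C entries show how a single letter's code depends on its neighbors. The sketch below encodes one 'C' at a given index following those entries. The function name `encode_c` is hypothetical, and it assumes the "CIA" rule takes precedence over the silent "SCI" case, an ordering the list itself does not specify.

```python
def encode_c(word: str, i: int) -> str:
    """Original-Metaphone code for the 'C' at word[i], per the C rules (sketch)."""
    w = word.upper()
    if w[i] != "C":
        raise ValueError(f"expected 'C' at index {i} of {word!r}")
    prev = w[i - 1] if i > 0 else ""
    nxt = w[i + 1 : i + 2]  # "" past the end of the word
    nxt2 = w[i + 2 : i + 3]
    if nxt == "H":
        return "K" if prev == "S" else "X"  # CH -> X, but SCH -> K
    if nxt == "I" and nxt2 == "A":
        return "X"  # CIA, as in "special" (assumed to win over SCI)
    if nxt in ("E", "I", "Y"):
        return "" if prev == "S" else "S"  # soft C; silent in SCE/SCI/SCY
    return "K"  # hard C
```

For example, `encode_c("city", 0)` returns `"S"`, `encode_c("school", 1)` returns `"K"`, and `encode_c("science", 1)` returns an empty string.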
Double Metaphone
Key Improvements
Double Metaphone was developed by Lawrence Philips and published in the June 2000 issue of C/C++ Users Journal, specifically to overcome limitations in the original Metaphone algorithm, particularly its inadequate handling of names from diverse ethnic backgrounds such as Slavic, French, and Greek origins.[19] The algorithm introduces a dual-key output system, producing both a primary key that closely aligns with the original Metaphone encoding and a secondary key that captures alternate phonetic interpretations, thereby accommodating ambiguities in pronunciation.[19] For instance, "Smith" yields a primary key of SM0 and a secondary key of XMT, while "Schmidt" yields XMT and SMT, so the two surnames match through their shared key XMT. The enhancements include an expanded set of transformation rules for ethnic name variations written in the English alphabet, and refined handling of digraphs such as "CH" across linguistic contexts (e.g., 'X' in English words such as "church" and 'K' in words of German or Greek origin).[19] These changes result in substantially higher accuracy than the original algorithm for common names across diverse ethnic datasets.[19] Additionally, the primary key is designed to stay close to the original Metaphone codes to ease adoption by legacy applications,[19] although the two are not identical; for example, Double Metaphone encodes any initial vowel as 'A', whereas original Metaphone preserves the actual vowel.
Extended Procedure and Rules
The Double Metaphone algorithm introduces an extended procedure that processes the input string to generate two phonetic keys, a primary and a secondary, simultaneously, allowing better handling of pronunciation ambiguities common in English and ethnic names. Preprocessing begins by converting the string to uppercase and removing non-alphabetic characters, including apostrophes and hyphens, to normalize variations like "O'Connor" or "Jean-Paul". Initial letter combinations also receive special treatment: for words starting with "WR", the "W" is skipped so that encoding starts at the "R", and similar silent starts like "KN", "GN", or "PN" skip the initial consonant. This step ensures consistent entry into the main encoding loop, which iterates through the cleaned string using an index that advances by a variable amount depending on which rule applies.[1] The core procedure builds both keys in parallel: each rule appends the same code to both keys, or different codes to the primary and the secondary when an alternative pronunciation is detected, so the keys diverge only where necessary. The loop examines the current character and up to four surrounding characters to apply context-sensitive transformations, branching on ambiguities such as "CH": in a Germanic or Greek context (e.g., "chemistry" or "orchestra") both keys receive 'K', while elsewhere it typically appends 'X' to the primary and 'K' to the secondary. Other branching cases include "CK" as 'K' in both keys, and "SCH", which may be encoded as 'SK', as 'X', or as one in the primary and the other in the secondary, depending on the following letters. Unlike the original Metaphone's single-key output, the two keys are built in tandem, each up to four characters long, with processing stopping early once both are complete; when no alternatives arise, the secondary key simply repeats the primary.
The process incorporates approximately 40 rules covering consonants and vowel handling, with representative examples including: "B" encoded as 'P', with the "B" in words like "dumb" silent in both keys; "D" followed by "G" and then "I", "E", or "Y" (as in "edge") encoded as 'J'; "G" before "I", "E", or "Y" encoded as 'J', with exceptions such as some words of Germanic origin that keep a hard 'K'; "PH" as 'F' in both keys; "S" before "H" as 'X'; and "T" in "TION" as 'X'. Ethnic-specific rules address variations, such as Irish "Mac" or "Mc" prefixes (e.g., "MacGregor" treated to avoid over-silencing the "G" and match "McGregor" variants).[1] Pseudocode for the extended procedure reflects these differences through a structured loop with conditional appends:
function doubleMetaphone(input):
    input = uppercase(removeNonLetters(input)) // Preprocessing
    // Skip the silent first letter of combinations such as "GN", "KN", "PN", "WR"
    index = (input starts with a silent combination) ? 1 : 0
    primary = "" ; secondary = ""
    while (length(primary) < 4 or length(secondary) < 4) and index < length(input):
        current = input[index]
        if current is a vowel:
            if index == 0: append "A" to both // any initial vowel becomes 'A'
            index += 1 // later vowels are skipped
        else if current == "B":
            append "P" to both
            index += (index+1 < length(input) and input[index+1] == "B") ? 2 : 1
        else if current == "C": // ~100 sub-conditions, e.g.,
            if "CH" in a Germanic or Greek context: append "K" to both
            else if "CH": append "X" to primary, "K" to secondary
            // ... other C rules
            index += 1 or 2
        // Similar branches for D, G (e.g., "DG" + "E"/"I"/"Y": "J" in both), etc.
        // For "UMB" at the end (as in "dumb"): append "M" to both; index += 2
        // Ethnic: if "MAC" + "G": special G handling
    truncate both keys to 4 characters
    return (primary, secondary)
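The parallel key construction can be isolated from the letter rules. In the Python sketch below, `DualKey.add` appends one code to the primary key and an optional alternative to the secondary, and `encode_ch` shows a simplified form of the "CH" branch. All names are hypothetical, the `germanic_or_greek` flag stands in for Double Metaphone's much more detailed context tests, and the reference implementation differs in detail.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DualKey:
    """Builds a primary and a secondary key in tandem (sketch, not reference code)."""

    max_len: int = 4
    primary: str = ""
    secondary: str = ""

    def add(self, main: str, alternate: Optional[str] = None) -> None:
        """Append main to the primary key and alternate (default: main) to the secondary."""
        if alternate is None:
            alternate = main
        if len(self.primary) < self.max_len:
            self.primary += main
        if len(self.secondary) < self.max_len:
            self.secondary += alternate

    def done(self) -> bool:
        """True once both keys have reached max_len."""
        return len(self.primary) >= self.max_len and len(self.secondary) >= self.max_len

    def result(self) -> Tuple[str, str]:
        """Both keys, truncated to max_len (a multi-letter code may overshoot)."""
        return self.primary[: self.max_len], self.secondary[: self.max_len]


def encode_ch(key: DualKey, germanic_or_greek: bool) -> None:
    """Simplified CH branch: 'K' in both keys in a Germanic/Greek context,
    otherwise 'X' in the primary and 'K' in the secondary."""
    if germanic_or_greek:
        key.add("K")
    else:
        key.add("X", "K")
```

Keeping the two keys in one object means every rule only has to decide whether it has an alternative; rules without one call `add` with a single code and the keys stay identical.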
