Center embedding
from Wikipedia

In linguistics, center embedding is the process of embedding a phrase in the middle of another phrase of the same type. This often causes parsing difficulty that is hard to explain on grammatical grounds alone. The most frequently used example involves embedding a relative clause inside another one, as in:

A man that a woman loves
A man that a woman that a child knows loves
A man that a woman that a child that a bird saw knows loves
A man that a woman that a child that a bird that I heard saw knows loves

In theories of natural language parsing, the difficulty with multiple center embedding is thought to arise from limitations of human short-term memory. To process multiple center embeddings, we have to store many subjects in order to connect them to their predicates.
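
The memory account above can be illustrated with a toy stack model (a sketch for illustration, not a claim about the actual human parser): each subject noun is pushed onto a stack until its verb arrives, so the number of items that must be held simultaneously grows with embedding depth.

```python
def max_stack_depth(words):
    """Toy parser for the example phrases above: push each subject noun,
    pop one when a verb arrives to link it to its predicate.
    Returns the peak number of unresolved subjects held at once.
    The lexicon below is hard-coded for these examples and is an
    illustrative assumption."""
    nouns = {"man", "woman", "child", "bird", "I"}
    verbs = {"loves", "knows", "saw", "heard"}
    stack, peak = [], 0
    for w in words:
        if w in nouns:
            stack.append(w)
            peak = max(peak, len(stack))
        elif w in verbs:
            stack.pop()  # the verb resolves the most recently stored subject
    return peak

phrase = "A man that a woman that a child that a bird saw knows loves"
print(max_stack_depth(phrase.split()))  # 4 subjects pending at the deepest point
```

At depth 4, the model is already holding more unresolved subjects than readers comfortably can, matching the intuition that the last examples above are nearly impossible to parse.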

An interesting theoretical point is that sentences with multiple center embedding are grammatical, but unacceptable. Such examples are behind Noam Chomsky's comment that, "Languages are not 'designed for parsability' … we may say that languages, as such, are not usable."[citation needed]

Some researchers (such as Peter Reich) proposed theories on which single center embedding is acceptable (as in "the man that boy kicked is a friend of mine") but double center embedding is not. The linguist Anne De Roeck and colleagues provided a counter-example: "Isn't it true that example-sentences that people that you know produce are more likely to be accepted?" (De Roeck et al., 1982).

The linguist Fred Karlsson provided empirical evidence in 2007 that the maximal degree of multiple center-embedding of clauses is exactly 3 in written language. He provided thirteen genuine examples of this type from various Indo-European languages (Danish, English, German, Latin, Swedish). No real examples of degree 4 have been recorded. In spoken language, multiple center-embeddings even of degree 2 are so rare as to be practically non-existent.[1]

Center embedding is the focus of a science fiction novel, Ian Watson's The Embedding, and plays a part in Ted Chiang's Story of Your Life.

Background


Embedding on its own refers to clauses occurring as subordinate parts of a superordinate clause. There are three types of subclauses: complement, relative, and adverbial. Subordinators or relative pronouns indicate which type of subclause is being used. A center embedding occurs when words in a superordinate clause occur on both the left and the right of a subclause. Iterated center embedding of the same type of clause is called self-embedding.

Examples


English


The following occurred in a 1917 science fiction story with nothing in the context referring to linguistic constructs:[2]

  • The community of which the green Martians with whom my lot was cast formed a part was composed of some thirty thousand souls.

Japanese


Japanese allows a singly nested clause, but an additional nesting makes the sentence unprocessable.[3] The examples below are from [4], section 13.4.

兄が 妹を いじめた
older.brother-NOM younger.sister-ACC bullied
"My older brother bullied my younger sister."

ベビーシッターは 兄が 妹を いじめた と 言った
babysitter-TOP older.brother-NOM younger.sister-ACC bullied that said
"The babysitter said that my older brother bullied my younger sister."

The following doubly embedded sentence is unprocessable:

おばさんは ベビーシッターが 兄が 妹を いじめた と 言った と 思っている
aunt-TOP babysitter-NOM older.brother-NOM younger.sister-ACC bullied that said that thinks
"My aunt thinks that the babysitter said that my older brother bullied my younger sister."

Effective and ineffective embedding


Embedding can expand a sentence when two clauses share a common category. It is not effective when optional categories are used to create extensive embedding within a single sentence.

Example of effective embedding

  • My brother opened the window. The maid had closed it.

The common category is the window, so the two sentences can be combined:

  • My brother opened the window the maid had closed.

Example of ineffective embedding

  • My brother opened the window the maid the janitor Uncle Bill had hired had married had closed.

There is no common category in this sentence, so it should be broken up into multiple sentences to make sense to the reader:

  • My brother opened the window the maid had closed. She was the one who had married the janitor Uncle Bill had hired.

A center embedded sentence is difficult to comprehend when a relative clause is embedded in another relative clause. Comprehension becomes easier when the types of clause are different – when a complement clause is embedded in a relative clause or when a relative clause is embedded in a complement clause. For example:

  • The man who heard that the dog had been killed on the radio ran away.

One can tell whether a sentence is center-embedded or edge-embedded by where the brackets fall within it.

  1. [Joe believes [Mary thinks [John is handsome.]]]
  2. The cat [that the dog [that the man hit] chased] meowed.

In sentence (1), all of the brackets close at the right edge, so the sentence is right-embedded. In sentence (2), the bracketed clauses sit in the middle of the sentence, interrupting their host clauses, so the sentence is center-embedded.
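
This bracket-based distinction can be checked mechanically. The sketch below (an illustration, not a standard algorithm) classifies a bracketed sentence as center-embedded when host material resumes after an embedded clause closes, and right-embedded when every embedded clause runs to the end.

```python
def embedding_type(s):
    """Classify a bracketed sentence as 'right' or 'center' embedded.
    Heuristic: if anything other than closing brackets follows some
    closing bracket, an embedded clause interrupted its host."""
    for i, ch in enumerate(s):
        if ch == ']':
            rest = s[i + 1:].replace(']', '').strip()
            if rest:  # host material resumes after the embedded clause
                return 'center'
    return 'right'

print(embedding_type("[Joe believes [Mary thinks [John is handsome.]]]"))          # right
print(embedding_type("The cat [that the dog [that the man hit] chased] meowed."))  # center
```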

from Grokipedia
Center embedding is a syntactic construction characterized by the insertion of a subordinate clause or phrase within the structure of a superordinate one, resulting in nested dependencies that interrupt the main clause's linear order. This construction exemplifies the recursive capacity of human syntax, allowing for the generation of increasingly complex sentences by nesting elements inside one another, as seen in examples like "The cat the dog chased ran away," where the relative clause "the dog chased" is centered within the main clause. Unlike simpler appositions or tail embeddings, center embedding places the interrupting material between the subject and predicate of the host clause, which distinguishes it as a core feature of clause-internal recursion in human language faculties. In terms of syntactic theory, center embedding has been pivotal in discussions of recursion since the mid-20th century, highlighting how formal rules can produce unbounded hierarchies of embedding while human processing imposes practical limits. Single instances are typically comprehensible, but multiple or double center embeddings—such as "The rat the cat the dog scared ate died"—escalate cognitive demands, often leading to parsing errors or judgments of unacceptability due to working-memory constraints that hinder tracking of dependencies across interruptions. These difficulties arise from the need to maintain multiple unresolved syntactic predictions simultaneously, a challenge quantified in psycholinguistic experiments showing increased reading times and error rates with embedding depth. Cross-linguistically, the prevalence and processability of center embedding vary based on word order and morphological features; for instance, subject-object-verb (SOV) languages like Japanese and Korean accommodate it more readily through prenominal clauses and case marking, whereas subject-verb-object (SVO) languages like English exhibit greater hurdles, influenced by factors such as prosody, specificity, and exposure frequency.
Evolutionary perspectives suggest that center embedding may stem from pre-linguistic cognitive abilities for hierarchical sequencing, as evidenced by limited recursion in animal communication systems, but its full exploitation in syntax reflects a uniquely human adaptation shaped by communicative pressures. Recent studies also explore grammaticality illusions in processing, where omitted elements in double embeddings can paradoxically enhance perceived grammaticality in some languages, underscoring the interplay between competence and performance in language comprehension.

Linguistic Foundations

Definition

In linguistics, center embedding refers to the syntactic process of inserting a phrase or clause of the same type—such as a relative clause—into the middle of another phrase or clause of the same type, thereby creating nested dependencies that interrupt the host structure's linear order. This contrasts with linear embedding, where subordinate clauses or phrases are attached peripherally, either at the beginning (left-branching) or end (right-branching) of the host, preserving a more sequential dependency flow without central interruption. A simple single-level center embedding can be represented structurally as follows, using a basic phrase structure for a relative clause modifying a noun phrase within a sentence (S):

S
├── NP
│   ├── Det
│   ├── N
│   └── RC
│       ├── wh
│       └── VP
│           ├── V
│           └── NP
│               ├── Det
│               └── N
└── VP

Here, the relative clause (RC) is embedded centrally within the noun phrase (NP), with the head noun (N) of the main NP intervening between the determiner (Det) and the RC, and the RC itself containing its own internal structure (e.g., wh-word, V, and embedded NP). This configuration exemplifies how center embedding generates nested dependencies that interrupt the linear order, distinguishing it from purely sequential, non-interrupting arrangements. The concept of center embedding emerged in linguistic literature during the early 1960s, building on earlier analyses of syntactic depth limits. Victor H. Yngve first explored related ideas of self-embedding and structural depth in language models in 1960, proposing hypotheses on how regression in branching affects sentence production and processing. Noam Chomsky and George A. Miller formalized and expanded the discussion of center embedding in their 1963 chapter, examining its implications for formal grammar and human performance limitations in handling nested structures. Center embedding exemplifies recursion, where rules apply iteratively to generate complexity, though its central placement often amplifies cognitive demands compared to peripheral forms.

Relation to Recursion and Other Embeddings

Recursion in syntax refers to the property by which a linguistic unit, such as a clause or phrase, can be embedded within a larger unit of the same type, allowing the generation of hierarchically complex structures from a finite set of rules. This mechanism enables languages to produce sentences of theoretically unbounded length and depth, a foundational concept in generative grammar. Center embedding exemplifies this through the central nesting of a constituent within its host, creating layered dependencies that test the expressive power of syntactic rules. Center embedding is predominantly a form of self-embedding, where the embedded element shares the same syntactic category as the embedding structure, such as a relative clause within another relative clause. This contrasts with cross-embedding, often manifested as cross-serial dependencies, where relations between elements of potentially different types cross rather than nest, as seen in certain verb cluster constructions. Self-embedding via center embedding supports hierarchical phrase structure, while cross-embedding challenges strict nesting and appears in languages exhibiting milder context-sensitivity. Within Chomsky's hierarchy of formal grammars, center embedding highlights the limitations of regular grammars (Type-3), which cannot generate unbounded nested dependencies without linear approximations, necessitating context-free grammars (Type-2) that incorporate recursive rewrite rules. These structures thus probe the formal adequacy of generative models, confirming that natural languages exceed finite-state mechanisms and require rules permitting self-similar embedding. The linkage between center embedding and recursion emerged in the 1950s and 1960s amid the rise of generative grammar, where early observations tied it to recursive operations in phrase structure. Chomsky and Miller (1963) analyzed how such embeddings arise from formal rules, establishing their theoretical viability while distinguishing competence from performance constraints on multiple layers of embedding.
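
The nested pattern that separates context-free from regular grammars can be made concrete with a two-rule grammar, S → Noun S Verb | Noun Verb. The sketch below (an illustrative toy, using the classic rat/cat/dog lexicon) shows how each recursion step wraps an outer noun and its verb around the previous sentence, yielding the matched N…V nesting that no finite-state grammar can generate for unbounded depth.

```python
def center_embed(nouns, verbs):
    """Apply S -> Noun S Verb recursively, bottoming out at Noun Verb.
    nouns[0]/verbs[0] form the outermost subject-predicate pair."""
    if len(nouns) == 1:
        return f"{nouns[0]} {verbs[0]}"
    return f"{nouns[0]} {center_embed(nouns[1:], verbs[1:])} {verbs[0]}"

print(center_embed(["the rat", "the cat", "the dog"],
                   ["died", "killed", "chased"]))
# the rat the cat the dog chased killed died
```

Each noun must wait for the matching verb on the far side of the embedded material, which is exactly the nested dependency a regular grammar cannot track.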

Examples Across Languages

English

Center embedding in English typically involves nesting relative clauses within one another, creating structures where modifiers interrupt the head noun and its main predicate. This construction is grammatical but increases in complexity with each level of nesting. A single-level example is "The rat the cat chased died," in which the relative clause "the cat chased" (omitting the optional "that") modifies the head noun "rat," delaying the main verb "died" until after the embedded clause. At the double level, the nesting deepens, as seen in "The rat the cat the dog chased killed died," where the innermost relative clause "the dog chased" modifies "cat," which is itself part of the outer relative clause modifying "rat," further postponing the main verb. A variant from the same tradition is "The rat the cat the dog chased killed ate the malt," illustrating how the chain of verbs resolves the dependencies from the center outward. For triple-level embedding, an example is "The rat the cat the dog the horse scared chased killed died," extending the pattern with an additional inner clause "the horse scared" modifying "dog." However, such constructions often result in ambiguity or complete breakdown, as the multiple unresolved dependencies overwhelm working memory, rendering the sentence incomprehensible despite its grammaticality. Syntactically, these structures rely on relative clauses as postnominal modifiers in noun phrases, creating head-modifier relations that build recursively inward. In phrase structure terms, the simple example parses as a sentence (S) with a subject noun phrase (NP: "the rat" modified by relative clause RC: S[NP "the cat" VP "chased" (with gap linked to "rat")]) followed by the main VP "died." The double level adds another embedded RC within the first RC's subject NP, forming NP[head "rat" RC[S[NP[head "cat" RC[S[NP "the dog" VP "chased" (gap to cat)]] VP "killed" (gap to rat)]]] VP "died." This successive embedding of subject RCs delays resolution of the head until the outermost clause, straining linear processing.
A common pitfall of center embedding in English is the emergence of garden-path effects, where the parser commits to an incorrect syntactic attachment early on, necessitating costly reanalysis. For instance, in "The rat the cat the dog chased...," the initial words may prompt a misparse treating "rat the cat" as a direct object or conjoined subject, only for "chased" to force reattachment as a verb; deeper embeddings amplify this ambiguity by stacking unresolved gaps.

Japanese and Other Languages

In Japanese, a head-final subject-object-verb (SOV) language, relative clauses precede the noun they modify, resulting in left-branching structures that accommodate multiple embeddings more readily than the center-embedded constructions common in subject-verb-object (SVO) languages like English. A representative example is Inu-ga oikaketa neko-ga shinda ("The cat that the dog chased died"), where the relative clause Inu-ga oikaketa ("the dog chased") nests to the left of the head noun neko ("cat"). This prenominal positioning reduces memory load during processing, as dependencies resolve incrementally from left to right, allowing speakers to handle multiple levels of embedding—typically 3-4—without the rapid comprehensibility drop seen in English center embeddings beyond two levels. In contrast to English's stricter limits on depth due to intervening material in center-embedded relative clauses, Japanese's left-branching structure supports deeper nesting in natural discourse, though repeated nominative markers (-ga) in multiply embedded sentences can still introduce processing challenges by impairing discriminability. German, with its verb-final order in subordinate clauses, also mitigates some difficulties of center embedding compared to English, as the delayed verb position enables sustained predictions across embedded material. For instance, in a doubly embedded relative clause like Der Mann, der die Frau, die den Hund fütterte, sah, grüßte ("The man who saw the woman who fed the dog greeted"), the final verbs cluster at the end, aiding dependency resolution through familiarity with such patterns. The extinct Coahuilteco language, an SOV language with postnominal relative clauses, exemplifies center embedding in a non-Indo-European context, with historical texts attesting two levels of nesting.
Deeper embedding appears to be avoided through extraposition of relative clauses, aligning with typological observations that languages limit center-embedding depth to manage processing demands. Typologically, the feasibility of center embedding varies with word order: SVO languages with postnominal relatives (like English, and German subordinate clauses) give rise to center embedding, increasing cognitive strain at deeper levels, whereas SOV languages with prenominal relatives (like Japanese) favor left-branching, enabling easier multi-level nesting.

Processing and Comprehension

Cognitive Challenges

Center embedding imposes significant cognitive demands primarily through its interruption of syntactic constituents, requiring language processors to maintain incomplete phrases in working memory until their resolution. This process exceeds the typical capacity limits of short-term memory, often described by George A. Miller's seminal finding that humans can hold approximately seven plus or minus two chunks of information at once. In syntactic contexts, center embedding strains this limit by necessitating the storage of multiple unresolved dependencies, such as subject noun phrases interrupted by embedded clauses, leading to heightened memory load as each embedding level adds to the backlog of pending integrations. For instance, in an English sentence like "The man the woman saw left," the initial noun phrase "the man" must be held active while processing the embedded clause, illustrating how even single embeddings strain short-term retention. Parsing models further elucidate these challenges through activation-based accounts, where comprehension relies on retrieving and resolving dependencies amid interference from similar intervening elements. In such frameworks, central gaps in center-embedded structures cause similarity-based interference, as the parser activates competing candidates during dependency resolution, slowing retrieval and increasing error rates. The Dependency Locality Theory complements this by quantifying difficulty via integration costs—effort to link heads and dependents—and storage costs for maintaining unfinished phrases, both of which escalate with deeper embeddings due to longer dependency spans.
The difficulty scales with embedding depth: single center embeddings are generally comprehensible with minimal disruption, as they involve only one interrupted constituent; double embeddings introduce substantial challenges by doubling the memory and interference demands, often resulting in slower reading times; and triple or higher embeddings typically render sentences incomprehensible for humans, as they overwhelm working-memory capacity and lead to parsing failures. Neurologically, these demands correlate with increased activation in Broca's area, a region in the left inferior frontal gyrus associated with syntactic and hierarchical processing. Neuroimaging studies demonstrate that nested structures, including center embeddings, elicit greater hemodynamic responses in this region compared to non-nested sentences, reflecting the neural cost of maintaining and integrating multiple levels of syntactic hierarchy.
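
A rough feel for the storage-cost component can be had by counting, at each word, how many predicted verbs are still outstanding. The sketch below is a simplified gloss on that idea (under the assumption that each subject noun predicts exactly one verb), not the full Dependency Locality Theory.

```python
def storage_profile(words, nouns, verbs):
    """At each word, count subject nouns whose predicted verb
    has not yet appeared (a toy DLT-style storage cost)."""
    pending, profile = 0, []
    for w in words:
        if w in nouns:
            pending += 1   # a new verb prediction must be stored
        elif w in verbs:
            pending -= 1   # one prediction is discharged
        profile.append(pending)
    return profile

center = "the rat the cat the dog chased killed died".split()
print(storage_profile(center, {"rat", "cat", "dog"},
                      {"chased", "killed", "died"}))
# [0, 1, 1, 2, 2, 3, 2, 1, 0] -- peak of 3 at the innermost noun
```

Each additional level of embedding raises the peak of this profile by one, mirroring the depth-dependent difficulty described above.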

Experimental Evidence

The Dependency Locality Theory (DLT), introduced by Edward Gibson, posits that comprehension difficulty arises from the cost of maintaining long-distance syntactic dependencies, with storage and integration costs increasing as dependencies grow longer. Studies applying DLT, including self-paced reading tasks with English sentences featuring nested relative clauses (e.g., King & Just, 1991), have shown significantly higher error rates and slower reading times for double and triple center embeddings compared to single embeddings or right-branching structures, with error rates approaching 100% for triple embeddings in some cases. Cross-linguistic experiments have revealed variations in handling center embeddings, particularly highlighting advantages in head-final languages like Japanese. Early studies, including eye-tracking work by Mazuka, examined how Japanese speakers process center-embedded relative clauses. These found that Japanese participants could comprehend multiple center embeddings, including triples, with lower error rates than English speakers under similar conditions, attributed to the language's verb-final structure, which shortens forward-looking dependencies and allows prosodic cues to aid parsing. More recent empirical work has further validated these limits using advanced behavioral paradigms. A 2020 study on Korean used controlled sentence materials to isolate syntactic processing of relative clauses and center embeddings, revealing through self-paced reading and accuracy measures that center embedding imposes greater demands than simple relative clauses, with comprehension accuracy around 63% for double embeddings.
Complementing this, 2022 research involving MIT investigators analyzed comprehension in natural texts, including legal documents laden with center embeddings; results showed reduced comprehension accuracy (about 68%) and recall (about 35%) for such texts compared to simpler registers (74% and 42%, respectively), underscoring working-memory limitations across diverse materials. Developmental studies indicate that children exhibit stricter limits on center embedding than adults, with single embeddings typically mastered around age 5 and multiples posing greater challenges into the school years.

Theoretical and Practical Implications

Syntactic Theory

In the 1950s and 1960s, observations of center embedding emerged within the framework of transformational generative grammar, where it served as a key illustration of the recursive properties distinguishing human language from finite-state models. Noam Chomsky's early work emphasized recursion as a core mechanism for generating unbounded hierarchical structures, with center embedding exemplifying how transformations could produce nested clauses without limiting grammatical competence. This period marked a shift from structuralist descriptions to generative models that prioritized explanatory adequacy, incorporating center embedding to demonstrate the inadequacy of non-recursive grammars. Within generative grammar, center embedding exemplifies recursive rules that enable the infinite generation of novel sentences. In X-bar theory, developed in the 1970s, recursive embedding arises through the hierarchical layering of specifiers, heads, and complements, allowing phrases to nest within similar categories while maintaining endocentric structure across languages. This framework posits a universal template for phrase structure, where center embedding tests the theory's capacity to handle multiple levels of nesting without violating projection principles. The Minimalist Program, evolving from earlier generative frameworks in the 1990s, refines this by attributing recursion to the basic operation Merge, which iteratively combines elements to form symmetric sets, thereby deriving center-embedded structures as an emergent property of economy-driven syntax. Theoretical distinctions between self-embedding and center embedding highlight implications for phrase structure grammars. Self-embedding refers to the general process of inserting a constituent of category X within another X, which confers context-free generative power by enabling non-adjacent dependencies. Center embedding, a subtype, specifically positions the embedded constituent medially, disrupting linear adjacency and increasing structural complexity within the same recursive framework.
In phrase structure grammars, these distinctions underscore how center embedding amplifies processing demands without altering the underlying recursive rules, influencing models from context-free grammars to more constrained variants. Debates on constraints center on whether limits on center embedding reflect universal recursion or language-specific restrictions. Susumu Kuno's 1974 hypothesis posits that perceptual and functional pressures, such as memory economy, lead languages to favor right-branching over center embedding for relative clauses and conjunctions, suggesting typological variations in embedding depth. In contrast, Chomsky maintains that recursion, including center embedding, is a universal feature of the language faculty, with observed limits attributable to performance rather than competence restrictions. This tension has persisted into modern frameworks like optimality theory, where center embedding is modeled as an interaction of ranked constraints balancing faithfulness to recursive structure against markedness penalties for complexity.

Applications in Natural Language Processing

Center embedding poses significant challenges to traditional shift-reduce parsers in natural language processing, particularly in dependency parsing, where deeply nested structures can lead to increased stack depth and potential overflow due to the need to maintain unresolved dependencies on the stack until the embedding resolves. Arc-eager algorithms mitigate these issues in transition-based dependency parsing by enabling early attachment of rightward dependents, allowing the stack to be reduced more promptly during parsing of projective nested constructions like center embeddings, thus improving efficiency for such syntactic patterns. In the realm of large language models (LLMs), center embedding serves as a benchmark for assessing syntactic competence, with a 2023 study from the Society for Computation in Linguistics evaluating a model's ability to handle unbounded center embedding constructions permitted by competence grammars. The analysis revealed that the model achieves high accuracy on sentences with up to four levels of embedding—contrasting sharply with human limitations beyond one level—demonstrating near-pure competence at shallow to moderate depths, though performance was not tested at arbitrarily deep levels. To enhance recursion handling in transformer-based models, researchers have employed synthetic nested data, including center-embedded structures, during intermediate pre-training to strengthen structural inductive biases and improve generalization to deeper syntactic recursion. For instance, pre-training on synthetically generated syntactic transformations encompassing center embedding types like relative clause nesting has been shown to boost few-shot performance on downstream tasks requiring recursive processing by encouraging models to acquire reusable syntactic dynamics. In machine translation, nested structures highlight challenges in capturing long-range dependencies across languages, often leading to errors if models fail to resolve them accurately. Machine translation systems address this by incorporating explicit syntactic features, such as dependency trees, to better handle embedding-induced long-range dependencies and improve translation quality across language pairs.
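
The stack pressure that center embedding puts on transition-based parsers can be sketched directly. The toy trace below (a simplified illustration, not a full arc-standard or arc-eager implementation) shifts each subject noun and attaches it to its verb with a left-arc; on a doubly center-embedded sentence, all three subjects must sit on the stack before a single arc can be drawn.

```python
def parse_trace(sent, nouns, verbs):
    """Simplified transition sequence for a center-embedded sentence:
    SHIFT every subject noun; each verb pops and attaches the most
    recent noun (LEFT-ARC). Returns the action list and peak stack size."""
    stack, actions, peak = [], [], 0
    for w in sent:
        if w in nouns:
            stack.append(w)
            actions.append(f"SHIFT({w})")
            peak = max(peak, len(stack))
        elif w in verbs:
            subj = stack.pop()
            actions.append(f"LEFT-ARC({w} -> {subj})")
    return actions, peak

sent = "the rat the cat the dog chased killed died".split()
actions, peak = parse_trace(sent, {"rat", "cat", "dog"},
                            {"chased", "killed", "died"})
print(peak)        # 3: every added embedding level deepens the stack by one
print(actions[3])  # LEFT-ARC(chased -> dog): innermost dependency resolves first
```

The linear growth of the peak with embedding depth is exactly the pressure that arc-eager systems relieve by reducing the stack earlier where the structure permits.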
