SYNTAX
| SYNTAX | |
|---|---|
| Developer | INRIA |
| Type | Generator |
| License | CeCILL |
| Website | sourcesup |
In computer science, SYNTAX is a system used to generate lexical and syntactic analyzers (parsers), both deterministic and non-deterministic, for all kinds of context-free grammars (CFGs) as well as some classes of contextual grammars.[citation needed] It has been developed at INRIA in France for several decades, mostly by Pierre Boullier, but became free software only in 2007. SYNTAX is distributed under the CeCILL license.[citation needed]
Context-free parsing
SYNTAX handles most classes of deterministic (unambiguous) grammars (LR, LALR, RLR) as well as general context-free grammars. The deterministic version has been used in operational contexts (e.g., Ada[1]) and is currently used in the domain of compilation.[2] The non-deterministic features include an Earley parser generator used for natural language processing.[3] Parsers generated by SYNTAX include powerful error recovery mechanisms, and allow the execution of semantic actions and attribute evaluation on the abstract tree or on the shared parse forest.
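To make the idea of non-deterministic parsing concrete, here is a minimal sketch of an Earley recognizer in Python. It is not code generated by or taken from SYNTAX; the function name `earley_recognize`, the `Item` tuple, and the toy grammar are assumptions chosen for illustration, and epsilon rules are deliberately left out to keep the sketch short.

```python
from collections import namedtuple

# An Earley item: rule lhs -> rhs, dot position, and the chart index where it started.
Item = namedtuple("Item", "lhs rhs dot origin")


def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Epsilon (empty) rules are not supported in this sketch.
    """
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add(Item(start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:
                    # Predict: expand the nonterminal that follows the dot.
                    new = [Item(sym, rhs, 0, i) for rhs in grammar[sym]]
                    target = i
                elif i < n and tokens[i] == sym:
                    # Scan: the terminal matches the next input token.
                    new = [item._replace(dot=item.dot + 1)]
                    target = i + 1
                else:
                    continue
            else:
                # Complete: advance every item that was waiting for item.lhs.
                new = [w._replace(dot=w.dot + 1)
                       for w in chart[item.origin]
                       if w.dot < len(w.rhs) and w.rhs[w.dot] == item.lhs]
                target = i
            for it in new:
                if it not in chart[target]:
                    chart[target].add(it)
                    if target == i:
                        agenda.append(it)

    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in chart[n])


# Ambiguous, left-recursive toy grammar: E -> E + E | n
TOY_GRAMMAR = {"E": [("E", "+", "E"), ("n",)]}
print(earley_recognize(TOY_GRAMMAR, "E", "n + n + n".split()))
print(earley_recognize(TOY_GRAMMAR, "E", "n +".split()))
```

The toy grammar is both ambiguous and left-recursive, which a deterministic LR or LALR generator would reject or need to disambiguate; the chart-based approach simply keeps all partial analyses side by side.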
Contextual parsing
The current version of SYNTAX (version 6.0 beta) also includes parser generators for other formalisms, used for natural language processing as well as bioinformatics. These are context-sensitive formalisms (TAG, RCG) or formalisms that rely on context-free grammars extended through attribute evaluation, in particular for natural language processing (LFG).
Error recovery
A notable feature of SYNTAX (compared to Lex/Yacc) is its built-in algorithm[4] for automatically recovering from lexical and syntactic errors by deleting extra characters or tokens, inserting missing characters or tokens, permuting characters or tokens, etc. The algorithm has a default behaviour that can be modified by providing a custom set of recovery rules adapted to the language for which the lexer and parser are built.
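The three repair operations named above (deletion, insertion, permutation) can be illustrated with a brute-force sketch in Python. This is not the Boullier and Jourdan algorithm and does not reflect how SYNTAX expresses recovery rules; the toy language, the `accepts` predicate, `VOCAB`, and the function names are all assumptions made for the example.

```python
VOCAB = ["id", "=", "num", ";"]  # hypothetical token vocabulary


def accepts(tokens):
    """Toy language: one or more statements of the form  id = num ;"""
    return (len(tokens) > 0
            and len(tokens) % 4 == 0
            and all(tok == VOCAB[i % 4] for i, tok in enumerate(tokens)))


def single_token_repairs(tokens):
    """Yield (description, repaired_tokens) for each one-step repair
    that turns `tokens` into an accepted input."""
    n = len(tokens)
    # Deletion of one extra token.
    for i in range(n):
        candidate = tokens[:i] + tokens[i + 1:]
        if accepts(candidate):
            yield f"delete {tokens[i]!r} at {i}", candidate
    # Insertion of one missing token.
    for i in range(n + 1):
        for tok in VOCAB:
            candidate = tokens[:i] + [tok] + tokens[i:]
            if accepts(candidate):
                yield f"insert {tok!r} at {i}", candidate
    # Permutation of two adjacent tokens.
    for i in range(n - 1):
        candidate = tokens[:i] + [tokens[i + 1], tokens[i]] + tokens[i + 2:]
        if accepts(candidate):
            yield f"swap tokens at {i} and {i + 1}", candidate


def repair(tokens):
    """Return the first repair found, or None if no single edit suffices."""
    return next(single_token_repairs(tokens), None)


print(repair("id = = num ;".split()))
print(repair("id = num".split()))
print(repair("id num = ;".split()))
```

A real recovery scheme works locally around the error position instead of re-checking the whole input, and ranks candidate repairs by configurable rules; the exhaustive search here only shows what the three kinds of edit look like.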
References
- ^ The first tool-translator for the Ada language was developed with SYNTAX by Pierre Boullier and others, as recalled in this page on the history of Ada. See also Pierre Boullier and Knut Ripken. Building an Ada compiler following meta-compilation methods. In Séminaires Langages et Traducteurs 1978-1981, pages 99-140. INRIA, Rocquencourt, France, 1981.
- ^ E.g., by the VASY and CONVECS teams at INRIA, in particular for the development of CADP and Traian.
- ^ E.g., in the SxLFG parser, whose first version is described in this paper.
- ^ Pierre Boullier and Martin Jourdan. A New Error Repair and Recovery Scheme for Lexical and Syntactic Analysis. Science of Computer Programming 9(3): 271-286 (1987).
External links
SYNTAX
Fundamentals
Etymology
The term "syntax" originates from the Ancient Greek σύνταξις (syntaxis), denoting "arrangement" or "a putting together," derived from the prefix σύν- (syn-, "together") and τάξις (taxis, "arrangement" or "order").[11] In classical Greek linguistic and philosophical contexts, it initially encompassed the systematic organization of elements, including rhetorical and logical structures.[12]
The term entered Latin as syntaxis through scholarly translations and adaptations of Greek grammatical works, with its first systematic application appearing in Priscian's Institutiones Grammaticae (early 6th century CE).[13] Priscian, a grammarian active in Constantinople, employed syntaxis in Books 17 and 18 to describe the construction and dependencies of sentences, marking the first comprehensive treatment of Latin syntax and establishing it as a core component of grammatical study. This adoption bridged Greek theoretical foundations with Latin pedagogical needs, influencing medieval grammatical traditions.[14]
During the Renaissance, the meaning of syntaxis underwent a notable evolution, transitioning from a rhetorical emphasis on stylistic arrangement, rooted in classical oratory, to a stricter grammatical focus on the structural rules governing sentence formation across vernacular languages. Humanist scholars, drawing on Priscian's framework while adapting it to emerging national grammars, integrated syntax into broader linguistic analyses, as exemplified in Julius Caesar Scaliger's De causis linguae Latinae (1540), which emphasized logical and morphological interrelations in sentence building.[15] This shift facilitated the development of syntax as an autonomous field, distinct from rhetoric, in early modern European linguistics.[12]
Definition and Scope
Syntax is the branch of linguistics that studies the rules, principles, and processes governing the formation of sentences in a language, particularly how words combine to create phrases, clauses, and larger syntactic units.[1] This field examines the structural arrangements that determine whether sequences of words are grammatically well-formed, independent of their sound patterns or meanings.[3]
The scope of syntax encompasses key phenomena such as phrase structure, which organizes words into hierarchical units like noun phrases and verb phrases; agreement, where elements like subjects and verbs match in features such as number and person; case marking, which indicates grammatical roles through affixes or word order; and recursion, allowing structures to embed within themselves to produce complex sentences.[9] However, syntax explicitly excludes phonology, the study of sound systems and pronunciation, and semantics, the analysis of meaning and interpretation.[10] These boundaries ensure that syntactic inquiry focuses on form and arrangement rather than auditory or interpretive aspects of language.[16]
Syntax is distinct from morphology, which concerns the internal structure of words and how they are built from smaller units called morphemes.[17] For instance, in English, morphology handles verb conjugation, such as adding the suffix "-s" to form "walks" from "walk" to indicate third-person singular present tense, whereas syntax governs the arrangement of words into sentences, like positioning the subject before the verb in declarative statements ("The dog walks").[18] This division highlights morphology's focus on word-level modifications versus syntax's emphasis on inter-word relations and sentence-level organization.[19]
Within linguistic theory, syntax plays a central role in distinguishing universal grammar (innate principles common to all human languages) from language-specific rules that vary across languages.[20] Noam Chomsky's generative framework posits that syntactic competence is biologically endowed, enabling children to acquire complex structures rapidly despite limited input, as outlined in his seminal works on universal grammar.[21] This innate perspective underscores syntax's foundational position in the human language faculty, balancing universal constraints with parametric variations in individual languages.[22]
Core Concepts
Word Order
Word order in syntax refers to the linear arrangement of major syntactic elements, such as the subject (S), verb (V), and object (O), within a clause. This sequencing varies systematically across languages and plays a crucial role in conveying grammatical meaning, often interacting with morphological markers like case or agreement to disambiguate roles. Typologically, languages are classified based on the dominant order of these elements in declarative sentences, with six primary patterns possible: SVO, SOV, VSO, VOS, OSV, and OVS, though the last two are rare.
The most common word order types are SVO and SOV, which together account for approximately 75% of the world's languages according to the World Atlas of Language Structures (WALS) database. English exemplifies SVO order, as in "The cat (S) chased (V) the mouse (O)," where the subject precedes the verb and the object follows it. In contrast, Japanese represents SOV order, as seen in "Neko-ga (S) nezumi-o (O) oikaketa (V)," with the object appearing before the verb. VSO order is prevalent in many Celtic and Austronesian languages; for instance, Irish uses VSO in sentences like "Chonaic (V) mé (S) an fear (O)," meaning "I saw the man." These basic orders provide a foundation for understanding syntactic variation, though actual usage can be influenced by additional factors.
Several factors influence deviations from rigid word order, including the animacy hierarchy, which prioritizes more animate entities (e.g., humans over inanimates) in prominent positions, and discourse prominence, where elements like topics or foci may be fronted or postponed based on information structure. For example, in Turkish (SOV-dominant), animate objects can precede the verb more readily than inanimates to highlight them. Typological tendencies also correlate word order with other features, such as head-initial (SVO) languages favoring prepositions over postpositions, while head-final (SOV) languages show the reverse pattern. These influences ensure that word order serves both grammatical and pragmatic functions across language families.
Some languages exhibit free or flexible word order, where the sequence of elements can vary without altering basic meaning, often due to rich case marking that encodes grammatical roles morphologically. Latin is a classic example: the sentence "Puella (S) puerum (O) videt (V)" can be reordered as "Puerum puella videt" or other permutations, with nominative and accusative cases distinguishing subject from object. This flexibility is common in languages with overt case systems, such as Russian or Warlpiri, allowing stylistic or discourse-driven rearrangements while maintaining syntactic coherence through inflection.
Historical shifts in word order illustrate how contact, simplification, or internal evolution can reshape syntax. Old English, originally SOV in main clauses with subordinate-like embedding, transitioned to SVO around the 12th century, influenced by Norman French contact and the loss of robust case endings, which necessitated fixed positioning for clarity. Similar shifts occur in creoles or language contact scenarios, underscoring word order's adaptability over time.
Grammatical Relations
Grammatical relations in syntax describe the abstract functional dependencies between constituents in a clause, primarily involving the predicate and its arguments, such as the subject, direct object, indirect object, and adjuncts. The subject relation typically identifies the primary argument, often encoding the agent (the initiator of an action) or theme (the entity undergoing change), as seen in English sentences like "The dog chased the cat," where "the dog" is the subject-agent and "the cat" is the direct object-patient.[23] The direct object relation marks the entity most directly affected by the predicate, while the indirect object specifies a secondary beneficiary or recipient, as in "She gave him a book." Predicate relations link the verb to these arguments, and adjuncts provide optional modifiers like time or location without core participation in the event.
Identification of these relations relies on multiple criteria, including morphological agreement, government, and behavioral tests. Agreement involves feature matching between the subject and predicate, such as number and person; in Spanish, for instance, a singular subject requires a singular verb form, as in "El perro corre" (the dog runs), where the verb "corre" agrees in third-person singular with "perro," while mismatches like "*El perro corren" are ungrammatical.[24] Government refers to the structural dominance of a head (e.g., a verb) over its dependents, enabling case assignment; verbs govern and assign accusative case to direct objects in languages like German, where in "Ich sehe den Hund" (I see the dog) the object is marked as accusative by the determiner form "den" under the verb's government.[25] Behavioral tests further diagnose relations through syntactic operations: in passivization, the direct object of an active clause like "The cat chased the dog" raises to subject position in "The dog was chased by the cat," while the original subject demotes to an oblique; raising constructions similarly promote subjects, as in "The dog seems to chase the cat," where only the subject "the dog" can raise from the embedded clause.[26]
Cross-linguistically, grammatical relations exhibit variations in alignment systems, contrasting accusative (where the subject of intransitives aligns with transitive subjects, S=A ≠ O) and ergative (where the subject of intransitives aligns with transitive objects, S=O ≠ A) patterns. In accusative languages like English or Spanish, the subject of "The dog runs" patterns with that of "The dog chases the cat" in controlling verb agreement and word order. Ergative alignment appears in languages like Basque, where the intransitive subject in "Gizona etorri da" (the man has arrived) takes the unmarked absolutive case, aligning with the transitive object "mutila" in "Gizonak mutila ikusi du" (the man saw the boy), while the transitive subject "gizonak" takes the ergative suffix "-k"; this inverts the typical subject-object hierarchy for morphological marking and some syntactic behaviors.[27]
These relations play a crucial role in sentence interpretation by projecting semantic content into syntactic structure, particularly through theta roles, which assign thematic interpretations like agent or theme to arguments in specific positions. Under the Uniformity of Theta Assignment Hypothesis, theta roles such as agent (external argument in specifier position) and theme (internal argument as complement) are systematically mapped to syntactic projections, ensuring that event participants like the agent in "John broke the window" occupy the subject position to license the thematic structure.[28] This projection facilitates semantic composition while interacting with surface variations like word order, though the relations themselves remain abstract and independent of linear position.[23]
Constituency and Phrase Structure
In syntax, constituency refers to the hierarchical grouping of words into larger units known as constituents, such as noun phrases (NPs), verb phrases (VPs), and clauses, which form the building blocks of sentence structure.[29] These groupings are not merely linear sequences but reflect functional and structural relationships that determine how sentences are parsed and interpreted.[30]
Linguists identify constituents through specific tests that reveal whether a string of words behaves as a cohesive unit. One key method is the substitution test, where a potential constituent can be replaced by a single word or pro-form, such as a pronoun, without altering the sentence's grammaticality. For example, in "The big dog barked loudly," the string "the big dog" can be substituted with "it" to yield "It barked loudly," indicating that "the big dog" forms an NP constituent.[29] Similarly, "barked loudly" can be replaced with "did so" in "The big dog did so," confirming it as a VP. Another test is movement, which checks if a string can be relocated within the sentence while preserving grammaticality; for instance, "The big dog" can be fronted to "The big dog, I saw yesterday," but individual words like "big" cannot move alone in the same way.[31] The coordination test involves joining two identical strings with a conjunction like "and"; in "I saw the dog and the cat," both "the dog" and "the cat" can be coordinated, showing they are parallel NP constituents, whereas "dog and the" cannot.[29] These tests collectively demonstrate that constituents exhibit unified behavior in syntactic operations.[30]
Phrase structure rules provide a formal way to represent these hierarchical groupings, specifying how categories expand into subconstituents. Introduced in early generative linguistics, a basic set of rules for English might include S → NP VP (a sentence consists of a noun phrase followed by a verb phrase), NP → Det N (a noun phrase consists of a determiner and a noun), and VP → V (a verb phrase consists of a verb).[22] These rules generate tree structures that visualize the hierarchy; for the sentence "The cat sleeps," the structure is as follows:
          S
        /   \
      NP     VP
     /  \     |
   Det   N    V
    |    |    |
   The  cat  sleeps
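The three rules and the resulting tree can also be written out as a small Python sketch. The rule table mirrors S → NP VP, NP → Det N, and VP → V from the text; the lexicon, the function names, and the bracketed output notation are assumptions introduced for illustration, and each category is given only one expansion to keep the parser trivial.

```python
# Phrase structure rules from the text; each category has a single expansion here.
RULES = {"S": ["NP", "VP"], "NP": ["Det", "N"], "VP": ["V"]}

# Hypothetical toy lexicon mapping words to lexical categories.
LEXICON = {"the": "Det", "cat": "N", "sleeps": "V"}


def parse(category, words, pos=0):
    """Expand `category` starting at words[pos].

    Returns (tree, next_pos) on success or None on failure. A tree is
    (label, children) for phrases and (label, word) for lexical items.
    """
    if category in RULES:
        children = []
        for child in RULES[category]:
            result = parse(child, words, pos)
            if result is None:
                return None
            subtree, pos = result
            children.append(subtree)
        return (category, children), pos
    # Lexical category: the next word must belong to it.
    if pos < len(words) and LEXICON.get(words[pos].lower()) == category:
        return (category, words[pos]), pos + 1
    return None


def parse_sentence(sentence):
    """Return the tree for `sentence`, or None if the rules do not cover it."""
    words = sentence.split()
    result = parse("S", words)
    if result is None or result[1] != len(words):
        return None
    return result[0]


def bracket(tree):
    """Render a tree in labelled bracket notation."""
    label, body = tree
    if isinstance(body, str):
        return f"[{label} {body}]"
    return f"[{label} " + " ".join(bracket(child) for child in body) + "]"


print(bracket(parse_sentence("The cat sleeps")))
# prints: [S [NP [Det The] [N cat]] [VP [V sleeps]]]
```

The bracketed string encodes the same hierarchy as the tree diagram: each opening bracket corresponds to a node, and nesting corresponds to dominance.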
