Linear grammar
from Wikipedia

In computer science, a linear grammar is a context-free grammar that has at most one nonterminal in the right-hand side of each of its productions.

A linear language is a language generated by some linear grammar.

Example


An example of a linear grammar is G with N = {S}, Σ = {a, b}, P with start symbol S and rules

S → aSb
S → ε

It generates the language { a^n b^n | n ≥ 0 }.
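The membership test for this language follows the grammar's two rules directly. As an illustrative sketch (not part of the original article), a recursive Python check might look like this:

def derives(s: str) -> bool:
    # Mirrors the two rules of G: S -> aSb and S -> ε.
    if s == "":                      # S -> ε
        return True
    if s.startswith("a") and s.endswith("b"):
        return derives(s[1:-1])      # S -> aSb: peel one matching pair
    return False

assert derives("aabb") and not derives("aab")

Each recursive call strips one a from the left and one b from the right, exactly as one application of S → aSb adds them.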

Relationship with regular grammars


Two special types of linear grammars are the following:

  • the left-linear or left-regular grammars, in which all rules are of the form A → αw where α is either empty or a single nonterminal and w is a string of terminals;
  • the right-linear or right-regular grammars, in which all rules are of the form A → wα where w is a string of terminals and α is either empty or a single nonterminal.

Each of these can describe exactly the regular languages. A regular grammar is a grammar that is left-linear or right-linear.

Observe that by inserting new nonterminals, any linear grammar can be replaced by an equivalent one where some of the rules are left-linear and some are right-linear. For instance, the rules of G above can be replaced with

S → aA
A → Sb
S → ε

However, the requirement that all rules be left-linear (or all rules be right-linear) leads to a strict decrease in the expressive power of linear grammars.
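The transformation behind this observation can be sketched mechanically: each rule A → uBv with nonempty u and v is split, via a fresh nonterminal, into a right-linear rule A → uX and a left-linear rule X → Bv. The following Python fragment is a hedged sketch under an assumed (head, list-of-symbols) rule representation; the helper name split_linear_rules is illustrative:

def split_linear_rules(rules, nonterminals):
    new_rules, counter = [], 0
    for head, rhs in rules:
        positions = [i for i, sym in enumerate(rhs) if sym in nonterminals]
        if not positions:
            new_rules.append((head, rhs))      # terminal-only rule: keep as is
            continue
        i = positions[0]                        # the single nonterminal
        u, B, v = rhs[:i], rhs[i], rhs[i + 1:]
        if not u or not v:
            new_rules.append((head, rhs))      # already left- or right-linear
            continue
        fresh = f"X{counter}"; counter += 1
        nonterminals.add(fresh)
        new_rules.append((head, u + [fresh]))  # right-linear part: A -> uX
        new_rules.append((fresh, [B] + v))     # left-linear part:  X -> Bv
    return new_rules

# S -> aSb, S -> ε  becomes  S -> aX0, X0 -> Sb, S -> ε
print(split_linear_rules([("S", ["a", "S", "b"]), ("S", [])], {"S"}))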

Expressive power


All regular languages are linear; conversely, an example of a linear, non-regular language is { a^n b^n | n ≥ 0 }, as explained above. All linear languages are context-free; conversely, an example of a context-free, non-linear language is the Dyck language of well-balanced bracket pairs. Hence, the regular languages are a proper subset of the linear languages, which in turn are a proper subset of the context-free languages.

While regular languages are deterministic, there exist linear languages that are nondeterministic. For example, the language of even-length palindromes on the alphabet of 0 and 1 has the linear grammar S → 0S0 | 1S1 | ε. An arbitrary string of this language cannot be parsed without reading all its letters first, which means that a pushdown automaton has to try alternative state transitions to accommodate the different possible lengths of a semi-parsed string;[1] this language is therefore nondeterministic. Since nondeterministic context-free languages cannot, in general, be accepted in linear time, linear languages cannot be accepted in linear time in the general case. Furthermore, it is undecidable whether a given context-free language is a linear context-free language.[2]

A language is linear if and only if it can be accepted by a one-turn pushdown automaton, i.e., a pushdown automaton that, once it starts popping, never pushes again.
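To make the one-turn restriction concrete, here is a hedged Python sketch of such an automaton for { a^n b^n | n ≥ 0 }: it pushes while reading a's, turns exactly once, and then only pops:

def one_turn_pda_accepts(s: str) -> bool:
    stack, turned = [], False
    for ch in s:
        if ch == "a":
            if turned:          # pushing after the turn would be a second turn
                return False
            stack.append("a")   # push phase
        elif ch == "b":
            turned = True       # the first pop marks the single turn
            if not stack:
                return False
            stack.pop()         # pop phase
        else:
            return False
    return not stack            # accept on empty stack

assert one_turn_pda_accepts("aaabbb") and not one_turn_pda_accepts("abab")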

Closure properties


Positive cases


Linear languages are closed under union. The construction is the same as for the union of context-free languages: let L1 and L2 be two linear languages, generated by linear grammars G1 and G2 with disjoint nonterminals and start symbols S1 and S2. Then L1 ∪ L2 is generated by the linear grammar that contains all the rules of G1 and G2 together with a new start symbol S and the rules S → S1 and S → S2.
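As an illustrative sketch (the dict-of-productions representation and the name union_grammar are assumptions, not part of the article), the construction can be written directly:

def union_grammar(g1, s1, g2, s2, fresh_start="S_union"):
    # g1, g2: dicts mapping a nonterminal to a list of right-hand sides
    # (lists of symbols); their nonterminals are assumed already disjoint.
    g = {fresh_start: [[s1], [s2]]}   # S -> S1 | S2, still linear
    g.update(g1)
    g.update(g2)
    return g, fresh_start

# { a^n b^n } union { c^n d^n }:
g1 = {"S1": [["a", "S1", "b"], []]}
g2 = {"S2": [["c", "S2", "d"], []]}
g, start = union_grammar(g1, "S1", g2, "S2")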

If L is a linear language and M is a regular language, then the intersection is again a linear language; in other words, the linear languages are closed under intersection with regular sets.

Linear languages are closed under homomorphism and inverse homomorphism.[3]

As a corollary, linear languages form a full trio, i.e., a family of languages closed under homomorphism, inverse homomorphism, and intersection with a regular language.

Negative cases


Linear languages are not closed under intersection. For example, let L1 = { a^n b^n c^m | n, m ≥ 0 } and L2 = { a^m b^n c^n | n, m ≥ 0 }; both are linear, but their intersection L1 ∩ L2 = { a^n b^n c^n | n ≥ 0 } is not only not linear, but also not context-free. See pumping lemma for context-free languages.

As a corollary, linear languages are not closed under complement (as intersection can be constructed by de Morgan's laws out of union and complement).

from Grokipedia
A linear grammar is a type of context-free grammar in formal language theory in which each production rule contains at most one nonterminal symbol on the right-hand side. This restriction distinguishes linear grammars from general context-free grammars, which allow multiple nonterminals per production, and positions them as a subclass within the Chomsky hierarchy. Linear grammars generate linear languages, which form a proper subset of the context-free languages; while all linear languages are context-free, not all context-free languages are linear. Linear languages encompass all regular languages, as right-linear and left-linear grammars, subclasses of linear grammars, precisely characterize the regular languages via their equivalence to finite automata.

Notable examples of linear languages include the set of all palindromes over a binary alphabet, generated by productions such as $S \to aSa \mid bSb \mid a \mid b \mid \epsilon$, and the language $\{a^n b^n \mid n \geq 0\}$, which can be derived using $S \to aSb \mid \epsilon$. However, languages requiring multiple independent recursive structures, such as the Dyck language of balanced parentheses strings, cannot be generated by linear grammars and thus serve as canonical examples of context-free but non-linear languages.

Regarding closure properties, linear languages are closed under union, reversal, homomorphism, and inverse homomorphism, allowing constructions similar to those for context-free languages but adapted to maintain the single-nonterminal constraint. They are also closed under intersection with regular languages, reflecting their position between the regular and context-free classes. Unlike context-free languages, linear languages are not closed under concatenation; nor are they closed under intersection or complementation in general, highlighting their intermediate expressive power. Parsing linear languages can be performed efficiently using variants of pushdown automata or dynamic programming techniques tailored to their one-turn stack behavior.

Definition and Formalism

Formal Definition

A linear grammar is a quadruple $G = (V, \Sigma, P, S)$, where $V$ is the finite set of nonterminal symbols (variables), $\Sigma$ is the finite set of terminal symbols (with $V \cap \Sigma = \emptyset$), $P$ is a finite set of production rules, and $S \in V$ is the start symbol. The productions in $P$ are restricted such that each rule is of the form $A \to uBv$ or $A \to w$, where $A, B \in V$ are nonterminals, $u, v, w \in \Sigma^*$ are (possibly empty) strings of terminals, and at most one nonterminal appears on the right-hand side. This structure ensures that derivations proceed by expanding a single nonterminal at each step, flanked by terminal strings. Linear grammars form a proper subclass of context-free grammars, which allow multiple nonterminals on the right-hand side without such restrictions.

Special cases of linear grammars include left-linear and right-linear grammars, which further restrict the position of the nonterminal on the right-hand side. A left-linear grammar has productions $A \to Bu$ or $A \to u$, where the nonterminal (if present) appears at the left end of the right-hand side. A right-linear grammar has productions $A \to uB$ or $A \to u$, where the nonterminal (if present) appears at the right end. Epsilon-productions ($A \to \epsilon$) are permitted in linear grammars following the same conventions as in context-free grammars, including for the start symbol if the empty string is in the language.

Production Rules

In a linear grammar $G = (V, \Sigma, P, S)$, the production rules in $P$ are restricted such that each rule is of the form $A \to \alpha$, where $A \in V$ is a nonterminal and $\alpha \in (\Sigma \cup V)^*$ contains at most one nonterminal symbol from $V$; rules with no nonterminals on the right-hand side, such as $A \to w$ where $w \in \Sigma^*$, are also permitted. The key constraint is that a nonterminal appears at most once in the right-hand side of any production, ensuring that derivations maintain a linear structure by avoiding branching through multiple nonterminals in a single step. When a production does contain a nonterminal, it may be positioned at the left end (left-linear), at the right end (right-linear), or in the interior (general linear).

Derivations in linear grammars follow a step-by-step replacement process, denoted $\alpha \Rightarrow \beta$ when a production $A \to \gamma$ replaces an occurrence of $A$ in $\alpha$ to yield $\beta$; a sequence of such steps is written $\alpha \Rightarrow^* \delta$. This process is termed a linear derivation because each intermediate sentential form contains at most one nonterminal, starting from the single nonterminal $S$ and preserving this property through substitutions that introduce at most one new nonterminal per step.

Productions violating linearity include those with multiple nonterminals on the right-hand side, such as $A \to BCu$ (two nonterminals followed by terminals) or $A \to uBCv$ (terminals sandwiching two nonterminals), as these introduce branching beyond the single-nonterminal limit. Similarly, $A \to BuC$ mixes terminals with more than one nonterminal, breaking the constraint.
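Because a sentential form of a linear grammar contains at most one nonterminal, a derivation step is simply the replacement of that single occurrence. The following Python sketch (with an assumed tuple-of-symbols representation; the helper name step is illustrative) makes the relation $\alpha \Rightarrow \beta$ concrete:

def step(form, rule, nonterminals):
    # Apply the production rule = (A, gamma) to the unique nonterminal in form.
    head, gamma = rule
    i = next(j for j, sym in enumerate(form) if sym in nonterminals)
    assert form[i] == head, "rule head must match the unique nonterminal"
    return form[:i] + gamma + form[i + 1:]

# Derive aabb with S -> aSb and then S -> ab:
nts = {"S"}
form = ("S",)
form = step(form, ("S", ("a", "S", "b")), nts)   # ('a', 'S', 'b')
form = step(form, ("S", ("a", "b")), nts)        # ('a', 'a', 'b', 'b')
print("".join(form))                             # aabb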

Examples and Illustrations

Basic Example

A simple example of a right-linear grammar generates the language $L = \{a^n \mid n \geq 0\}$, which consists of all finite strings of zero or more copies of the terminal symbol $a$, including the empty string $\epsilon$. This grammar $G$ has a single nonterminal $S$ (the start symbol), terminal alphabet $\{a\}$, and the production rules

$S \to aS \mid \epsilon$

These rules build strings by optionally prefixing an $a$ to a previously derived string, terminating with the empty production when no more $a$'s are added. To illustrate, consider the derivation of the string $aaa$ (i.e., $a^3$) from the start symbol $S$. The sequence proceeds leftmost, replacing the single nonterminal at each step:

$S \Rightarrow aS \Rightarrow aaS \Rightarrow aaaS \Rightarrow aaa$

This derivation highlights the linear constraint: each production introduces at most one nonterminal on the right-hand side, so only one nonterminal appears per sentential form until termination. The corresponding parse tree is a right-branching chain: $S$ expands to $a$ followed by another $S$, repeated three times, ending in $\epsilon$. This grammar is linear, as every production has at most one nonterminal in its right-hand side. It is also regular, since it is right-linear: every nonterminal (if present) appears at the right end of the right-hand side.
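The derivation above can be traced mechanically; this small Python sketch (illustrative only, with the name derive_a_n assumed) applies $S \to aS$ a fixed number of times and then $S \to \epsilon$:

def derive_a_n(n: int) -> list:
    forms, current = ["S"], "S"
    for _ in range(n):
        current = current.replace("S", "aS")   # apply S -> aS
        forms.append(current)
    current = current.replace("S", "")         # apply S -> ε
    forms.append(current)
    return forms

print(" => ".join(derive_a_n(3)))   # S => aS => aaS => aaaS => aaa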

Non-Regular Linear Language Example

A classic example of a non-regular language generated by a linear grammar is $L = \{a^n b^n \mid n \geq 1\}$, which consists of strings with an equal number of $a$'s followed by the same number of $b$'s. This language cannot be generated by a regular grammar, as demonstrated by the pumping lemma for regular languages, which fails for strings like $a^p b^p$ where $p$ is the pumping length. A linear grammar for this language is $G = (\{S\}, \{a, b\}, P, S)$, where the production rules are

$S \to aSb \mid ab$

Each production contains at most one nonterminal on the right-hand side, ensuring linearity by restricting derivations to a single embedded variable. The base production $S \to ab$ generates the string for $n = 1$, while the recursive rule $S \to aSb$ wraps matching terminals around an inner derivation. To illustrate, consider the derivation for $n = 2$, yielding $aabb$:

$S \Rightarrow aSb \Rightarrow a(ab)b = aabb$

This process builds the string outward from the center, with the nonterminal $S$ always positioned between the accumulating $a$'s and $b$'s, enforcing the count equality through linear expansion without requiring multiple nonterminals or context sensitivity. The central placement of the single nonterminal prevents mismatches in the $a$ and $b$ counts, as each recursion adds one $a$ on the left and one $b$ on the right, mirroring the stack operations of a pushdown automaton that accepts this deterministic context-free language. Thus, while equivalent to pushdown-automaton acceptance, the linear grammar highlights generation via bounded nonterminal usage.

Relations to Other Grammars

Relation to Regular Grammars

Linear grammars encompass a class of context-free grammars where each production rule contains at most one nonterminal symbol on the right-hand side. Within this class, right-linear grammars (those where the nonterminal, if present, appears at the right end of the right-hand side, e.g., $A \to \alpha B$ or $A \to \alpha$, with $\alpha$ a string of terminals) generate exactly the regular languages. Similarly, left-linear grammars, where the nonterminal appears at the left end (e.g., $A \to B\alpha$ or $A \to \alpha$), also generate precisely the regular languages. This equivalence positions right-linear and left-linear grammars at Type 3 in the Chomsky hierarchy, corresponding to finite-state automata.

The conversion from a right-linear grammar to a regular expression or nondeterministic finite automaton (NFA) follows a straightforward inductive process: treat nonterminals as states, map each production $A \to aB$ to a transition from state $A$ to state $B$ on symbol $a$, and designate the start symbol as the initial state, with accepting states corresponding to productions yielding only terminals. Conversely, constructing a right-linear grammar from an NFA involves creating a nonterminal for each state, with productions mirroring the transitions (e.g., a transition from state $q_i$ to $q_j$ on $a$ yields $Q_i \to a Q_j$) and epsilon transitions handled by additional rules. This bidirectional correspondence ensures that every regular language admits a right-linear (or left-linear) grammar, and vice versa, without loss of expressive power.

However, not all linear grammars are regular: placing the nonterminal in the middle of the right-hand side (e.g., $A \to \alpha B \beta$ with both $\alpha$ and $\beta$ nonempty) can generate non-regular languages, such as the canonical example $\{a^n b^n \mid n \geq 0\}$. This language, produced by the linear grammar $S \to aSb \mid \epsilon$, requires unbounded matching beyond finite-state capabilities, as proven by the pumping lemma for regular languages.

Linear grammars were introduced in the 1950s by Noam Chomsky as part of his foundational work on formal language hierarchies, serving as an intermediate structure between finite-state (regular) models and more powerful phrase-structure systems for analyzing linguistic complexity. This positioning highlighted their role in bridging simple finite-state description with hierarchical syntax in early formal language theory.
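The grammar-to-NFA direction can be sketched in a few lines of Python, assuming productions are already in the normal form $A \to aB$, $A \to a$, or $A \to \epsilon$ (longer terminal strings would first be broken up with intermediate states); the representation and names here are illustrative:

def right_linear_to_nfa(rules, start):
    ACCEPT = "ACCEPT"
    transitions = {}                   # (state, symbol) -> set of states
    accepting = {ACCEPT}
    for head, rhs in rules:
        if not rhs:                    # A -> ε: state A is accepting
            accepting.add(head)
        elif len(rhs) == 2:            # A -> aB: transition A --a--> B
            transitions.setdefault((head, rhs[0]), set()).add(rhs[1])
        else:                          # A -> a: transition into the accept state
            transitions.setdefault((head, rhs[0]), set()).add(ACCEPT)
    return transitions, start, accepting

def nfa_accepts(nfa, word):
    transitions, start, accepting = nfa
    states = {start}
    for ch in word:
        states = set().union(*(transitions.get((q, ch), set()) for q in states))
    return bool(states & accepting)

# (ab)* as the right-linear grammar S -> aB | ε, B -> bS:
nfa = right_linear_to_nfa([("S", ["a", "B"]), ("S", []), ("B", ["b", "S"])], "S")
assert nfa_accepts(nfa, "abab") and not nfa_accepts(nfa, "aba")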

Relation to Context-Free Grammars

Linear grammars form a proper subclass of context-free grammars, as every linear grammar is context-free by definition, but the class of linear languages is strictly smaller than the class of context-free languages. For instance, the language $L = \{w \in \{a,b\}^* \mid \#_a(w) = \#_b(w)\}$ (strings with equal numbers of $a$'s and $b$'s) is context-free but not linear. It can be generated by the context-free grammar with productions $S \to aSbS \mid bSaS \mid \epsilon$, which includes rules with multiple nonterminals on the right-hand side to handle arbitrary interleaving.

The key structural difference lies in the form of the production rules. Context-free grammars permit productions of the form $A \to \alpha$, where $\alpha$ is any string over the alphabet of terminals and nonterminals, potentially including multiple nonterminals (e.g., $A \to BC$). Linear grammars, however, restrict each production to $A \to uBv$ or $A \to w$, where $u$ and $v$ are (possibly empty) strings of terminals, $B$ is a single nonterminal, and $w$ is a string of terminals, ensuring at most one nonterminal appears on the right-hand side.

Determining whether a given context-free grammar is linear involves a simple algorithmic check: examine each production rule and verify that its right-hand side contains no more than one nonterminal. This inspection runs in time linear in the total size of the grammar's production set.

This restriction brings computational advantages: linear languages can be parsed in quadratic time via dynamic programming, whereas general context-free languages may necessitate nondeterministic parsing with up to cubic time in the input length. Note, however, that not every linear language is deterministic context-free, as discussed below.
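The linearity check described above amounts to one pass over the productions, counting nonterminals per right-hand side. A minimal Python sketch (the representation is assumed, not prescribed):

def is_linear(rules, nonterminals):
    # rules: iterable of (head, rhs) pairs, rhs a sequence of symbols.
    return all(
        sum(1 for sym in rhs if sym in nonterminals) <= 1
        for _, rhs in rules
    )

# S -> aSb | ε is linear; S -> aSbS | bSaS | ε is not:
assert is_linear([("S", "aSb"), ("S", "")], {"S"})
assert not is_linear([("S", "aSbS"), ("S", "bSaS"), ("S", "")], {"S"})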

Expressive Power

Languages Generated

Linear languages are the class of formal languages generated by linear grammars, denoted $L(G)$ where $G$ is a linear grammar consisting of context-free productions with at most one nonterminal symbol on the right-hand side of each rule. In the Chomsky hierarchy, linear languages form a proper subclass that strictly contains all regular languages (Type-3) and is strictly contained within the context-free languages (Type-2), which themselves are a proper subclass of the recursively enumerable languages (Type-0). For instance, the language $\{a^n b^n \mid n \geq 0\}$ is linear, while the language $\{a^n b^n c^m d^m \mid n, m \geq 0\}$ is context-free but not linear.

A characterizing property of linear languages is their acceptance by one-turn pushdown automata: nondeterministic pushdown automata that perform at most one change of direction on the stack (pushing up to some point and thereafter only popping, without further pushing). This equivalence reflects the bounded "linear" structure of derivations, distinguishing linear languages from general context-free languages that may require multiple stack turns. While all regular languages are linear and thus accepted by deterministic pushdown automata, not every linear language is deterministic context-free. For example, the language of even-length palindromes $\{ww^R \mid w \in \{a,b\}^*\}$ is linear but not accepted by any deterministic pushdown automaton.

Limitations Compared to Context-Free

Linear grammars, while capable of generating a wide range of context-free languages, suffer from significant expressive limitations when compared to the full class of context-free grammars. Specifically, they cannot generate languages that require multiple independent recursive structures, such as the language $L = \{a^n b^n c^m d^m \mid n, m \geq 0\}$, which consists of strings where the number of $a$'s matches the number of $b$'s independently of the matching between $c$'s and $d$'s. This language is context-free, as it can be generated by a context-free grammar with productions that handle the two pairs separately (e.g., $S \to AC$, $A \to aAb \mid \epsilon$, $C \to cCd \mid \epsilon$), but no linear grammar exists for it.

The fundamental reason for this limitation lies in the structure of linear derivations. In a linear grammar, every production rule is of the form $A \to uBv$ (with $u, v$ terminal strings and $B$ a nonterminal) or $A \to w$ (a terminal string), ensuring that any sentential form during a derivation contains at most one nonterminal at any time. This restricts the grammar to a single path of recursion, akin to a single one-turn stack in a pushdown automaton, which suffices for dependencies like $a^n b^n$ but fails for independent dependencies that demand branching or multiple concurrent counts. Consequently, linear grammars lack the ability to "remember" and coordinate multiple unbounded pieces of information simultaneously, unlike general context-free grammars, whose productions with multiple nonterminals (e.g., $A \to BC$) enable nested or parallel recursions.

These limitations are formally established through an adapted pumping lemma for linear languages, which demonstrates that linear languages form a proper subset of the context-free languages. The lemma states that for any linear language $L$, there exists a constant $p$ such that every string $w \in L$ with $|w| \geq p$ can be decomposed as $w = uvxyz$ where $|uvyz| \leq p$, $|vy| \geq 1$, and $uv^k x y^k z \in L$ for all $k \geq 0$; that is, the pumped substrings must lie near the two ends of the string. Applying this to $L = \{a^n b^n c^m d^m \mid n, m \geq 0\}$ with a long string like $a^p b^p c^p d^p$ leads to a contradiction: any valid decomposition places $v$ among the leading $a$'s and $y$ among the trailing $d$'s, so pumping disrupts the equality in at least one pair without repairing the other, producing strings outside $L$. This contrasts with the standard context-free pumping lemma, which allows the pumped region to sit anywhere in the string and is therefore satisfied by this language. The lower generative power of linear languages compared to context-free ones is thus quantifiable via such lemmas, highlighting their intermediate position in the Chomsky hierarchy between the regular and full context-free languages.

Due to these constraints, linear grammars are well suited for modeling "linear" or single-recursion phenomena, such as simple nested structures in certain natural-language constructs or program analyses, but they fall short for multiply dependent patterns that require the broader power of context-free grammars.

Closure Properties

Positive Closure Properties

Linear languages demonstrate robustness through several positive closure properties, where applying specific operations to linear languages yields another linear language. These properties are established via constructive proofs that preserve the linear structure of grammars, in which productions contain at most one nonterminal on the right-hand side.

The class of linear languages is closed under union. Given two linear grammars $G_1$ with start symbol $S_1$ and $G_2$ with start symbol $S_2$, construct a new grammar $G$ with a fresh start symbol $S$ and productions $S \to S_1$ and $S \to S_2$, along with all productions from $G_1$ and $G_2$ (renaming nonterminals in one grammar if necessary to avoid conflicts). This $G$ is linear, as the added productions are of the form $S \to S_i$ (with empty terminal strings around the nonterminal), and it generates $L(G_1) \cup L(G_2)$.

The class of linear languages is closed under reversal. The reversal of a linear language $L$, denoted $L^R = \{w^R \mid w \in L\}$ where $w^R$ is the reverse of string $w$, is also linear. To construct a linear grammar for $L^R$ from a linear grammar $G = (V, \Sigma, P, S)$ for $L$, form a new grammar $G^R$ with the same nonterminals and start symbol, but replace each production $A \to uBv$ (where $u, v \in \Sigma^*$ and $B \in V$) with $A \to v^R B u^R$, and each terminal production $A \to w$ with $A \to w^R$. This preserves the linear form, as each right-hand side still has at most one nonterminal flanked by terminal strings, and $L(G^R) = L(G)^R$.

Linear languages are also closed under homomorphism and inverse homomorphism. For a homomorphism $h: \Sigma^* \to \Delta^*$ and a linear language $L \subseteq \Sigma^*$, $h(L)$ is linear; the grammar is obtained by replacing each terminal string in the productions of a grammar for $L$ with its image under $h$, yielding a linear grammar over $\Delta$. Similarly, for inverse homomorphism, if $L \subseteq \Delta^*$ is linear, then $h^{-1}(L) = \{w \in \Sigma^* \mid h(w) \in L\}$ is linear, via a construction that simulates the homomorphism in the grammar derivations while maintaining linearity.

These properties imply that linear languages form a full trio. Additionally, linear languages are closed under intersection with regular languages, which can be shown using a product construction between a linear grammar and a finite automaton, resulting in a linear grammar for the intersection.
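The reversal construction translates directly into code. In this hedged Python sketch (with an assumed (head, list-of-symbols) rule representation; the name reverse_grammar is illustrative), each rule $A \to uBv$ becomes $A \to v^R B u^R$ and each terminal rule $A \to w$ becomes $A \to w^R$:

def reverse_grammar(rules, nonterminals):
    reversed_rules = []
    for head, rhs in rules:
        positions = [i for i, sym in enumerate(rhs) if sym in nonterminals]
        if not positions:                    # A -> w  becomes  A -> w^R
            reversed_rules.append((head, rhs[::-1]))
        else:                                # A -> uBv becomes A -> v^R B u^R
            i = positions[0]
            u, B, v = rhs[:i], rhs[i], rhs[i + 1:]
            reversed_rules.append((head, v[::-1] + [B] + u[::-1]))
    return reversed_rules

# Reversing S -> abSc | d yields S -> cSba | d:
print(reverse_grammar([("S", ["a", "b", "S", "c"]), ("S", ["d"])], {"S"}))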

Negative Closure Properties

Linear languages are not closed under intersection. To illustrate this, consider the languages $L_1 = \{a^j b^j c^k \mid j, k \geq 0\}$ and $L_2 = \{a^k b^j c^j \mid j, k \geq 0\}$, both of which are generated by linear grammars. Their intersection is $L_1 \cap L_2 = \{a^n b^n c^n \mid n \geq 0\}$, which is not context-free and hence not linear, as proven by the pumping lemma for context-free languages applied to strings of the form $a^p b^p c^p$ where $p$ is the pumping length. This counterexample demonstrates that the intersection of two linear languages may require power beyond linear (indeed, beyond context-free) grammars. Although specific intersections can be linear (for example, $\{a^n b^n \mid n \geq 0\} \cap \{a^m b^{2m} \mid m \geq 0\} = \{\epsilon\}$, which is trivially linear), the general failure of closure under intersection marks a boundary of the class. In contrast to regular languages, which are closed under intersection, linear languages exhibit this limitation due to their restricted derivation structure, which allows only one nonterminal at a time.

Linear languages are also not closed under complement. Since the family is closed under union but not under intersection, closure under complement would imply closure under intersection via De Morgan's laws ($L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}$), leading to a contradiction. An explicit construction involves the marked-copy language $M = \{wcw \mid w \in \{a, b\}^*\}$ over the alphabet $\{a, b, c\}$, which is not context-free (and thus not linear) by the pumping lemma, as pumping a string $a^p b^p c a^p b^p$ violates the equal-prefix-suffix property. However, its complement $\overline{M}$ is linear, generated by a linear grammar that enforces a mismatch between the prefix and suffix around $c$; if linear languages were closed under complement, then the complement of $\overline{M}$ (i.e., $M$ itself) would also be linear, which it is not.

Linear languages are not closed under concatenation or Kleene star either. This, along with the lack of closure under intersection and complement, underlines their intermediate expressive power between the regular and context-free languages.