Expression problem
The expression problem is a problem in programming languages that concerns the extensibility and modularity of statically typed data abstractions. The goal is to define a data abstraction that is extensible both in its representations and its behaviors, where one can add new representations and new behaviors to the data abstraction, without recompiling existing code, and while retaining static type safety (e.g., no casts). The statement of the problem exposes deficiencies in programming paradigms and programming languages. Philip Wadler, one of the co-authors of Haskell, coined the term.
History
Philip Wadler formulated the challenge and named it "The Expression Problem"[1] in response to a discussion with Rice University's Programming Languages Team (PLT). He also cited three sources that defined the context for his challenge:
The problem was first observed by John Reynolds in 1975.[2] Reynolds discussed two forms of data abstraction: user-defined types, which are now known as abstract data types (ADTs) (not to be confused with algebraic data types), and procedural data structures, which are now understood as a primitive form of objects with only one method. He argued that they are complementary, in that user-defined types could be extended with new behaviors, and procedural data structures could be extended with new representations. He also discussed related work going back to 1967. Fifteen years later, in 1990, William Cook[3] applied Reynolds's idea in the context of objects and abstract data types, which had both grown extensively. Cook identified the matrix of representations and behaviors that is implicit in a data abstraction, and discussed how ADTs are based on the behavioral axis, while objects are based on the representation axis. He provided an extensive discussion of work on ADTs and objects relevant to the problem, reviewed implementations in both styles, discussed extensibility in both directions, and identified the importance of static typing. Most importantly, he discussed situations in which there was more flexibility than Reynolds considered, including internalization and optimization of methods.
At ECOOP '98, Shriram Krishnamurthi et al.[4] presented a design pattern solution to the problem of simultaneously extending an expression-oriented programming language and its tool-set. They dubbed it the "expressivity problem" because they thought programming language designers could use the problem to demonstrate the expressive power of their creations. For PLT, the problem had shown up in the construction of DrScheme, now DrRacket, and they solved it[5] via a rediscovery of mixins.[6][7] To avoid using a programming language problem in a paper about programming languages, Krishnamurthi et al. used an old geometry programming problem to explain their pattern-oriented solution. In conversations with Felleisen and Krishnamurthi after the ECOOP presentation, Wadler understood the PL-centric nature of the problem and he pointed out that Krishnamurthi's solution used a cast to circumvent Java's type system. The discussion continued on the types mailing list, where Corky Cartwright (Rice) and Kim Bruce (Williams) showed how type systems for OO languages might eliminate this cast. In response Wadler formulated his essay and stated the challenge, "whether a language can solve the expression problem is a salient indicator of its capacity for expression." The label "expression problem" puns on expression = "how much can your language express" and expression = "the terms you are trying to represent are language expressions".
Others co-discovered variants of the expression problem around the same time as Rice University's PLT, in particular Thomas Kühne[8] in his dissertation, and Smaragdakis and Batory[9] in a parallel ECOOP 98 article.
Some follow-up work used the expression problem to showcase the power of programming language designs.[10][11]
The expression problem is also a fundamental problem in multi-dimensional Software Product Line design and in particular as an application or special case of FOSD Program Cubes.[citation needed]
Solutions
There are various solutions to the expression problem. They vary in the amount of code a user must write to implement them and in the language features they require.
- Multiple dispatch[12]
- Open classes (as in Ruby)[13]
- Coproducts of functors[14]
- JavaGI type classes[15]
- Tagless-final[16] / Object algebras[17]
- Polymorphic Variants[18]
Example
Problem description
We can imagine we do not have the source code for the following library, written in C#, which we wish to extend:
interface IEvalExp
{
    int Eval();
}

class Lit : IEvalExp
{
    internal Lit(int n)
    {
        N = n;
    }

    internal int N { get; }

    public int Eval()
    {
        return N;
    }
}

class Add : IEvalExp
{
    internal Add(IEvalExp left, IEvalExp right)
    {
        Left = left;
        Right = right;
    }

    internal IEvalExp Left { get; }
    internal IEvalExp Right { get; }

    public int Eval()
    {
        return Left.Eval() + Right.Eval();
    }
}

static class ExampleOne
{
    static IEvalExp AddOneAndTwo() => new Add(new Lit(1), new Lit(2));
    static int EvaluateTheSumOfOneAndTwo() => AddOneAndTwo().Eval();
}
Using this library we can express the arithmetic expression 1 + 2, as we did in ExampleOne.AddOneAndTwo(), and can evaluate the expression by calling .Eval(). Now imagine that we wish to extend this library. Adding a new type is easy because we are working with an object-oriented language. For example, we might create the following class:
class Mult : IEvalExp
{
    internal Mult(IEvalExp left, IEvalExp right)
    {
        Left = left;
        Right = right;
    }

    internal IEvalExp Left { get; }
    internal IEvalExp Right { get; }

    public int Eval()
    {
        return Left.Eval() * Right.Eval();
    }
}
However, if we wish to add a new function over the type (a new method, in C# terminology), for example to pretty-print an expression, we have to change the IEvalExp interface and then modify every class that implements it. Another possibility is to create a new interface that extends IEvalExp and then create subtypes of the Lit, Add and Mult classes, but the expression returned by ExampleOne.AddOneAndTwo() has already been compiled against the old interface, so we cannot use the new function on it. The problem is reversed in functional programming languages like F#, where it is easy to add a function over a given type, but extending or adding types is difficult.
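The failed "new sub-interface" attempt can be made concrete with a minimal sketch in Java, a close analogue of the C# above (all names here are illustrative, not part of the original library):

```java
interface EvalExp {
    int eval();
}

class Lit implements EvalExp {
    final int n;
    Lit(int n) { this.n = n; }
    public int eval() { return n; }
}

class Add implements EvalExp {
    final EvalExp left, right;
    Add(EvalExp left, EvalExp right) { this.left = left; this.right = right; }
    public int eval() { return left.eval() + right.eval(); }
}

// The attempted extension: a sub-interface carrying the new operation...
interface PrintExp extends EvalExp {
    String print();
}

class ExtensionDemo {
    // ...but this already-compiled factory still produces plain Add/Lit nodes,
    // which know nothing about PrintExp.
    static EvalExp addOneAndTwo() {
        return new Add(new Lit(1), new Lit(2));
    }

    public static void main(String[] args) {
        EvalExp e = addOneAndTwo();
        // Evaluation still works, but the value cannot be treated as printable
        // without an unsafe downcast, which would fail at runtime:
        assert e.eval() == 3;
        assert !(e instanceof PrintExp);
    }
}
```

The instanceof check makes the barrier visible: the only route from the old type to the new one is a cast, which is exactly what the problem statement forbids.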
Problem solution using object algebra
Let us redesign the original library with extensibility in mind using the ideas from the paper Extensibility for the Masses.[17]
interface ExpAlgebra<T>
{
    T Lit(int n);
    T Add(T left, T right);
}

class ExpFactory : ExpAlgebra<IEvalExp>
{
    public IEvalExp Lit(int n)
    {
        return new Lit(n);
    }

    public IEvalExp Add(IEvalExp left, IEvalExp right)
    {
        return new Add(left, right);
    }
}

static class ExampleTwo<T>
{
    public static T AddOneToTwo(ExpAlgebra<T> ae) => ae.Add(ae.Lit(1), ae.Lit(2));
}
We use the same implementation as in the first code example, but now add a new interface containing the functions over the type, as well as a factory for the algebra. Notice that we now generate the expression in ExampleTwo.AddOneToTwo() through the ExpAlgebra<T> interface instead of directly from the concrete types. We can now add a function by extending the ExpAlgebra<T> interface; here we add functionality to print the expression:
interface IPrintExp : IEvalExp
{
    string Print();
}

class PrintableLit : Lit, IPrintExp
{
    internal PrintableLit(int n) : base(n)
    {
        N = n;
    }

    internal new int N { get; }

    public string Print()
    {
        return N.ToString();
    }
}

class PrintableAdd : Add, IPrintExp
{
    internal PrintableAdd(IPrintExp left, IPrintExp right) : base(left, right)
    {
        Left = left;
        Right = right;
    }

    internal new IPrintExp Left { get; }
    internal new IPrintExp Right { get; }

    public string Print()
    {
        return Left.Print() + " + " + Right.Print();
    }
}

class PrintFactory : ExpFactory, ExpAlgebra<IPrintExp>
{
    public IPrintExp Add(IPrintExp left, IPrintExp right)
    {
        return new PrintableAdd(left, right);
    }

    public new IPrintExp Lit(int n)
    {
        return new PrintableLit(n);
    }
}

static class ExampleThree
{
    internal static int Evaluate() => ExampleTwo<IPrintExp>.AddOneToTwo(new PrintFactory()).Eval();
    internal static string Print() => ExampleTwo<IPrintExp>.AddOneToTwo(new PrintFactory()).Print();
}
Notice that in ExampleThree.Print() we print an expression that was already compiled in ExampleTwo; we did not need to modify any existing code. Notice also that this is still strongly typed: we need neither reflection nor casting. If we replaced the PrintFactory with the ExpFactory in ExampleThree.Print(), we would get a compilation error, since the Print() method does not exist in that context.
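The C# walkthrough above extends the algebra with a new operation; the other axis, adding a new variant, works symmetrically. The following is a minimal, self-contained sketch in Java (names are illustrative, not taken from the library above): the base algebra is extended with a mult constructor via a sub-interface, and clients written against the base algebra continue to work unchanged.

```java
// Base algebra: literals and addition.
interface ExpAlg<T> {
    T lit(int n);
    T add(T l, T r);
}

// Extending the algebra with a new variant: a sub-interface, no edits to ExpAlg.
interface MultAlg<T> extends ExpAlg<T> {
    T mult(T l, T r);
}

// An evaluation interpretation of the extended algebra.
class EvalMultAlg implements MultAlg<Integer> {
    public Integer lit(int n) { return n; }
    public Integer add(Integer l, Integer r) { return l + r; }
    public Integer mult(Integer l, Integer r) { return l * r; }
}

class AlgebraDemo {
    // Old client code, written against the base algebra, still works...
    static <T> T onePlusTwo(ExpAlg<T> alg) {
        return alg.add(alg.lit(1), alg.lit(2));
    }

    // ...and new client code can combine it with the new variant.
    static <T> T timesThree(MultAlg<T> alg) {
        return alg.mult(onePlusTwo(alg), alg.lit(3));
    }

    public static void main(String[] args) {
        assert AlgebraDemo.timesThree(new EvalMultAlg()) == 9; // (1 + 2) * 3
    }
}
```

Because onePlusTwo is generic in the algebra, the same compiled method participates in expressions built with the richer MultAlg, without recompilation or casts.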
References
- ^ "The Expression Problem".
- ^ Reynolds, John C. (1975). "User-defined Types and Procedural Data Structures as complementary approaches to Data Abstraction". New Directions in Algorithmic Languages (PDF). IFIP Working Group 2.1 on Algol. pp. 157–168.
- ^ Cook, William (1990). "Object-Oriented Programming versus Abstract Data Types". In Bakker, J.W. de; Roever, W.P. de; Rozenberg, G. (eds.). Foundations of Object-Oriented Languages (FOOL), REX School/Workshop. Lecture Notes in Computer Science. Vol. 489. Noordwijkerhout, The Netherlands: Springer Berlin Heidelberg. pp. 151–178. doi:10.1007/BFb0019443. ISBN 978-3-540-46450-1.
- ^ "Synthesizing Object-Oriented and Functional Design to Promote Re-Use".
- ^ Findler, Robert Bruce; Flatt, Matthew (1999). "Modular object-oriented programming with units and mixins". ACM SIGPLAN Notices. 34: 94–104. doi:10.1145/291251.289432.
- ^ Cook, William (1989). A Denotational Semantics of Inheritance (PDF) (PhD). Brown University.
- ^ Flatt, Matthew; Krishnamurthi, Shriram; Felleisen, Matthias (1998). "Classes and Mixins". Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '98. pp. 171–183. doi:10.1145/268946.268961. ISBN 978-0897919791. S2CID 5815257.
- ^ Kühne, Thomas (1999). A Functional Pattern System for Object-Oriented Design. Darmstadt: Verlag Dr. Kovac. ISBN 978-3-86064-770-7.
- ^ Smaragdakis, Yannis; Don Batory (1998). Implementing Reusable Object-Oriented Components. Lecture Notes in Computer Science. Vol. 1445.
- ^ Zenger, Matthias; Odersky, Martin (2001). "Extensible Algebraic Datatypes with Defaults". Proceedings of the sixth ACM SIGPLAN international conference on Functional programming. pp. 241–252. CiteSeerX 10.1.1.28.6778. doi:10.1145/507635.507665. ISBN 1-58113-415-0.
- ^ Zenger, Matthias; Odersky, Martin (2005). "Independently extensible solutions to the expression problem" (PDF). FOOL 2005. ACM. CiteSeerX 10.1.1.107.4449.
- ^ Chambers, Craig; Leavens, Gary T. (November 1995). "Type Checking and Modules for Multi-Methods". ACM Transactions on Programming Languages and Systems. 17 (6): 805–843. doi:10.1145/218570.218571.
- ^ Clifton, Curtis; Leavens, Gary T.; Chambers, Craig; Millstein, Todd (2000). "MultiJava: Modular open classes and symmetric multiple dispatch for Java". Proceedings of the 15th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (PDF). pp. 130–145. doi:10.1145/353171.353181. ISBN 978-1-58113-200-7. S2CID 7879645.
- ^ Wouter Swierstra (2008). "Data Types à La Carte". Journal of Functional Programming. 18 (4). Cambridge University Press: 423–436. doi:10.1017/S0956796808006758. ISSN 0956-7968. S2CID 21038598.
- ^ Wehr, Stefan; Thiemann, Peter (July 2011). "JavaGI: The Interaction of Type Classes with Interfaces and Inheritance". ACM Transactions on Programming Languages and Systems. 33 (4): 1–83. doi:10.1145/1985342.1985343. S2CID 13174506.
- ^ Carette, Jacques; Kiselyov, Oleg; Chung-chieh, Shan (2009). "Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages" (PDF). J. Funct. Program. 19 (5): 509–543. doi:10.1017/S0956796809007205. S2CID 6054319.
- ^ a b Oliveira, Bruno C. d. S.; Cook, William R. (2012). "Extensibility for the Masses: Practical Extensibility with Object Algebras" (PDF). Ecoop '12.
- ^ Garrigue, Jacques (2000). "Code Reuse Through Polymorphic Variants" (PDF). Workshop on Foundations of Software Engineering. Sasaguri, Japan, November 2000. CiteSeerX 10.1.1.128.7169.
External links
- The Expression Problem by Philip Wadler.
- Lecture: The Expression Problem by Ralf Lämmel.
- C9 Lectures: Dr. Ralf Lämmel - Advanced Functional Programming - The Expression Problem at Channel 9.[dead link]
- Independently Extensible Solutions to the Expression Problem, Matthias Zenger and Martin Odersky, EPFL Lausanne.
Expression problem
Definition and Motivation
Core Definition
The expression problem arises in the context of statically typed programming languages, where data abstraction is achieved through mechanisms like algebraic data types or class hierarchies to encapsulate representations and behaviors while ensuring compile-time type safety. This problem highlights tensions in extending such abstractions modularly without compromising type guarantees or requiring modifications to unrelated code. At its core, the expression problem refers to the challenge of extending a data type system along two orthogonal axes: adding new data representations, such as variants or cases to existing types, and adding new operations or functions that operate on those types.[1] Representation extensibility allows for the introduction of novel cases without altering prior type definitions, while behavior extensibility enables the definition of additional functions over established types, all while preserving static type safety, modularity, and the ability to avoid recompilation of existing modules.[1]

As formally articulated by Philip Wadler, the expression problem is "a new name for an old problem": the goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).[1] This formulation underscores the difficulty of achieving dual extensibility in a way that maintains the integrity of the type system across independent extensions.

Importance in Language Design
The expression problem serves as a litmus test for evaluating the extensibility and modularity of programming language paradigms, particularly in statically typed systems, by revealing fundamental tensions between closed-world assumptions—such as those embodied in sealed classes or exhaustive case analysis—and open-world assumptions, like extensible interfaces or dynamic dispatch.[1] In closed-world models, the complete set of data variants is fixed at compile time, enabling optimizations like pattern matching but restricting independent extensions without recompilation or type alterations.[6] Conversely, open-world models support incremental additions but often at the cost of reduced type safety or performance, as they cannot assume completeness of cases.[6] This dichotomy underscores how language type systems balance predictability against flexibility in abstraction design.[1]

Addressing the expression problem is crucial for modularity, as unresolved extensibility barriers result in brittle codebases where adding new data representations or behaviors requires pervasive modifications, leading to maintenance challenges and scalability issues.[7] Such limitations particularly impede the development of domain-specific languages (DSLs), where independent extensions for new syntax or semantics are essential, as well as plugin architectures and library evolution, which rely on composable abstractions without tight coupling.[8] In practice, this forces developers into anticipatory designs that over-engineer for unforeseen needs, reducing the adaptability of software systems over time.[7]

The expression problem sharply critiques the inherent asymmetries in dominant paradigms: object-oriented programming (OOP) facilitates adding new representations to an existing set of operations through inheritance or interfaces, but struggles to incorporate new operations without refactoring hierarchies or violating encapsulation.[1] In contrast, functional programming (FP) excels at adding operations over algebraic data types or sum types, but complicates adding variants, as a new case typically requires updating pattern-matching constructs across dispersed modules, potentially breaking existing implementations.[1] These trade-offs highlight how OOP prioritizes representational extension while FP emphasizes behavioral extension, often at the expense of the opposite dimension.[1]

Beyond critiques, the expression problem has profoundly shaped language features and practices, inspiring advancements like enhanced algebraic data types for controlled extensibility in FP and traits for decoupling types from operations in hybrid systems such as Scala.[9][10] It also reinforces the open-closed principle in software engineering, advocating for designs open to extension yet closed to modification through mechanisms like separate compilation and type-safe dispatch.[7][1]

Historical Development
Origins and Early Ideas
The conceptual foundations of the expression problem trace back to early explorations in data abstraction during the 1970s and 1980s, where programming language designers grappled with the challenges of structuring and extending data types in interactive and modular systems. In procedural paradigms, such as those in languages like ALGOL or early C, extensibility was often achieved through ad-hoc modifications to global structures or functions, lacking systematic support for independent evolution of data representations and operations. Similarly, modular approaches in languages like Modula-2 emphasized encapsulation but treated extensions as language-specific mechanisms, such as separate compilation units, without addressing the tension between adding new data variants and new behaviors. These paradigms highlighted the need for disciplined abstraction but revealed inherent biases toward either centralizing data or decentralizing procedures, setting the stage for more formal inquiries into dual extensibility. A pivotal early contribution came from John C. Reynolds in 1975, who examined user-defined types and procedural data structures as complementary methods for data abstraction in interactive languages. Reynolds argued that user-defined types, which centralize representation within a single definition (e.g., a set as a disjoint union of constructors like empty, full, or limited), allow efficient extensions to the type's internal structure by modifying only the definition, while keeping operations consistent. Conversely, procedural data structures decentralize representation across functions (e.g., sets as predicates over integers), enabling novel, distributed implementations but restricting primitive operations to single-item access, thus complicating multi-argument functions like union. 
This duality underscored the trade-offs in extensibility: centralized types favored operational uniformity at the expense of representational flexibility, while procedural approaches promoted representational variety but hindered operational integration. Reynolds' analysis, rooted in prior work on functional abstraction, illustrated how traditional type disciplines enforced abstraction levels but struggled to balance both axes of extension without unified mechanisms.[11] Building on these ideas, William R. Cook in 1990 contrasted object-oriented programming (OOP), which he termed procedural data abstraction, with abstract data types (ADTs) to expose limitations in inheritance and interfaces for extensible designs. In OOP, as exemplified by Smalltalk's collections, inheritance facilitates adding new operations (e.g., deriving a length method from a base list class) through decentralized method dispatch, but encapsulation barriers prevent optimizations across representations, such as efficient binary operations on mixed types like intervals and streams. ADTs, typical in functional languages like ML, hide representations behind type definitions, making it straightforward to add new operations but requiring global updates to all functions when introducing new constructors (e.g., extending lists to include intervals demands revising head and tail for every case). 
Cook's examination of collection hierarchies revealed how OOP's subclassing often leads to inconsistent behaviors, such as methods being overridden or deleted, while ADTs impose rigidity on data evolution, demonstrating the asymmetry in supporting data versus behavioral extensions.[12] These pre-1990s developments from procedural, modular, early OOP, and functional paradigms influenced the recognition of extensibility as a core challenge in language design, where traditional type systems inherently favored extension along one dimension—either data types or operations—over the other, often resulting in brittle or ad-hoc solutions. This insight into the dual nature of the problem would later be crystallized and named by Philip Wadler in 1998.[11][12]

Naming and Formalization
The term "Expression Problem" was coined by Philip Wadler in a 1998 post to the java-genericity mailing list, where he described it as "a new name for an old problem" in the context of language design challenges encountered during discussions at the ECOOP '98 conference.[1] Inspired by talks on bridging functional and object-oriented paradigms, Wadler formalized the problem as the difficulty of extending a datatype defined by cases—such as adding new cases to the datatype or new functions operating over it—without recompiling existing code and while preserving static type safety.[1] These ECOOP '98 discussions involved key figures including Shriram Krishnamurthi, Matthias Felleisen, and Daniel P. Friedman, who explored synthesizing functional programming's case-based extensibility with object-oriented inheritance through mixin-based approaches implemented in DrScheme, a precursor to the Racket programming language.[13] Wadler's post outlined an early prototype solution using Generic Java (GJ) with parametric types and virtual types, presenting Lisp-inspired pseudocode for expression datatypes like additions and multiplications alongside visitor patterns to demonstrate the extensibility barriers.[1] Concurrently, Krishnamurthi advanced prototypes for extensible visitors, enabling modular additions to expression evaluators without altering core class hierarchies, as detailed in their collaborative work on promoting reuse across paradigms.[13]

A significant formalization milestone came in 2012 with Bruno C. d. S. Oliveira and William R. Cook's ECOOP paper, which revisited the Expression Problem through the lens of inheritance versus composition, proposing object algebras as a practical solution in languages like Java and C# to achieve bidirectional extensibility without recompilation.[14] Oliveira and Cook framed the problem as a tension between closed-world inheritance (adding behaviors to existing types) and open-world composition (adding types to existing behaviors), building on Wadler's definition to emphasize its implications for modular software design in mainstream object-oriented languages.[15]

Illustrating the Problem
Classic Arithmetic Expression Example
The Expression Problem is commonly illustrated through a basic arithmetic expression system that supports literals (constant numbers) and addition as its core constructs. This base system can be defined using an algebraic data type, where expressions are represented abstractly as either a literal value, such as Lit 5 for the integer 5, or an addition of two subexpressions, such as Add (Lit 1) (Lit 2) for the expression 1 + 2.[1]
A fundamental operation on these expressions is evaluation, which computes the numerical result by recursively traversing the structure. In language-agnostic pseudocode, this can be expressed using a case-based dispatch:
function eval(expression):
    case expression of
        Lit(n) → return n
        Add(left, right) → return eval(left) + eval(right)
Applying eval to Add (Lit 1) (Lit 2) yields 3, as it evaluates the literals and sums them.[1]
This setup establishes a self-contained foundation for handling simple arithmetic but anticipates growth, such as incorporating multiplication for expressions like 1 + 2 * 3 or adding a pretty-printing operation to render the expression as a string.[1]
Demonstrating Extensibility Barriers
To illustrate the extensibility barriers inherent in the Expression Problem, consider extending the base arithmetic expression datatype from the classic example, which typically includes literals and addition operations along with an evaluation function.[1] Adding a new representation, such as a multiplication operation denoted as Mult (Lit 2) (Lit 3), requires modifying the datatype definition to include a new case, for instance, data Exp = Lit Int | Add Exp Exp | Mult Exp Exp. However, this change necessitates updating the existing evaluation function to handle the new case, such as through pattern matching:
eval :: Exp -> Int
eval (Lit n) = n
eval (Add x y) = eval x + eval y
eval (Mult x y) = eval x * eval y -- New case added
A pre-existing pretty-printing function over the original two cases faces the same barrier:
print :: Exp -> String
print (Lit n) = show n
print (Add x y) = "(" ++ print x ++ " + " ++ print y ++ ")"
Similarly, after adding Mult, the pretty-printing function would need an additional clause:

print (Mult x y) = "(" ++ print x ++ " * " ++ print y ++ ")"

Every existing function over Exp must be revised in this way, which is precisely the barrier to adding new representations in the functional style.
Proposed Solutions
Object-Oriented Approaches
Object-oriented approaches to the expression problem primarily rely on the visitor pattern, which leverages double dispatch to separate the addition of new behaviors from the existing data representations, allowing new operations to be defined without altering the original class hierarchy.[17] The pattern, formalized in the seminal work on design patterns, defines an acceptor interface in the base expression class and a visitor interface with type-specific methods for each operation, enabling polymorphic dispatch based on both the object type and the visitor type.[18] This structure facilitates easy extension for new operations, such as evaluation or pretty-printing, by implementing a new visitor class that traverses the expression tree and applies the desired logic at each node.[19]

In a typical implementation using languages like Java or C#, the expression hierarchy starts with an abstract Exp class or interface featuring subclasses such as Lit for literals and Add for binary addition, each overriding an accept method to invoke the appropriate visitor method.[17] For instance, to evaluate an expression, an IEvalVisitor interface declares methods like visitLit(Lit exp) returning the literal's value and visitAdd(Add exp) recursively evaluating and summing the operands; the eval operation is then performed by calling exp.accept(new EvalVisitor()) on the root node.[19] Adding a new operation, such as XML serialization, requires only a new visitor implementing the interface with tailored logic for each expression type, preserving the open-closed principle for behaviors while keeping the data types closed.[18]
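The double-dispatch arrangement described above can be sketched in Java as follows (a minimal sketch following the visitLit/visitAdd naming scheme; all names are illustrative):

```java
interface ExpVisitor<R> {
    R visitLit(Lit exp);
    R visitAdd(Add exp);
}

interface Exp {
    <R> R accept(ExpVisitor<R> v);
}

final class Lit implements Exp {
    final int n;
    Lit(int n) { this.n = n; }
    public <R> R accept(ExpVisitor<R> v) { return v.visitLit(this); }
}

final class Add implements Exp {
    final Exp left, right;
    Add(Exp left, Exp right) { this.left = left; this.right = right; }
    public <R> R accept(ExpVisitor<R> v) { return v.visitAdd(this); }
}

// New operations are new visitors; the Exp classes stay untouched.
final class EvalVisitor implements ExpVisitor<Integer> {
    public Integer visitLit(Lit e) { return e.n; }
    public Integer visitAdd(Add e) { return e.left.accept(this) + e.right.accept(this); }
}

final class PrintVisitor implements ExpVisitor<String> {
    public String visitLit(Lit e) { return Integer.toString(e.n); }
    public String visitAdd(Add e) {
        return "(" + e.left.accept(this) + " + " + e.right.accept(this) + ")";
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        Exp e = new Add(new Lit(1), new Lit(2));
        assert e.accept(new EvalVisitor()) == 3;
        assert e.accept(new PrintVisitor()).equals("(1 + 2)");
    }
}
```

Note the asymmetry this buys: PrintVisitor was added without touching Lit or Add, but adding a Mult class would force a visitMult method into ExpVisitor and hence edits to every existing visitor.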
However, the visitor pattern exhibits limitations when extending the data types themselves, as introducing a new expression variant like Mult for multiplication necessitates updating the visitor interface with a new visitMult method and modifying all existing visitor implementations to handle it, leading to widespread code changes.[17] Closed class hierarchies, enforced by features like sealed classes in C# or final classes in Java, further hinder the addition of new representations without violating encapsulation or resorting to delegation, which William Cook critiques as diverging from traditional inheritance-based object-oriented paradigms in favor of more flexible but less integrated abstract data type mechanisms.[12] Attempts to mitigate these issues, such as extensible visitors using additional dispatch layers, introduce significant complexity and runtime overhead, underscoring the pattern's asymmetry in favoring operation extension over type extension.[17]
Functional and Type-Class-Based Solutions
In functional programming languages such as Haskell and ML, expressions are naturally modeled with algebraic data types (ADTs), which make it straightforward to add new operations but impose costs on extending the representation. An ADT for arithmetic expressions might be defined as data Exp = Lit Int | Add Exp Exp, where Lit represents a literal integer and Add a binary addition; a new operation such as pretty-printing or differentiation is simply a new function defined by exhaustive pattern matching over the constructors, with no changes to existing code. However, adding a new constructor like Mult Exp Exp requires modifying the type definition and updating every existing function to handle the new case, which can propagate changes across the codebase and hinder modularity.[1]
Haskell's type class system enables the addition of new behaviors without altering existing data type definitions, while a new type variant need only supply instances of the existing classes. To achieve extensibility in both directions, techniques like data types à la carte define separate types for each variant (e.g., newtype Lit = Lit Int and data Add e = Add e e), allowing type class instances for each, such as instance Eval Lit where eval (Lit i) = VInt i and instance Eval (Add e) where eval (Add e1 e2) = ... (assuming e supports Eval); variants are then composed modularly using coproducts and functors to form the full expression type. This allows new operations (e.g., a Pretty class for printing) to be added by defining a new class with instances for existing types, avoiding modifications to the individual variant definitions. Conversely, introducing a new expression type demands providing instances for all relevant classes, ensuring completeness but potentially leading to repetitive boilerplate. The result is balanced extensibility in both directions for interpreters.[20]
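The dictionary-passing idea behind type classes can be approximated in Java with generic interfaces over independent variant types. The following is a loose sketch of that encoding, not the Haskell technique itself, and all names are illustrative; in particular, the nesting depth of expressions is fixed by the types, unlike the recursive coproducts of data types à la carte:

```java
// Each variant is its own type; there is no closed Exp hierarchy.
final class Lit {
    final int n;
    Lit(int n) { this.n = n; }
}

final class Add<E> {
    final E left, right;
    Add(E left, E right) { this.left = left; this.right = right; }
}

// A "type class" encoded as an explicit dictionary of operations.
interface Eval<E> {
    int eval(E e);
}

class EvalInstances {
    static final Eval<Lit> lit = e -> e.n;
    static <E> Eval<Add<E>> add(Eval<E> sub) {
        return e -> sub.eval(e.left) + sub.eval(e.right);
    }
}

// A new operation is a new dictionary interface; Lit and Add are untouched.
interface Pretty<E> {
    String pretty(E e);
}

class PrettyInstances {
    static final Pretty<Lit> lit = e -> Integer.toString(e.n);
    static <E> Pretty<Add<E>> add(Pretty<E> sub) {
        return e -> "(" + sub.pretty(e.left) + " + " + sub.pretty(e.right) + ")";
    }
}

class TypeClassDemo {
    public static void main(String[] args) {
        Add<Lit> e = new Add<>(new Lit(1), new Lit(2));
        assert EvalInstances.add(EvalInstances.lit).eval(e) == 3;
        assert PrettyInstances.add(PrettyInstances.lit).pretty(e).equals("(1 + 2)");
    }
}
```

A new variant (say Mult<E>) would be a new class plus one instance per existing dictionary interface, mirroring the instance-per-class obligation described above.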
In dynamically typed functional languages like Scheme (and its descendant Racket), extensible variants can be implemented using structs combined with mixins, which provide a form of higher-order class composition for incremental extension without rigid hierarchies. Mixins function as class-to-class transformations that add or override methods, allowing independent extensions to be composed modularly; for instance, a base expression struct can be augmented with evaluation behavior via a mixin, and further extended with printing via another, all without recompiling core definitions. This design, introduced in early work on Scheme's object system, supports dynamic dispatch and reuse in expression-like domains, though it trades static guarantees for flexibility in untyped contexts.[21]
These functional approaches excel in type safety and compositional purity, combining easy behavior addition over ADTs with modular variant composition via type classes, but face trade-offs in scalability: the need to maintain exhaustive instances or matchings can result in code duplication or "expression explosion"—a proliferation of variant-specific definitions—in large, evolving systems where numerous operations interact with many constructors.[20]
Hybrid and modern techniques
Hybrid techniques for addressing the expression problem integrate elements from object-oriented and functional paradigms to achieve greater extensibility, allowing both new data variants and operations to be added independently without modifying existing code. Object algebras exemplify this approach by leveraging generic interfaces to define algebraic signatures for expressions, enabling modular interpretations that separate syntax from semantics.[14] In object algebras, an abstract factory interface, such as ExpAlg[T], specifies the constructors for expression components, including methods like lit(int n): T for literal values and add(T left, T right): T for addition. Clients implement this interface to provide specific behaviors; for instance, an evaluation algebra might return an Int by implementing lit to return the integer value and add to perform arithmetic summation. Separately, a representation algebra, such as one building abstract syntax trees, constructs Tree nodes via the same interface. This design ensures that expressions are built uniformly while allowing multiple algebras to interpret the same syntax differently, thus solving the expression problem through parametric polymorphism and subtyping.[14][22]
Implementations in languages supporting interfaces and generics, like Scala or Java 8 and later, use traits or interfaces for ExpAlg[T]. For example, in Scala:
trait ExpAlg[T] {
  def lit(n: Int): T
  def add(l: T, r: T): T
}

// Evaluation algebra
class EvalAlg extends ExpAlg[Int] {
  def lit(n: Int): Int = n
  def add(l: Int, r: Int): Int = l + r
}

// Abstract syntax tree used by the tree algebra
sealed trait Tree
case class Lit(n: Int) extends Tree
case class Add(l: Tree, r: Tree) extends Tree

// Tree algebra
class TreeAlg extends ExpAlg[Tree] {
  def lit(n: Int): Tree = Lit(n)
  def add(l: Tree, r: Tree): Tree = Add(l, r)
}
Expressions are then constructed against the abstract interface, e.g., val onePlusTwo = alg.add(alg.lit(1), alg.lit(2)) for some algebra alg, and interpreted by invoking the appropriate algebra instance. This facilitates independent extensions: new operations (e.g., pretty-printing) can be added via a new algebra implementation, while new variants (e.g., multiplication) extend the interface without altering existing clients.[22][14]
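Both extension directions can be sketched concretely against the ExpAlg interface; PrintAlg, MulAlg, and EvalMulAlg are illustrative names, and the original trait and evaluation algebra are repeated so the sketch stands alone:

```scala
// Repeated from the example above so this sketch is self-contained.
trait ExpAlg[T] {
  def lit(n: Int): T
  def add(l: T, r: T): T
}
class EvalAlg extends ExpAlg[Int] {
  def lit(n: Int): Int = n
  def add(l: Int, r: Int): Int = l + r
}

// New operation: one new algebra, no existing code modified.
class PrintAlg extends ExpAlg[String] {
  def lit(n: Int): String = n.toString
  def add(l: String, r: String): String = "(" + l + " + " + r + ")"
}

// New variant: extend the signature; existing algebras stay valid.
trait MulAlg[T] extends ExpAlg[T] {
  def mul(l: T, r: T): T
}
class EvalMulAlg extends EvalAlg with MulAlg[Int] {
  def mul(l: Int, r: Int): Int = l * r
}

// Clients write expressions against whichever signature they need.
def onePlusTwo[T](alg: ExpAlg[T]): T = alg.add(alg.lit(1), alg.lit(2))
def twoTimesThree[T](alg: MulAlg[T]): T = alg.mul(alg.lit(2), alg.lit(3))
```

Because onePlusTwo only demands ExpAlg, it runs unchanged under EvalAlg, PrintAlg, or the extended EvalMulAlg.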
In Rust, a systems programming language, the expression problem is addressed using enums for data variants (e.g., enum AstNode { Integer(i64), Add(Box<AstNode>, Box<AstNode>) }), traits for operations (e.g., trait Stringify { fn stringify(&self) -> String; }), and generics with trait bounds to enable extensibility. For example, a generic function can operate on a type Node: Stringify + Dump to support multiple operations without hard-coding them. Trait objects (Box<dyn Trait>) allow dynamic dispatch, but solutions often require a "broker" enum type and From implementations for composition across crates. While this provides static type safety and modularity, limitations include boilerplate for crossing domains (e.g., from enum to trait object) and the need for workarounds such as the unstable Unsize trait for full recursion. As of 2025, Rust's approaches are discussed for their balance of performance and extensibility in safe systems programming.[23]
Other hybrid approaches include procedural embeddings using closures to encapsulate behaviors, where expression constructors return functions that defer evaluation until supplied with an interpreter, blending imperative flexibility with functional composition. Algebraic effects in languages like Koka and Eff offer another synergy, treating operations as effectful computations handled modularly, though primarily for control flow rather than direct data extensibility.[24]
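A procedural embedding of this kind can be sketched in Scala, where constructors return closures that defer all work until an interpreter is supplied; Interp, Exp, and evalInterp are illustrative names:

```scala
// An interpreter is a record of operations; an expression is a
// closure that defers to whichever interpreter it is given.
trait Interp[T] {
  def lit(n: Int): T
  def add(l: T, r: T): T
}
trait Exp { def apply[T](i: Interp[T]): T }

// Constructors close over their arguments and defer evaluation.
def lit(n: Int): Exp = new Exp {
  def apply[T](i: Interp[T]): T = i.lit(n)
}
def add(l: Exp, r: Exp): Exp = new Exp {
  def apply[T](i: Interp[T]): T = i.add(l(i), r(i))
}

// Evaluation is just one interpreter, supplied after construction.
val evalInterp: Interp[Int] = new Interp[Int] {
  def lit(n: Int): Int = n
  def add(l: Int, r: Int): Int = l + r
}
```

A second interpreter (for printing, say) can be applied to the same closure-built expression, so behaviors are added without touching the constructors.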
Modern developments extend these hybrids to scalable domain-specific languages (DSLs) and plugin architectures, where object algebras enable modular DSL design by composing independent language fragments. For instance, in embedded DSLs for path descriptions or questionnaires, algebras allow plugins to add syntax and semantics without global recompilation, improving maintainability in large systems. Work since 2012, however, has also highlighted the conceptual overhead these techniques add in complex scenarios, such as composing many algebras for an intricate DSL, which can increase boilerplate compared to purer paradigms. Generalized variants of the expression problem, as explored in recent analyses, reveal its manifestations in everyday programming beyond datatypes, such as record extension, motivating solutions that emphasize type-safe modularity.[25][26]