Recent from talks
Nothing was collected or created yet.
Standard ML
View on Wikipedia| Standard ML | |
|---|---|
| Paradigm | Multi-paradigm: functional, imperative, modular[1] |
| Family | ML |
| First appeared | 1983[2] |
| Stable release | Standard ML '97[2]
/ 1997 |
| Typing discipline | Inferred, static, strong |
| Filename extensions | .sml |
| Website | smlfamily |
| Major implementations | |
| SML/NJ, MLton, Poly/ML | |
| Dialects | |
| Alice, Concurrent ML, Dependent ML | |
| Influenced by | |
| ML, Hope, Pascal | |
| Influenced | |
| Elm, F#, F*, Haskell, OCaml, Python,[3] Rust,[4] Scala | |
Standard ML (SML) is a general-purpose, high-level, modular, functional programming language with compile-time type checking and type inference. It is popular for writing compilers, for programming language research, and for developing theorem provers.
Standard ML is a modern dialect of ML, the language used in the Logic for Computable Functions (LCF) theorem-proving project. It is distinctive among widely used languages in that it has a formal specification, given as typing rules and operational semantics in The Definition of Standard ML.[5]
Language
[edit]This section has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these messages)
|
Standard ML is a functional programming language with some impure features. Programs written in Standard ML consist of expressions in contrast to statements or commands, although some expressions of type unit are only evaluated for their side-effects.
Functions
[edit]Like all functional languages, a key feature of Standard ML is the function, which is used for abstraction. The factorial function can be expressed as follows:
fun factorial n =
if n = 0 then 1 else n * factorial (n - 1)
Type inference
[edit]An SML compiler must infer the static type val factorial : int -> int without user-supplied type annotations. It has to deduce that n is only used with integer expressions, and must therefore itself be an integer, and that all terminal expressions are integer expressions.
Declarative definitions
[edit]The same function can be expressed with clausal function definitions where the if-then-else conditional is replaced with templates of the factorial function evaluated for specific values:
fun factorial 0 = 1
| factorial n = n * factorial (n - 1)
Imperative definitions
[edit]or iteratively:
fun factorial n = let val i = ref n and acc = ref 1 in
while !i > 0 do (acc := !acc * !i; i := !i - 1); !acc
end
Lambda functions
[edit]or as a lambda function:
val rec factorial = fn 0 => 1 | n => n * factorial (n - 1)
Here, the keyword val introduces a binding of an identifier to a value, fn introduces an anonymous function, and rec allows the definition to be self-referential.
Local definitions
[edit]The encapsulation of an invariant-preserving tail-recursive tight loop with one or more accumulator parameters within an invariant-free outer function, as seen here, is a common idiom in Standard ML.
Using a local function, it can be rewritten in a more efficient tail-recursive style:
local
fun loop (0, acc) = acc
| loop (m, acc) = loop (m - 1, m * acc)
in
fun factorial n = loop (n, 1)
end
Type synonyms
[edit]A type synonym is defined with the keyword type. Here is a type synonym for points on a plane, and functions computing the distances between two points, and the area of a triangle with the given corners as per Heron's formula. (These definitions will be used in subsequent examples).
type loc = real * real
fun square (x : real) = x * x
fun dist (x, y) (x', y') =
Math.sqrt (square (x' - x) + square (y' - y))
fun heron (a, b, c) = let
val x = dist a b
val y = dist b c
val z = dist a c
val s = (x + y + z) / 2.0
in
Math.sqrt (s * (s - x) * (s - y) * (s - z))
end
Algebraic datatypes
[edit]Standard ML provides strong support for algebraic datatypes (ADT). A data type can be thought of as a disjoint union of tuples (or a "sum of products"). They are easy to define and easy to use, largely because of pattern matching, and most Standard ML implementations' pattern-exhaustiveness checking and pattern redundancy checking.
In object-oriented programming languages, a disjoint union can be expressed as class hierarchies. However, in contrast to class hierarchies, ADTs are closed. Thus, the extensibility of ADTs is orthogonal to the extensibility of class hierarchies. Class hierarchies can be extended with new subclasses which implement the same interface, while the functions of ADTs can be extended for the fixed set of constructors. See expression problem.
A datatype is defined with the keyword datatype, as in:
datatype shape
= Circle of loc * real (* center and radius *)
| Square of loc * real (* upper-left corner and side length; axis-aligned *)
| Triangle of loc * loc * loc (* corners *)
Note that a type synonym cannot be recursive; datatypes are necessary to define recursive constructors. (This is not at issue in this example.)
Pattern matching
[edit]Patterns are matched in the order in which they are defined. C programmers can use tagged unions, dispatching on tag values, to do what ML does with datatypes and pattern matching. Nevertheless, while a C program decorated with appropriate checks will, in a sense, be as robust as the corresponding ML program, those checks will of necessity be dynamic; ML's static checks provide strong guarantees about the correctness of the program at compile time.
Function arguments can be defined as patterns as follows:
fun area (Circle (_, r)) = Math.pi * square r
| area (Square (_, s)) = square s
| area (Triangle p) = heron p (* see above *)
The so-called "clausal form" of function definition, where arguments are defined as patterns, is merely syntactic sugar for a case expression:
fun area shape = case shape of
Circle (_, r) => Math.pi * square r
| Square (_, s) => square s
| Triangle p => heron p
Exhaustiveness checking
[edit]Pattern-exhaustiveness checking will make sure that each constructor of the datatype is matched by at least one pattern.
The following pattern is not exhaustive:
fun center (Circle (c, _)) = c
| center (Square ((x, y), s)) = (x + s / 2.0, y + s / 2.0)
There is no pattern for the Triangle case in the center function. The compiler will issue a warning that the case expression is not exhaustive, and if a Triangle is passed to this function at runtime, exception Match will be raised.
Redundancy checking
[edit]The pattern in the second clause of the following (meaningless) function is redundant:
fun f (Circle ((x, y), r)) = x + y
| f (Circle _) = 1.0
| f _ = 0.0
Any value that would match the pattern in the second clause would also match the pattern in the first clause, so the second clause is unreachable. Therefore, this definition as a whole exhibits redundancy, and causes a compile-time warning.
The following function definition is exhaustive and not redundant:
val hasCorners = fn (Circle _) => false | _ => true
If control gets past the first pattern (Circle), we know the shape must be either a Square or a Triangle. In either of those cases, we know the shape has corners, so we can return true without discerning the actual shape.
Higher-order functions
[edit]Functions can consume functions as arguments:
fun map f (x, y) = (f x, f y)
Functions can produce functions as return values:
fun constant k = (fn _ => k)
Functions can also both consume and produce functions:
fun compose (f, g) = (fn x => f (g x))
The function List.map from the basis library is one of the most commonly used higher-order functions in Standard ML:
fun map _ [] = []
| map f (x :: xs) = f x :: map f xs
A more efficient implementation with tail-recursive List.foldl:
fun map f = List.rev o List.foldl (fn (x, acc) => f x :: acc) []
Exceptions
[edit]Exceptions are raised with the keyword raise and handled with the pattern matching handle construct. The exception system can implement non-local exit; this optimization technique is suitable for functions like the following.
local
exception Zero;
val p = fn (0, _) => raise Zero | (a, b) => a * b
in
fun prod xs = List.foldl p 1 xs handle Zero => 0
end
When exception Zero is raised, control leaves the function List.foldl altogether. Consider the alternative: the value 0 would be returned, it would be multiplied by the next integer in the list, the resulting value (inevitably 0) would be returned, and so on. The raising of the exception allows control to skip over the entire chain of frames and avoid the associated computation. Note the use of the underscore (_) as a wildcard pattern.
The same optimization can be obtained with a tail call.
local
fun p a (0 :: _) = 0
| p a (x :: xs) = p (a * x) xs
| p a [] = a
in
val prod = p 1
end
Module system
[edit]Standard ML's advanced module system allows programs to be decomposed into hierarchically organized structures of logically related type and value definitions. Modules provide not only namespace control but also abstraction, in the sense that they allow the definition of abstract data types. Three main syntactic constructs comprise the module system: signatures, structures and functors.
Signatures
[edit]A signature is an interface, usually thought of as a type for a structure; it specifies the names of all entities provided by the structure, the arity of each type component, the type of each value component, and the signature of each substructure. The definitions of type components are optional; type components whose definitions are hidden are abstract types.
For example, the signature for a queue may be:
signature QUEUE = sig
type 'a queue
exception QueueError;
val empty : 'a queue
val isEmpty : 'a queue -> bool
val singleton : 'a -> 'a queue
val fromList : 'a list -> 'a queue
val insert : 'a * 'a queue -> 'a queue
val peek : 'a queue -> 'a
val remove : 'a queue -> 'a * 'a queue
end
This signature describes a module that provides a polymorphic type 'a queue, exception QueueError, and values that define basic operations on queues.
Structures
[edit]A structure is a module; it consists of a collection of types, exceptions, values and structures (called substructures) packaged together into a logical unit.
A queue structure can be implemented as follows:
structure TwoListQueue :> QUEUE = struct
type 'a queue = 'a list * 'a list
exception QueueError;
val empty = ([], [])
fun isEmpty ([], []) = true
| isEmpty _ = false
fun singleton a = ([], [a])
fun fromList a = ([], a)
fun insert (a, ([], [])) = singleton a
| insert (a, (ins, outs)) = (a :: ins, outs)
fun peek (_, []) = raise QueueError
| peek (ins, outs) = List.hd outs
fun remove (_, []) = raise QueueError
| remove (ins, [a]) = (a, ([], List.rev ins))
| remove (ins, a :: outs) = (a, (ins, outs))
end
This definition declares that structure TwoListQueue implements signature QUEUE. Furthermore, the opaque ascription denoted by :> states that any types which are not defined in the signature (i.e. type 'a queue) should be abstract, meaning that the definition of a queue as a pair of lists is not visible outside the module. The structure implements all of the definitions in the signature.
The types and values in a structure can be accessed with "dot notation":
val q : string TwoListQueue.queue = TwoListQueue.empty
val q' = TwoListQueue.insert (Real.toString Math.pi, q)
Functors
[edit]A functor is a function from structures to structures; that is, a functor accepts one or more arguments, which are usually structures of a given signature, and produces a structure as its result. Functors are used to implement generic data structures and algorithms.
One popular algorithm[6] for breadth-first search of trees makes use of queues. Here is a version of that algorithm parameterized over an abstract queue structure:
(* after Okasaki, ICFP, 2000 *)
functor BFS (Q: QUEUE) = struct
datatype 'a tree = E | T of 'a * 'a tree * 'a tree
local
fun bfsQ q = if Q.isEmpty q then [] else search (Q.remove q)
and search (E, q) = bfsQ q
| search (T (x, l, r), q) = x :: bfsQ (insert (insert q l) r)
and insert q a = Q.insert (a, q)
in
fun bfs t = bfsQ (Q.singleton t)
end
end
structure QueueBFS = BFS (TwoListQueue)
Within functor BFS, the representation of the queue is not visible. More concretely, there is no way to select the first list in the two-list queue, if that is indeed the representation being used. This data abstraction mechanism makes the breadth-first search truly agnostic to the queue's implementation. This is in general desirable; in this case, the queue structure can safely maintain any logical invariants on which its correctness depends behind the bulletproof wall of abstraction.
Code examples
[edit]Snippets of SML code are most easily studied by entering them into an interactive top-level.
Hello, world!
[edit]The following is a "Hello, World!" program:
| hello.sml |
|---|
print "Hello, world!\n";
|
| sh |
$ mlton hello.sml
$ ./hello
Hello, world!
|
Algorithms
[edit]Insertion sort
[edit]Insertion sort for int list (ascending) can be expressed concisely as follows:
fun insert (x, []) = [x] | insert (x, h :: t) = sort x (h, t)
and sort x (h, t) = if x < h then [x, h] @ t else h :: insert (x, t)
val insertionsort = List.foldl insert []
Mergesort
[edit]Here, the classic mergesort algorithm is implemented in three functions: split, merge and mergesort. Also note the absence of types, with the exception of the syntax op :: and [] which signify lists. This code will sort lists of any type, so long as a consistent ordering function cmp is defined. Using Hindley–Milner type inference, the types of all variables can be inferred, even complicated types such as that of the function cmp.
Split
fun split is implemented with a stateful closure which alternates between true and false, ignoring the input:
fun alternator {} = let val state = ref true
in fn a => !state before state := not (!state) end
(* Split a list into near-halves which will either be the same length,
* or the first will have one more element than the other.
* Runs in O(n) time, where n = |xs|.
*)
fun split xs = List.partition (alternator {}) xs
Merge
Merge uses a local function loop for efficiency. The inner loop is defined in terms of cases: when both lists are non-empty (x :: xs) and when one list is empty ([]).
This function merges two sorted lists into one sorted list. Note how the accumulator acc is built backwards, then reversed before being returned. This is a common technique, since 'a list is represented as a linked list; this technique requires more clock time, but the asymptotics are not worse.
(* Merge two ordered lists using the order cmp.
* Pre: each list must already be ordered per cmp.
* Runs in O(n) time, where n = |xs| + |ys|.
*)
fun merge cmp (xs, []) = xs
| merge cmp (xs, y :: ys) = let
fun loop (a, acc) (xs, []) = List.revAppend (a :: acc, xs)
| loop (a, acc) (xs, y :: ys) =
if cmp (a, y)
then loop (y, a :: acc) (ys, xs)
else loop (a, y :: acc) (xs, ys)
in
loop (y, []) (ys, xs)
end
Mergesort
The main function:
fun ap f (x, y) = (f x, f y)
(* Sort a list in according to the given ordering operation cmp.
* Runs in O(n log n) time, where n = |xs|.
*)
fun mergesort cmp [] = []
| mergesort cmp [x] = [x]
| mergesort cmp xs = (merge cmp o ap (mergesort cmp) o split) xs
Quicksort
[edit]Quicksort can be expressed as follows. fun part is a closure that consumes an order operator op <<.
infix <<
fun quicksort (op <<) = let
fun part p = List.partition (fn x => x << p)
fun sort [] = []
| sort (p :: xs) = join p (part p xs)
and join p (l, r) = sort l @ p :: sort r
in
sort
end
Expression interpreter
[edit]Note the relative ease with which a small expression language can be defined and processed:
exception TyErr;
datatype ty = IntTy | BoolTy
fun unify (IntTy, IntTy) = IntTy
| unify (BoolTy, BoolTy) = BoolTy
| unify (_, _) = raise TyErr
datatype exp
= True
| False
| Int of int
| Not of exp
| Add of exp * exp
| If of exp * exp * exp
fun infer True = BoolTy
| infer False = BoolTy
| infer (Int _) = IntTy
| infer (Not e) = (assert e BoolTy; BoolTy)
| infer (Add (a, b)) = (assert a IntTy; assert b IntTy; IntTy)
| infer (If (e, t, f)) = (assert e BoolTy; unify (infer t, infer f))
and assert e t = unify (infer e, t)
fun eval True = True
| eval False = False
| eval (Int n) = Int n
| eval (Not e) = if eval e = True then False else True
| eval (Add (a, b)) = (case (eval a, eval b) of (Int x, Int y) => Int (x + y))
| eval (If (e, t, f)) = eval (if eval e = True then t else f)
fun run e = (infer e; SOME (eval e)) handle TyErr => NONE
Example usage on well-typed and ill-typed expressions:
val SOME (Int 3) = run (Add (Int 1, Int 2)) (* well-typed *)
val NONE = run (If (Not (Int 1), True, False)) (* ill-typed *)
Arbitrary-precision integers
[edit]The IntInf module provides arbitrary-precision integer arithmetic. Moreover, integer literals may be used as arbitrary-precision integers without the programmer having to do anything.
The following program implements an arbitrary-precision factorial function:
| fact.sml |
|---|
fun fact n : IntInf.int = if n = 0 then 1 else n * fact (n - 1);
fun printLine str = TextIO.output (TextIO.stdOut, str ^ "\n");
val () = printLine (IntInf.toString (fact 120));
|
| bash |
$ mlton fact.sml
$ ./fact
6689502913449127057588118054090372586752746333138029810295671352301
6335572449629893668741652719849813081576378932140905525344085894081
21859898481114389650005964960521256960000000000000000000000000000
|
Partial application
[edit]Curried functions have many applications, such as eliminating redundant code. For example, a module may require functions of type a -> b, but it is more convenient to write functions of type a * c -> b where there is a fixed relationship between the objects of type a and c. A function of type c -> (a * c -> b) -> a -> b can factor out this commonality. This is an example of the adapter pattern.[citation needed]
In this example, fun d computes the numerical derivative of a given function f at point x:
- fun d delta f x = (f (x + delta) - f (x - delta)) / (2.0 * delta)
val d = fn : real -> (real -> real) -> real -> real
The type of fun d indicates that it maps a "float" onto a function with the type (real -> real) -> real -> real. This allows us to partially apply arguments, known as currying. In this case, function d can be specialised by partially applying it with the argument delta. A good choice for delta when using this algorithm is the cube root of the machine epsilon.[citation needed]
- val d' = d 1E~8;
val d' = fn : (real -> real) -> real -> real
The inferred type indicates that d' expects a function with the type real -> real as its first argument. We can compute an approximation to the derivative of at . The correct answer is .
- d' (fn x => x * x * x - x - 1.0) 3.0;
val it = 25.9999996644 : real
Libraries
[edit]Standard
[edit]The Basis Library[7] has been standardized and ships with most implementations. It provides modules for trees, arrays, and other data structures, and input/output and system interfaces.
Third party
[edit]For numerical computing, a Matrix module exists (but is currently broken), https://www.cs.cmu.edu/afs/cs/project/pscico/pscico/src/matrix/README.html.
For graphics, cairo-sml is an open source interface to the Cairo graphics library. For machine learning, a library for graphical models exists.
Implementations
[edit]Implementations of Standard ML include the following:
Standard
- HaMLet: a Standard ML interpreter that aims to be an accurate and accessible reference implementation of the standard
- MLton (mlton.org): a whole-program optimizing compiler which strictly conforms to the Definition and produces very fast code compared to other ML implementations, including backends for LLVM and C
- Moscow ML: a light-weight implementation, based on the Caml Light runtime engine which implements the full Standard ML language, including modules and much of the basis library
- Poly/ML: a full implementation of Standard ML that produces fast code and supports multicore hardware (via Portable Operating System Interface (POSIX) threads); its runtime system performs parallel garbage collection and online sharing of immutable substructures.
- Standard ML of New Jersey (smlnj.org): a full compiler, with associated libraries, tools, an interactive shell, and documentation with support for Concurrent ML
- SML.NET: a Standard ML compiler for the Common Language Runtime with extensions for linking with other .NET framework code
- ML Kit Archived 2016-01-07 at the Wayback Machine: an implementation based very closely on the Definition, integrating a garbage collector (which can be disabled) and region-based memory management with automatic inference of regions, aiming to support real-time applications
Derivative
- Alice: an interpreter for Standard ML by Saarland University with support for parallel programming using futures, lazy evaluation, distributed computing via remote procedure calls and constraint programming
- SML#: an extension of SML providing record polymorphism and C language interoperability. It is a conventional native compiler and its name is not an allusion to running on the .NET framework
- SOSML: an implementation written in TypeScript, supporting most of the SML language and select parts of the basis library
Research
- CakeML is a REPL version of ML with formally verified runtime and translation to assembler.
- Isabelle (Isabelle/ML Archived 2020-08-30 at the Wayback Machine) integrates parallel Poly/ML into an interactive theorem prover, with a sophisticated IDE (based on jEdit) for official Standard ML (SML'97), the Isabelle/ML dialect, and the proof language. Starting with Isabelle2016, there is also a source-level debugger for ML.
- Poplog implements a version of Standard ML, along with Common Lisp and Prolog, allowing mixed language programming; all are implemented in POP-11, which is compiled incrementally.
- TILT is a full certifying compiler for Standard ML which uses typed intermediate languages to optimize code and ensure correctness, and can compile to typed assembly language.
All of these implementations are open-source and freely available. Most are implemented themselves in Standard ML. There are no longer any commercial implementations; Harlequin, now defunct, once produced a commercial IDE and compiler called MLWorks which passed on to Xanalys and was later open-sourced after it was acquired by Ravenbrook Limited on April 26, 2013.
Major projects using SML
[edit]The IT University of Copenhagen's entire enterprise architecture is implemented in around 100,000 lines of SML, including staff records, payroll, course administration and feedback, student project management, and web-based self-service interfaces.[8]
The proof assistants HOL4, Isabelle, LEGO, and Twelf are written in Standard ML. It is also used by compiler writers and integrated circuit designers such as ARM.[9]
See also
[edit]References
[edit]- ^ a b "Programming in Standard ML: Hierarchies and Parameterization". Retrieved 2020-02-22.
- ^ a b c "SML '97". www.smlnj.org.
- ^ a b "itertools — Functions creating iterators for efficient looping — Python 3.7.1rc1 documentation". docs.python.org.
- ^ "Influences - The Rust Reference". The Rust Reference. Retrieved 2023-12-31.
- ^ a b Milner, Robin; Tofte, Mads; Harper, Robert; MacQueen, David (1997). The Definition of Standard ML (Revised). MIT Press. ISBN 0-262-63181-4.
- ^ a b Okasaki, Chris (2000). "Breadth-First Numbering: Lessons from a Small Exercise in Algorithm Design". International Conference on Functional Programming 2000. ACM.
- ^ "Standard ML Basis Library". smlfamily.github.io. Retrieved 2022-01-10.
- ^ a b Tofte, Mads (2009). "Standard ML language". Scholarpedia. 4 (2): 7515. Bibcode:2009SchpJ...4.7515T. doi:10.4249/scholarpedia.7515.
- ^ a b Alglave, Jade; Fox, Anthony C. J.; Ishtiaq, Samin; Myreen, Magnus O.; Sarkar, Susmit; Sewell, Peter; Nardelli, Francesco Zappa (2009). The Semantics of Power and ARM Multiprocessor Machine Code (PDF). DAMP 2009. pp. 13–24. Archived (PDF) from the original on 2017-08-14.
External links
[edit]About Standard ML
- Revised definition
- Standard ML Family GitHub Project Archived 2020-02-20 at the Wayback Machine
- What is SML?
- What is SML '97?
About successor ML
- successor ML (sML): evolution of ML using Standard ML as a starting point
- HaMLet on GitHub: reference implementation for successor ML
Practical
Academic
Standard ML
View on GrokipediaHistory
Origins in the ML Language Family
The ML language family originated in the 1970s at the University of Edinburgh as the Meta Language (ML) for the LCF (Logic for Computable Functions) theorem prover, a system designed to support interactive proof construction with a focus on logical security and mechanized reasoning.[4] Developed under the leadership of Robin Milner, the project began around 1973 when Milner, then at Edinburgh, adapted and extended an earlier LCF implementation from Stanford to create a more robust framework for higher-order logic proofs. The initial ML was an untyped language inspired by the lambda calculus and ISWIM, serving primarily as a scripting tool for defining proof tactics within LCF.[4] Key contributors to ML's early design included Milner, along with Michael J. C. Gordon, Christopher P. Wadsworth, and others in the Edinburgh group, who formalized ML's structure in their 1979 work on LCF. A pivotal innovation during 1973–1978 was the introduction of polymorphic type inference, independently rediscovered by Milner and presented in his 1978 paper, which established the Hindley-Milner algorithm for principal type schemes and let-polymorphism.[5] This advancement transformed ML from an untyped meta-language into a statically typed functional language, emphasizing type safety while preserving expressiveness for theorem-proving tasks.[4] In the 1980s, ML evolved into standalone typed variants as its utility extended beyond LCF. Edinburgh ML emerged in 1981 through Luca Cardelli's development of a compiler for a free-standing version, initially called "ML under VMS," which decoupled ML from the LCF system and enabled broader experimentation.[4] Concurrently, Cambridge ML was developed at the University of Cambridge, building on Edinburgh's foundations to support proof assistants like HOL and later influencing the creation of Caml in 1985.[4] These implementations marked ML's transition to a general-purpose functional programming language. The rapid proliferation of ML dialects, including LCF/ML, Cardelli ML, Poly/ML (1988), and others, led to significant divergences in syntax, semantics, and features, resulting in portability issues that hindered collaborative development and widespread adoption.[4] Without a unified specification, programs written for one implementation often failed in another, prompting initial efforts toward standardization in the early 1980s.[4]Standardization Process
The standardization of Standard ML emerged from efforts within the functional programming community to unify divergent implementations of the ML language family, with roots in discussions under the IFIP Working Group 2.1 as early as 1983.[6] In April 1983, Robin Milner drafted an initial proposal for a "Standard ML," emphasizing a core language design that would serve as a portable foundation.[4] This led to the formal formation of the SML Definition Committee following a pivotal meeting in Edinburgh from May 23–25, 1985, chaired by Milner and including key contributors such as Mads Tofte, Robert Harper, and David MacQueen.[6] The committee operated under the auspices of IFIP Working Group 2.1, which focused on algorithmic languages and provided a structured framework for defining programming languages, drawing on the British research tradition centered in Edinburgh.[4] Key milestones in the standardization process included the release of draft definitions that refined the language's semantics. In March 1986, the committee published a preliminary draft as Technical Report ECS-LFCS-86-2 from the University of Edinburgh.[6] This was followed in August 1987 by another draft, ECS-LFCS-87-36, which introduced formal static semantics and principal signatures to ensure type safety and inference.[4] These drafts facilitated community feedback and iteration, culminating in the final Definition of Standard ML, published in 1990 by MIT Press and authored by Milner, Tofte, Harper, and MacQueen.[7] The 1990 standard initially encompassed the core language, deliberately excluding modules to focus on foundational constructs like expressions, types, and functions, though the module system was integrated into the final document to support modular programming.[6] This scope aimed to define a minimal, consistent subset that could be implemented portably across diverse systems, addressing fragmentation in earlier ML variants.[4] The primary rationale was to promote interoperability and reliability, enabling developers to write code that would behave uniformly in implementations such as the Standard ML of New Jersey (SML/NJ) and others, thereby fostering wider adoption in research and education.[6]Revisions and Modern Developments
In 1997, the Standard ML community released a revised definition of the language, known as SML '97, which integrated the module system more deeply into the core language and provided clearer specifications of its semantics to address ambiguities in the original 1990 standard.[8][9] This revision, detailed in The Definition of Standard ML (Revised) by Milner, Tofte, Harper, and MacQueen, simplified certain aspects of the language while maintaining backward compatibility, ensuring that existing programs could run unchanged.[10] Accompanying the revised definition, the Standard ML Basis Library was formalized in 1997 as a comprehensive specification of predefined modules, promoting portability across implementations by standardizing essential functionality for systems and applications programming.[11] This library, documented in The Standard ML Basis Library by Gansner and Reppy, includes modules for data structures, I/O, and time handling, among others, and has been integral to the language's ecosystem since its adoption.[12] Since the 1997 revision, the core language has seen no major changes, with development focusing instead on implementation extensions and community-driven enhancements. Efforts to develop a successor like ML2000 in the late 1990s stalled, contributing to the stability of the core language since 1997.[3] The ML Family Workshop series continues to foster this activity, with events in 2024 (held September 6 in Milan, Italy, co-located with ICFP) and 2025 (held October 16 in Singapore, co-located with ICFP/SPLASH 2025) discussing advancements in ML-family languages, including practical extensions for Standard ML.[13] The SML community maintains the language through collaborative efforts, such as the SML Family GitHub repository, with ongoing support for implementations like SML/NJ. As of November 4, 2025, the latest SML/NJ release (version 110.99.9) is a bug-fix update, emphasizing stability over new features.[14]Language Design
Core Principles and Philosophy
Standard ML (SML) is a general-purpose functional programming language designed with the foundational goals of achieving high expressiveness while ensuring safety and correctness through rigorous formal semantics.[6] Its philosophy emphasizes a balance between theoretical foundations and practical utility, drawing directly from lambda calculus and type theory to support higher-order functions and abstract mathematical structures.[15] This approach, inspired by early systems like ISWIM and LCF, prioritizes a declarative paradigm where programs are composed as expressions rather than imperative statements, promoting clarity and reducing unintended side effects.[6] Central to SML's core principles is its strong static typing system, which leverages the Hindley-Milner framework to provide automatic type inference and guarantee type safety without runtime checks.[15] Polymorphism, particularly parametric polymorphism, enables generic code that operates uniformly across types, enhancing reusability while the value restriction mechanism preserves soundness in the presence of imperative features.[6] Referential transparency is a key tenet, ensuring that expressions evaluate predictably based on their values alone, which fosters correctness by minimizing mutable state dependencies and aiding in reasoning about program behavior.[16] SML's design philosophy extends to modularity and abstraction, facilitating the construction of large-scale software through a sophisticated module system comprising signatures, structures, and functors.[17] Functors act as parameterized modules, allowing reusable components that abstract over types and values, thereby supporting hierarchical organization and information hiding without compromising the language's type-safe expressiveness.[6] Overall, these principles reflect SML's commitment to a secure, general-purpose language where formal verification aligns with intuitive programming practices.[16]Type System Fundamentals
Standard ML employs the Hindley-Milner type system, which enables automatic type inference for expressions without requiring explicit type annotations, ensuring that every well-typed program has a principal type scheme that is the most general possible. This system, foundational to the language's design, guarantees that type checking is decidable and efficient, allowing compilation without runtime type errors. The core inference mechanism is based on Algorithm W, which computes principal types through a process of substitution and unification, as originally formalized for functional programs.[18][8] Algorithm W performs type inference by recursively assigning fresh type variables to subexpressions and unifying them to resolve constraints arising from operations like function application. Unification, drawing from Robinson's algorithm, matches types by finding substitutions that make them identical, while incorporating an occurs check to prevent infinite type expansions, such as a type variable occurring within the type it is being unified with. For instance, in inferring the type of the identity function , Algorithm W assigns a fresh variable to , yielding the monotype , which is then generalized to the polymorphic scheme . This process ensures completeness and soundness, meaning every typable expression receives its principal type, and no incorrect typings are inferred.[18] Polymorphic types in Standard ML are expressed through universal quantification over type variables, forming type schemes that capture generality, such as for functions that work uniformly across types. Type variables, denoted by primes like'a or 'b, represent unknowns that can be instantiated to concrete types during use, while a type scheme binds these variables universally to prevent ad-hoc polymorphism and enforce parametricity. In the static semantics, environments map identifiers to such schemes, allowing polymorphic values like list constructors to be reused with different element types without duplication.[8][18]
The type system provides static guarantees that eliminate runtime type errors, as all type inconsistencies are detected at compile time through exhaustive inference and checking. Equality types, marked with an equality attribute (e.g., 'a with equality), restrict polymorphism to types supporting structural equality, such as integers or datatypes, enabling operations like comparison only where defined. Abstract types, introduced via abstype declarations, hide implementation details by treating the type as opaque, exposing only specified operations and preventing direct construction or inspection outside the abstraction. These features ensure modularity and safety without compromising the language's expressiveness.[8]
Type equivalence in Standard ML is structural for tuples and records, where two types are equal if their components match recursively by structure, allowing flexible composition without name dependencies. In contrast, datatypes employ nominal equivalence, where equality is determined by the datatype name and constructor definitions, preserving identity even if structures coincide, which supports distinct abstract interpretations of similar layouts. This hybrid approach balances expressivity with the need for identifiable types in polymorphic contexts.[8]
Basic Constructs
Declarations and Expressions
In Standard ML, programs are constructed from declarations and expressions, forming the foundational building blocks of the language. Declarations introduce bindings for values and functions, while expressions represent computations that evaluate to values. Unlike imperative languages, Standard ML treats the entire program as a sequence of pure expressions without statements, emphasizing functional composition and type safety.[8] Value declarations use theval keyword to bind immutable identifiers to expressions, following the syntax val pat = exp, where pat is a pattern (typically a simple identifier for basic bindings) and exp is an expression. These bindings are recursive if prefixed with rec, allowing mutual definitions within a single declaration, as in val rec x = e where e may reference x. Optional type variables can precede the binding, such as val 'a x = e, to explicitly scope polymorphic types to that declaration. Function declarations employ the fun keyword with syntax fun f pat1 ... patn = exp, defining a function f that takes arguments matching the patterns and returns the value of exp; multiple clauses can be chained with | for pattern alternatives, and the rec modifier enables recursive or mutually recursive functions. These forms ensure all bindings are statically checked for type consistency during elaboration.[8]
Expressions in Standard ML are categorized into atomic and compound forms. Atomic expressions include literals like integers (42), strings ("hello"), and identifiers (x), as well as constructs like record literals {field = value} or tuple constructions. Compound expressions build upon these through application (exp1 exp2), type ascription (exp : ty), or more complex forms such as conditional if exp1 then exp2 else exp3, which evaluates to exp2 if exp1 is true and exp3 otherwise. The let expression, let dec in exp end, introduces local declarations dec whose bindings are visible only within exp, providing a mechanism for scoped variable introduction without global pollution. Additionally, case exp of match dispatches on the value of exp using pattern matches, though basic usage here focuses on simple exhaustive cases without advanced destructuring. All expressions are pure, meaning they have no side effects unless explicitly involving mutable structures.[8]
Scoping in Standard ML follows lexical rules, where bindings introduced by declarations are visible from their point of definition onward, unless nested within a let expression, which creates a new environment for its body. In a let dec in exp end, the declarations in dec shadow outer bindings but do not affect the surrounding scope, enforcing hygiene and preventing unintended captures. Type variables in value or function bindings are explicitly scoped to that declaration, avoiding leakage into broader contexts. This design supports modular programming by isolating local computations.[8]
Standard ML programs are structured as sequences of top-level declarations, either in source files or interactively via a read-eval-print loop (REPL). A file consists of declarations separated implicitly by their forms, with the entire program evaluating as the sequence of bound values; there are no explicit statements or control flow outside expressions. In the REPL, each declaration is elaborated and evaluated immediately, binding new identifiers into the persistent environment. This expression-centric model, devoid of imperative statements, aligns with the language's functional paradigm, where computation proceeds through value substitutions.[8]
For example, a simple program might declare a value and use it in an expression:
val pi = 3.14159;
val circumference = fn r => 2.0 * pi * r;
circumference 5.0;
val pi = 3.14159;
val circumference = fn r => 2.0 * pi * r;
circumference 5.0;
pi is bound immutably, circumference defines a function, and the final application evaluates to approximately 31.4159, demonstrating how declarations feed into expressive computations. Type inference automatically deduces types like real for these reals, though explicit annotations can be added for clarity.[8]
Functions and Lambda Expressions
In Standard ML, functions are first-class citizens, meaning they can be defined anonymously, passed as arguments, returned as results, and stored in data structures, enabling powerful functional abstractions.[8] The core mechanism for creating functions is the lambda abstraction, which allows for concise expression of computations without naming. Anonymous functions are defined using the syntaxfn pat => exp, where pat is a pattern (often a simple variable) and exp is the body expression; this evaluates to a value of function type, represented internally as a closure containing the pattern matches, the static environment, and an empty dynamic environment.[8] For example, fn x => x * 2 defines a function that doubles its integer input, with inferred type int -> int.[19]
Standard ML employs currying by default, transforming multi-argument functions into a chain of single-argument functions of type τ₁ → (τ₂ → ... → τₙ → τ), which facilitates partial application.[8] Thus, a function like fun add x y = x + y has type int -> int -> int, and applying it partially as (add 3) yields a new function of type int -> int that adds 3 to its argument.[19] This implicit currying supports flexible composition, where add 3 4 evaluates to 7, but add 3 can be bound to a variable for reuse.[8]
Recursive functions are introduced using the rec keyword in value bindings, allowing a function to refer to itself within its definition. The syntax fun rec f pat = exp (or equivalently val rec f = fn pat => exp) binds f to a recursive closure, where the expression must be a lambda abstraction to enable self-reference.[8] For instance, fun rec factorial n = if n = 0 then 1 else n * factorial (n - 1) computes the factorial with base case and recursive step, inferring type int -> int.[19] Mutual recursion between multiple functions is supported by chaining bindings with and, such as val rec even = fn n => n = 0 orelse odd (n - 1) and odd = fn n => n <> 0 andalso even (n - 1), enabling definitions where functions call each other.[8]
Function application in Standard ML uses strict call-by-value semantics, where arguments are fully evaluated from left to right before substitution into the function body.[8] This ensures deterministic behavior and side-effect ordering, as in (fun f x = x + 1) (2 + 3), where 2 + 3 evaluates to 5 prior to application, yielding 6.[19] Unlike lazy evaluation, this strict strategy prevents unevaluated thunks, promoting efficiency for most computations while requiring explicit handling for infinite data structures if needed.[8]
Built-in Types and Type Synonyms
Standard ML provides a set of primitive built-in types that form the foundation of its type system, including the monomorphic typesint for integers, real for floating-point numbers, string for sequences of characters, bool for boolean values (true and false), char for individual characters, and unit as a singleton type with value ().[8] These types have arity zero and are part of the initial static basis, with int, bool, string, char, and unit admitting equality comparisons via the = and <> operators, while real does not due to its non-discrete nature.[8]
Among the built-in types, lists and tuples are constructed using predefined type constructors and syntactic sugar for convenience. Lists are parameterized by the type constructor list of arity one, yielding types like 'a list for homogeneous sequences, constructed via the nil value for empty lists and the infix :: operator for cons cells, with derived syntax [e1; e2; ...; en] for list literals where all elements share the same type.[8] Tuples, on the other hand, are derived from record types and expressed as (e1, e2, ..., en) for n ≥ 2, equivalent to labeled records {1=e1, 2=e2, ..., n=en} with fixed, anonymous labels, allowing heterogeneous components of arbitrary types.[8] Both lists and tuples admit equality if all their component types do, enabling structural comparisons and pattern matching on equality types.[8]
Type synonyms in Standard ML allow developers to create aliases for existing types without introducing new type constructors, using the declaration type tyvarseq tycon = ty to name complex types transparently.[8] For example, one might define type intlist = int list to abbreviate lists of integers, or more generally type 'a slist = 'a * 'a list for a structure pairing an element with a list; such synonyms are fully interchangeable with their underlying types and preserve properties like equality admissibility.[8] This mechanism supports polymorphism by allowing parametric synonyms, such as type 'a seq = 'a list, facilitating readable code without altering the type system's structure.[8]
Equality types play a crucial role in Standard ML, denoting those types for which structural equality is defined and usable in expressions like e1 = e2 or in pattern matching, including all primitive types except real and built-in constructors like list and tuples when their arguments admit equality.[8] This distinction ensures safe operations, as non-equality types like real avoid imprecise floating-point comparisons in patterns or equality tests.[8]
Built-in type constructors extend the primitives, notably ref of arity one for creating mutable references with type 'a ref, which admits equality only if the underlying type 'a does, enabling imperative features like assignment while integrating with the functional core.[8] Other constructors like list are similarly predefined in the initial basis, providing essential polymorphic structures without requiring user definitions.[8]
Advanced Features
Algebraic Data Types
Algebraic data types in Standard ML allow programmers to define custom sum and product types through datatype declarations, enabling the construction of complex, type-safe data structures such as trees or lists.[8] A datatype declaration specifies a new type constructor along with one or more value constructors that define how values of the type are formed, combining variants (sums) via alternatives separated by bars and products via tuples or records.[8] For instance, a binary tree can be declared asdatatype 'a [tree](/page/Tree) = [Leaf](/page/Leaf) of 'a | Node of 'a [tree](/page/Tree) * 'a [tree](/page/Tree), where Leaf wraps a single value and Node combines two subtrees.[20]
Each constructor in a datatype declaration generates a corresponding function that creates values of the type, with types inferred from the declaration; for the tree example, Leaf has type 'a -> 'a [tree](/page/Tree) and Node has type 'a [tree](/page/Tree) * 'a [tree](/page/Tree) -> 'a [tree](/page/Tree).[8] These constructors are exception-safe, as they cannot produce invalid states like null pointers—instead, all values must be explicitly built using the defined constructors, preventing runtime errors from uninitialized data.[8]
Standard ML supports recursive datatypes, where the type refers to itself within constructor definitions, facilitating structures like trees or the built-in list type (though custom lists can also be defined for illustration).[8] Mutual recursion is enabled by chaining multiple datatype declarations with the and keyword, allowing types to reference each other, such as datatype even = Zero | Succ of odd and datatype odd = One | Succ of even.[8] This mechanism ensures that recursion is properly handled at the type level without requiring special annotations.
The type system enforces safety by treating each datatype as a distinct new type, incompatible with others unless explicitly equated, which eliminates implicit conversions and promotes exhaustive construction via constructors.[8] Pattern matching, covered elsewhere, provides a natural way to deconstruct these types, but the declaration itself guarantees that all values are well-formed.[8]
Pattern Matching
Pattern matching in Standard ML provides a mechanism for destructuring composite values and binding variables to their components in a concise and type-safe manner, forming a cornerstone of the language's expressive power for handling structured data.[21] It appears primarily in match expressions and function definitions, where an input value is tested against a sequence of patterns, and the body associated with the first successful match is evaluated.[16] This approach enables declarative specification of behavior based on data shape, avoiding explicit type tests or conditionals.[22] The primary construct for pattern matching is the match expression, with syntaxcase exp of pat_1 => exp_1 | ... | pat_n => exp_n, where exp is evaluated to a value that is sequentially matched against the patterns pat_i.[16] The first pattern that matches binds any variables in it to corresponding subvalues from exp, after which the associated exp_i is evaluated in this extended environment.[21] If no pattern matches, the built-in Match exception is raised.[22] A wildcard pattern _ can be used to match any value without binding variables, often serving as a default case.[21] For example, the following computes the length of a list:
fun length l =
case l of
[] => 0
| _ :: rest => 1 + length rest
fun length l =
case l of
[] => 0
| _ :: rest => 1 + length rest
[] matches an empty list, while _ :: rest matches a non-empty list, binding rest to its tail.[22]
Standard ML supports a rich set of pattern forms to accommodate various data structures. Variables serve as simple patterns that bind to the entire matched value, such as x in case e of x => ....[16] Constructor patterns apply a data constructor to subpatterns, like Cons(h, t) for a cons cell, destructuring the value into head h and tail t.[21] Tuple patterns (pat_1, ..., pat_n) match n-tuples by binding each component to the corresponding subpattern, as in case p of (x, y) => x + y.[22] List patterns include literals like [pat_1, ..., pat_n] for fixed-length lists or cons forms pat_1 :: pat_2 for recursive decomposition.[16] As-patterns, written pat as vid, bind vid to the full matched value while simultaneously decomposing it via pat; for instance, p as (d, _) in a dictionary update function binds p to the pair and d to its first element.[22] This form, also known as a layered pattern, allows access to both the whole and its parts without recomputation.[21]
To ensure robustness, Standard ML compilers perform static checks on match expressions and function definitions involving patterns. Exhaustiveness checking verifies that the patterns cover all possible values of the scrutinized expression's type, issuing a warning if incomplete; for algebraic data types, this involves analyzing constructor coverage.[16] If the match is inexhaustive at runtime, the Match exception is raised non-exhaustively.[22] Redundancy checking detects and warns about superfluous rules that cannot be reached due to prior patterns, such as a clause following a wildcard.[21] These analyses promote reliable code by catching common errors early, though they are advisory rather than errors, allowing compilation to proceed.[16]
Higher-Order Functions and Modules
Standard ML supports higher-order functions, which treat functions as first-class citizens that can be passed as arguments to other functions, returned as results, or stored in data structures. This capability enables powerful abstractions for composing computations and processing collections like lists.[19] A canonical example is themap function from the Basis Library's List structure, which applies a given function to each element of a list, producing a new list of transformed elements. Its type is ('a -> 'b) -> 'a list -> 'b list, allowing polymorphic transformation without specifying the exact function in advance. For instance:
- List.map (fn x => x * 2) [1, 2, 3];
> val it = [2, 4, 6] : int list
- List.map (fn x => x * 2) [1, 2, 3];
> val it = [2, 4, 6] : int list
foldl and foldr further exemplify higher-order functions by accumulating a value over a list using a binary operator. The foldl function performs a left-to-right fold with type ('a * 'b -> 'b) -> 'b -> 'a [list](/page/List) -> 'b, starting from an initial accumulator and applying the operator to each element. Similarly, foldr folds right-to-left with the same type signature, enabling efficient computations such as summing a list:
- List.foldl (op +) 0 [1, 2, 3];
> val it = 6 : int
- List.foldl (op +) 0 [1, 2, 3];
> val it = 6 : int
structure Queue =
struct
type 'a queue = int * 'a list * 'a list
val empty = (0, [], [])
(* additional bindings *)
end
structure Queue =
struct
type 'a queue = int * 'a list * 'a list
val empty = (0, [], [])
(* additional bindings *)
end
signature QUEUE =
sig
type 'a queue
val empty : 'a queue
(* additional specifications *)
end
signature QUEUE =
sig
type 'a queue
val empty : 'a queue
(* additional specifications *)
end
structure Q :> QUEUE = Queue.[19]
Functors extend modularity by parameterizing structures over other structures matching a given signature, enabling generic programming. The syntax is functor F (S : SIG) = struct ... end, where SIG is the parameter signature and the body uses S. Functor application instantiates the parameter, generating a new structure: F (structure Arg = SomeStruct). This produces a result structure whose bindings are elaborated based on the functor body, supporting reusable abstractions like generic dictionaries parameterized over ordered keys.[19]
To enforce abstraction, Standard ML uses opaque (sealed) signatures, which hide implementation details by preventing access to types beyond the interface. Opaque ascription, via :> SIG, matches a structure to a signature but renames non-specified types to abstract ones, breaking external equalities. Transparent ascription (: SIG) preserves more information. Sharing constraints in signatures or functor applications enforce equality of types or structures across modules, such as sharing type T1 = T2 to ensure consistent type identities.[19]
The open declaration imports all bindings from a structure into the current namespace, facilitating convenient access without qualification, as in open Queue to use empty directly. This aids readability while maintaining modularity.[19]
Imperative Programming
Mutable State with References
Standard ML provides limited support for imperative programming through mutable references, which serve as the primary mechanism for introducing state into an otherwise purely functional language. A reference is created using theref constructor applied to an expression e, resulting in a mutable cell initialized with the value of e; the type of this reference is 'a ref where e has type 'a. Dereferencing occurs with the prefix operator ! applied to a reference r, yielding the current value stored in the cell, of type 'a if r has type 'a ref. Assignment to a reference uses the infix operator :=, as in r := e, which updates the cell with the value of e (requiring e to have type 'a) and returns unit. These operations enable side effects while integrating seamlessly with the type system, ensuring that mutations are type-safe.[8]
References in Standard ML exhibit polymorphism in their content type, allowing a single reference type 'a ref to accommodate any instantiation of 'a. However, the value restriction imposes limitations to maintain type safety, preventing polymorphic generalization of references created in expansive expressions (those involving function applications or other computations). For instance, a declaration like val r = ref 42 yields a monomorphic type int ref, as the expression ref 42 is expansive and does not generalize to forall 'a. 'a ref. In contrast, lambda abstractions such as fn x => ref x can achieve polymorphic types like forall 'a. 'a -> 'a ref because they are non-expansive values. This restriction avoids unsoundness, such as mutating a polymorphic reference with incompatible types, and ensures that references allocated on the heap only escape with concrete types upon function return.[8][24]
Imperative control structures in Standard ML are derived from functional primitives combined with references. Sequencing of expressions uses the infix operator ;, where e1; e2 evaluates e1 for its side effects (discarding its value if not unit) and then returns the value of e2, with the overall type matching that of e2. While loops, absent as a primitive in the core language, are implemented using recursive functions and references to maintain mutable state across iterations. For example, a counter-based loop might use a reference to track progress:
let
val count = ref 0
fun loop () =
if !count < 10 then
(count := !count + 1; loop ())
else ()
in
loop ()
end
let
val count = ref 0
fun loop () =
if !count < 10 then
(count := !count + 1; loop ())
else ()
in
loop ()
end
loop function tests the condition on the dereferenced reference, updates it via assignment, and recurses, leveraging tail-call optimization for efficiency. Such constructs simulate imperative loops while relying on the language's recursive binding form rec (or fun for top-level recursion) to define the looping function.[8][21]
This design positions references as a minimal extension to the pure functional core of Standard ML, confining mutations to explicit imperative constructs without polluting the declarative semantics. The type system tracks reference mutability precisely, preventing unintended aliasing or type inconsistencies, and ensures that pure expressions remain unaffected by distant side effects. As a result, most Standard ML programs can leverage functional purity for reasoning and compositionality, resorting to references only where imperative state is essential, such as in performance-critical sections or interfaces to external systems.[8][21]
Exceptions and Error Handling
Standard ML provides an exception mechanism for handling runtime errors and enabling non-local control transfers, allowing programs to signal and recover from exceptional conditions without relying on return codes or global state. Exceptions are values of the built-in abstract datatypeexn, which is extensible and cannot be constructed directly outside of exception declarations, ensuring type safety by preventing accidental confusion with ordinary values. This design supports polymorphic payloads, where an exception can carry data of arbitrary type, while maintaining a uniform handling interface through pattern matching.[25][16]
Exceptions are declared using the exception keyword, introducing a new variant constructor for the exn type. A simple exception without payload is declared as exception E, yielding a value of type exn. For exceptions carrying data, the syntax exception E of τ specifies a payload of type τ, resulting in an exception value of type τ -> exn; for example, exception Fail of string allows raising a string message. Multiple exceptions can be declared in a single statement using and, and exceptions may be aliased via exception E = E'. These declarations extend the global exception namespace and are visible throughout the program.[25][16]
To raise an exception, the raise expression is used: raise exp, where exp evaluates to an exn value, immediately terminating the current evaluation and propagating the exception upward through the call stack. For instance, raise Fail "error occurred" signals a failure with a descriptive message. Propagation continues until a handler matches the exception or the program terminates, providing a clean way to interrupt normal flow for error conditions.[25][16]
Handling occurs via the handle construct in expressions: exp handle match, where match consists of pattern-matching rules of the form pat => exp1 | .... The expression exp is evaluated first; if it succeeds, the handler is ignored. If it raises an exception e, the patterns are matched against e—using the same pattern matching as in case expressions—and the first successful match evaluates to the corresponding exp_i. Unmatched exceptions propagate further up the stack. This mechanism allows selective recovery, such as logging an error and continuing with a default value.[25][16]
Standard ML includes several built-in exceptions in the initial basis, raised automatically by language constructs or library functions for common errors:
Bind: Raised on pattern-binding failures, such as in non-exhaustivevalbindings.Match: Raised on non-exhaustive pattern matches incaseexpressions or function clauses.Overflow: Raised for arithmetic operations exceeding representable bounds.Div: Raised for division by zero.Interrupt: Raised on user interrupts, like Ctrl+C in interactive environments.
exn (or polymorphic variants where applicable) and can be handled like user-defined ones, with functions like exnName and exnMessage providing string representations for debugging. The polymorphic nature ensures flexibility—for example, Fail carries a string—while the distinct exn type prevents exceptions from being used as regular data, enforcing safe error handling.[25][16]
The following example demonstrates declaration, raising, and handling:
exception DivideByZero;
fun safeDiv (x, y) =
if y = 0 then raise DivideByZero
else x div y;
val result = safeDiv (10, 0) handle DivideByZero => 0;
exception DivideByZero;
fun safeDiv (x, y) =
if y = 0 then raise DivideByZero
else x div y;
val result = safeDiv (10, 0) handle DivideByZero => 0;
safeDiv raises DivideByZero on zero division, but the handler catches it and returns 0, illustrating recovery without stack unwinding beyond the try scope (noting that handle is the SML equivalent of a try-catch).[25]
Standard Library
Basis Library Overview
The Standard ML Basis Library is a standardized collection of modules that forms the core initial environment for SML programs, ensuring portability and consistency across implementations. Defined as part of the 1997 revision of the SML standard (SML'97), it specifies a minimal set of required modules and signatures to promote interoperable code that behaves predictably regardless of the underlying system.[26][27] This library was developed to address the limitations of earlier SML versions by providing a rich, general-purpose foundation for applications, with an emphasis on functional programming idioms and type-safe operations.[26] The library is organized hierarchically into structures and signatures, where structures implement the functionality and signatures define the interfaces to guarantee consistent behavior. Conformance to the standard requires implementations to provide all mandatory modules, such as those for basic data types and operations, while optional modules extend capabilities for specific environments like POSIX.[27] Examples of core modules includeInt for integer handling, List for list manipulations, and String for string operations, each adhering to predefined signatures like INTEGER, LIST, and STRING to enforce uniformity.[26] This structure supports the portability goal by abstracting platform-specific details, allowing developers to write code that compiles and runs identically on compliant systems.[27]
Key modules cover essential areas, including general-purpose facilities like input/output (I/O) and time management. For I/O, structures such as TextIO and BinIO provide text and binary stream operations, enabling reading from and writing to files or standard streams in a type-safe manner.[27] The Time structure handles time intervals and timestamps, supporting operations like addition and comparison of time values for timing applications.[26] In integer arithmetic, the Int module (implementing the INTEGER signature) offers operations such as addition (+), multiplication (*), and division (div), with support for both signed and unsigned variants through related modules like LargeInt.[27] For lists, the List structure includes higher-order functions like append (concatenating lists) and rev (reversing a list), facilitating efficient functional compositions without mutable state.[26]
Usage of the Basis Library typically involves implicit access to pervasive elements in the top-level environment, such as the int type or the length function, while specific modules are imported explicitly for clarity and namespace management. Developers can use open List; to bring all identifiers from the List structure into scope, or qualify names like List.map to avoid conflicts in larger programs.[27] This approach aligns with SML's module system, allowing seamless integration of Basis components into user-defined modules for building portable applications.[26]
Specialized Utility Modules
The Standard ML Basis Library includes several specialized modules that extend the core functionality for input/output operations, mathematical computations, date and time handling, operating system interactions, character and string manipulations, and low-level data structures. These modules provide domain-specific utilities that enable programmers to perform tasks such as file handling, trigonometric calculations, and bitwise operations while maintaining the language's type safety and functional paradigm.[27]I/O Modules
The TextIO structure facilitates text-based input and output using streams of characters and strings. It defines types such asinstream for input streams and outstream for output streams, supporting operations like openIn to open a file for reading, inputLine to read a newline-terminated line (returning SOME([string](/page/String)) or NONE at end-of-stream), and output to write a string to an output stream. Additional functions include openOut for creating or truncating files, openAppend for appending to files, and print for outputting to the standard output with automatic flushing; streams are block-buffered by default for files and line-buffered for interactive devices.[28]
Complementing TextIO, the BinIO structure handles binary input and output of 8-bit bytes (Word8.word elements). It provides similar stream types (instream and outstream) and functions such as openIn, openOut, and openAppend, inheriting imperative I/O semantics for reading (input, inputN) and writing (output, outputSubVec) binary data. This module is essential for low-level file operations where byte-level precision is required, with error handling via the Io exception for failures like non-existent files.[29]
Math and Utility Modules
The Math structure offers fundamental mathematical constants and functions operating on thereal type, adhering to IEEE 754-1985 floating-point semantics. Key constants include pi (approximately 3.14159) and e (approximately 2.71828), while functions encompass sin, cos, and tan for trigonometric operations in radians, sqrt for square roots (returning NaN for negative inputs), exp for exponentiation, ln and log10 for logarithms (NaN for non-positive arguments), and pow for general exponentiation, returning NaN for negative bases with non-integer exponents and handling other edge cases per IEEE 754 semantics. These operations support scientific computing and numerical algorithms within SML programs.[30]
For calendar and time management, the Date structure provides tools for representing and manipulating dates in a specific time zone. It defines an abstract date type constructed via the date function from components like year, month, day, hour, minute, second, and optional offset; conversion functions include fromTimeLocal and fromTimeUniv to derive dates from UTC times, and toTime for the reverse. Formatting is handled by fmt using specifiers (e.g., %Y for four-digit year) and toString for a standard 24-character representation, while scanning parses dates via scan from streams or fromString from strings, returning options for error cases. This module integrates with the Time structure for comprehensive datetime operations.[31]
The OS structure serves as a facade for operating system interactions, encapsulating substructures like OS.FileSys for file and directory management, OS.IO for polling, OS.Path for pathname syntax, and OS.Process for process control and environment access. It declares the SysErr exception for runtime errors and provides utilities such as errorMsg to retrieve descriptive strings for system errors, errorName for unique identifiers, and syserror for converting error strings to optional codes. These facilities enable portable syscalls, such as file attribute queries in OS.FileSys or process termination in OS.Process, abstracting platform-specific details.[32]
Char and String Modules
The Char structure supplies predicates and conversions for individual characters, treating them as 8-bit values with operations likeord and chr for integer mappings (raising Chr for out-of-range inputs), toLower and toUpper for case conversion, and toString for escaped string representations. Predicates include isDigit to check for decimal digits ('0' to '9'), isAlpha for letters, isAlphaNum for alphanumeric characters, isSpace for whitespace, and isCntrl for control characters, facilitating text validation and processing.[33]
Building on Char, the String structure manages immutable sequences of characters with functions for conversion and extraction. Conversions include str to create a single-character string and implode to concatenate a character list into a string, while explode reverses this by producing a char list. Substring operations feature sub to access the i-th character (zero-based), substring to extract a contiguous segment of length j starting at i, and extract for substrings from i to the end or a specified length; these support efficient string manipulation without mutation.[34]
Extensions in Basis
The CharArray structure extends array facilities for mutable sequences of characters, providing low-level data handling akin to the general Array but specialized for chars. It includesarray to create an array of given length initialized with a char, fromList for list-to-array conversion, sub for indexed access, update for modification, and vector to convert to an immutable vector, useful for performance-critical text buffering.[35]
Similarly, the Word structure supports unsigned machine-word integers with fixed precision (wordSize bits), enabling bitwise and arithmetic operations for low-level data. Key features include modular arithmetic via +, -, *, div, and mod; bitwise functions like andb, orb, xorb, and notb; shifts with << (left), >> (logical right), and ~>> (arithmetic right); and conversions such as toInt/fromInt and toLarge/fromLarge for interfacing with other numeric types, ideal for bit manipulation and systems programming.[36]
Implementations
SML/NJ
Standard ML of New Jersey (SML/NJ) is a prominent implementation of the Standard ML '97 programming language, originally developed in spring 1986 by David MacQueen at Bell Laboratories and Andrew Appel at Princeton University.[3] It emerged as a dedicated compiler aimed at producing highly optimized native code, evolving from earlier ML systems and serving as the primary reference implementation due to its comprehensive support for the language definition and ongoing maintenance by a core group including MacQueen and John Reppy.[3][37] Key features of SML/NJ include its interactive read-eval-print loop (REPL), accessible via thesml command, which enables incremental development, dynamic loading of modules, and immediate execution of expressions in an environment with full type inference and garbage collection.[38] The system supports compilation to native machine code through the Compilation Manager (CM), which uses dependency analysis to minimize recompilation and leverages the MLRISC backend for targets like AMD64.[39] Beyond strict conformance to Standard ML '97, SML/NJ offers extensions such as first-class continuations via the Cont structure and suspend/resume mechanisms provided by the Susp structure, facilitating advanced control abstractions like coroutines.[40]
For performance, SML/NJ employs a two-level bootstrap compiler process, where an initial lightweight compiler builds the full optimizing compiler, enabling efficient self-hosting and generation of high-quality code across platforms.[41] The latest release, version 110.99.9 from November 4, 2025, focuses on bug fixes and stability improvements, including enhancements to 64-bit support on macOS and Linux.[42]
SML/NJ is distributed as free, open-source software under the BSD 3-Clause License and is cross-platform, with official support for Unix-like systems (including Linux), Windows, and macOS, including recent additions for Arm64 architectures.[43][44] Source distributions and installers are available via GitHub and the official website, ensuring accessibility for developers and researchers.[45][14]
MLton
MLton is a whole-program optimizing compiler for the Standard ML programming language, designed to generate small, standalone executables with high runtime performance and no dependency on an interactive runtime system.[46] Unlike interactive environments, MLton performs a complete analysis and optimization of the entire program before code generation, enabling aggressive optimizations that eliminate overhead from higher-order features and produce native machine code.[46] This approach results in executables that are self-contained, leveraging techniques such as unboxed native representations for integers, reals, and arrays, as well as fast arbitrary-precision arithmetic via GMP integration.[46] The compiler's optimization pipeline includes several advanced transformations to improve efficiency. Key steps involve defunctionalization and closure conversion, which translate higher-order functional code into a first-order static single assignment (SSA) form, eliminating closures and enabling subsequent optimizations like constant propagation and dead code elimination.[47] Additional optimizations encompass multiple code generation backends and garbage collection strategies, allowing selection based on target architecture for minimal memory and execution overhead.[46] MLton also supports built-in profiling capabilities, including allocation, time, and stack profiling via the mlprof tool, which instruments code to measure function-level metrics without significant performance degradation in non-profiled builds.[48] MLton adheres strictly to the SML'97 standard, ensuring full conformance to the revised Definition of Standard ML while providing extensions such as a C foreign function interface (FFI) for integrating with system libraries and utilities like lexer and parser generators (mllex and mlyacc).[46] The project originated in 1997 as an independent SML compiler effort, with the first release under the name MLton in 1999, and has been open-source throughout its development.[49] As of 2025, MLton remains actively maintained, with the latest release (20241230) in December 2024 and ongoing updates to support modern platforms and enhancements for systems programming applications.[50][46]Poly/ML
Poly/ML is a high-performance implementation of Standard ML, originally developed by David Matthews at the University of Cambridge's Computer Laboratory. It derives its name from an experimental language called Poly, in which the initial version was written, and builds on earlier efforts like Cambridge ML to provide efficient compilation and runtime support for large-scale programming.[51][52] Designed with a focus on high-performance computing, Poly/ML emphasizes scalability for demanding applications through advanced runtime features.[53] A core strength of Poly/ML lies in its support for parallelism, achieved via native operating system threads that leverage POSIX primitives such as forks, mutexes, and condition variables. This enables full multiprocessor utilization, with the thread library and garbage collector both optimized for concurrent execution on shared-memory systems. While distributed computing is not natively built-in, the implementation facilitates high-level parallelism abstractions, such as future values in tools like Isabelle/ML, for implicit scheduling and parallel evaluation. Additionally, its garbage collector is engineered for efficiency with large heaps, accommodating memory-intensive workloads up to the terabyte scale through 64-bit addressing and dynamic heap adjustment.[54][55][53] Poly/ML achieves full conformance to the Standard ML '97 specification, including the complete Basis Library, ensuring portability of Standard ML code across implementations. Beyond the standard, it introduces extensions primarily within thePolyML structure, such as the compiler for foreign calls that generate executable code from SML sources and export to object files, and the Profiling submodule for measuring execution time and memory allocation. Other enhancements include explicit garbage collection triggers via fullGC, heap analysis functions like objSize, and thread-safe compilation to support parallel code generation. These features extend the language without compromising core compatibility, enabling integration with external systems and performance tuning.[55][56]
As of 2025, Poly/ML remains actively maintained, with version 5.9.2 released in August 2025 and development continuing on GitHub through community contributions. It is particularly valued in scientific computing for its robustness in handling complex, resource-heavy tasks, serving as the backend for formal verification systems like Isabelle and HOL4, where parallel proof checking and large-scale theorem proving demand efficient memory and concurrency management.[53][57]
Other Notable Implementations
Moscow ML is a lightweight implementation of Standard ML, designed primarily for educational and research purposes, featuring an interactive interpreter and a bytecode compiler that generates standalone executables. It fully conforms to the 1997 revision of Standard ML, including the modules language, and implements most of the Standard ML Basis Library, though it omits functional stream I/O.[58] Originally developed by Peter Sestoft at the Technical University of Denmark, it is portable across Unix-like systems and Windows, with a small footprint that facilitates quick startup times in interactive sessions, making it suitable for teaching functional programming concepts.[59] Moscow ML includes extensions such as higher-order functors and first-class modules, which enhance its expressiveness beyond the strict standard while maintaining compatibility.[60] MLKit is a research-oriented compiler toolkit for Standard ML, originating from the University of Aarhus and later developed at the IT University of Copenhagen, emphasizing region-based memory management as an alternative to traditional garbage collection. This approach uses region inference to allocate objects in hierarchical regions that are deallocated explicitly or automatically at region boundaries, reducing runtime overhead and enabling predictable memory usage in performance-critical applications.[61] MLKit supports the full SML'97 language definition and a large portion of the Basis Library, compiling to native code for x86-64 architectures via a bytecode intermediate representation.[62] Its design prioritizes static analysis for region lifetimes, allowing programmers to influence memory behavior through annotations while preserving the language's safety guarantees.[63] CakeML provides a verified implementation of a significant subset of Standard ML, tailored for formal methods and certified software development, with its entire compiler pipeline mechanically proven correct in the HOL4 theorem prover. This subset encompasses core features like datatypes, pattern matching, higher-order functions, references, exceptions, polymorphism, modules, signatures, and arbitrary-precision integers, but excludes records, I/O operations, and functors to simplify verification.[64] Developed collaboratively by researchers at institutions including the University of Cambridge and the University of Texas at Austin, CakeML generates certified x86-64 machine code, enabling end-to-end guarantees from source code to execution, including a bootstrapping compiler and interactive REPL.[65] Its focus on formal verification supports applications in trustworthy systems, such as certified operating system components or theorem provers.[66] These implementations exhibit trade-offs in performance, features, and conformance to the 1997 Standard ML definition. Moscow ML prioritizes lightweight interactivity and educational accessibility, offering full language conformance with rapid startup but less aggressive optimization compared to whole-program compilers, achieving reasonable execution speeds for interpretive use.[58] MLKit balances research innovation in memory management—regions can yield lower latency than garbage collection in structured programs—with complete SML'97 support, though it requires familiarity with region annotations for optimal efficiency.[63] CakeML, while limited to a verified subset (approximately 80% of core SML features), ensures highest assurance levels at the cost of broader feature omission and potential performance overhead from its proof obligations, making it ideal for safety-critical domains rather than general-purpose coding.[64] Overall, all three maintain high fidelity to the 1997 standard where applicable, with varying Basis Library coverage: Moscow ML and MLKit near-complete, and CakeML partial due to its scoped design.[62]Code Examples
Simple Programs
Standard ML's simple programs highlight its functional nature, concise syntax, and reliance on the Basis Library for common operations like input/output and list processing. These examples illustrate basic expression evaluation, function definitions via pattern matching, and the use of higher-order functions, all without explicit type declarations thanks to the language's type inference mechanism.[8] A fundamental introductory program is the "Hello, World!" example, which demonstrates output using the TextIO module from the Basis Library. The following code prints a greeting to the standard output stream:TextIO.print("Hello, world!\n");
TextIO.print("Hello, world!\n");
val _ = TextIO.print("Hello, world!\n");
val _ = TextIO.print("Hello, world!\n");
string -> unit, applying the given string directly to the output stream.
Another basic example is a recursive function for computing the factorial of a non-negative integer, defined using pattern matching on the argument as per the core language syntax. The code is:
fun fact 0 = 1
| fact n = n * fact (n - 1);
fun fact 0 = 1
| fact n = n * fact (n - 1);
fact with a base case where fact 0 returns 1, and a recursive case multiplying n by fact (n - 1) for n > 0. The type is inferred as int -> int. For instance, evaluating fact 5 yields 120, demonstrating the language's support for straightforward recursion without explicit loops.[8]
Simple list operations showcase Standard ML's built-in support for functional transformations via the List structure in the Basis Library. Consider mapping a function to double each element in an integer list:
val doubled = List.map (fn x => x * 2) [1, 2, 3, 4];
val doubled = List.map (fn x => x * 2) [1, 2, 3, 4];
List.map function, with signature ('a -> 'b) -> 'a list -> 'b list, applies the anonymous function (fn x => x * 2) (of type int -> int) to each element from left to right, producing [2, 4, 6, 8] of type int list.[23]
Similarly, filtering a list to retain only even numbers uses List.filter:
val evens = List.filter (fn x => x mod 2 = 0) [1, 2, 3, 4, 5, 6];
val evens = List.filter (fn x => x mod 2 = 0) [1, 2, 3, 4, 5, 6];
('a -> bool) -> 'a list -> 'a list, this applies the predicate (fn x => x mod 2 = 0) (type int -> bool) and returns [2, 4, 6] of type int list, preserving the original order of matching elements. These operations exemplify immutable list handling central to the language.[23]
Algorithm Implementations
Standard ML's functional paradigm lends itself naturally to expressing classic sorting algorithms using recursion, pattern matching, and higher-order functions on lists. These implementations highlight the language's emphasis on immutability and declarative style, avoiding mutable state or loops. Below, representative examples of insertion sort, merge sort, and quicksort are provided, demonstrating core idioms such as list cons (::) and pattern matching. Efficiency considerations, including tail-recursive optimizations via accumulators, are also discussed to align with the language's support for safe, performant recursion.Insertion Sort
Insertion sort builds a sorted list incrementally by inserting elements into their correct positions. This algorithm showcases pattern matching on lists, where the empty list serves as the base case and the cons operator allows recursive insertion. Theinsert function places a value x into a sorted list ys, comparing x with the head y of ys:
fun insert x [] = [x]
| insert x (y :: ys) = if x <= y then x :: y :: ys
else y :: insert x ys
fun insert x [] = [x]
| insert x (y :: ys) = if x <= y then x :: y :: ys
else y :: insert x ys
isort function then iterates over the input list, inserting each element into an accumulating sorted result:
fun isort [] = []
| isort (x :: xs) = insert x (isort xs)
fun isort [] = []
| isort (x :: xs) = insert x (isort xs)
insert call. To optimize, a tail-recursive variant uses an accumulator to build the result:
fun isort_tail xs =
let fun insort acc [] = acc
| insort acc (x :: xs) = insort (insert x acc) xs
in insort [] xs end
fun isort_tail xs =
let fun insort acc [] = acc
| insort acc (x :: xs) = insort (insert x acc) xs
in insort [] xs end
insert. With proper tail-call optimization in compliant implementations, it avoids stack overflow while achieving O(n²) time.
Merge Sort
Merge sort employs a divide-and-conquer strategy, splitting lists recursively and merging sorted halves. It is stable and guarantees O(n log n) time, making it suitable for demonstrating recursive decomposition in SML. First, thesplit function divides a list into two roughly equal parts using even-odd indexing:
fun split [] = ([], [])
| split [x] = ([x], [])
| split (x :: y :: xs) = let val (l, r) = split xs
in (x :: l, y :: r) end
fun split [] = ([], [])
| split [x] = ([x], [])
| split (x :: y :: xs) = let val (l, r) = split xs
in (x :: l, y :: r) end
merge function combines two sorted lists, using pattern matching to compare heads and recurse on tails:
fun merge [] ys = ys
| merge xs [] = xs
| merge (x :: xs) (y :: ys) = if x <= y then x :: merge xs (y :: ys)
else y :: merge (x :: xs) ys
fun merge [] ys = ys
| merge xs [] = xs
| merge (x :: xs) (y :: ys) = if x <= y then x :: merge xs (y :: ys)
else y :: merge (x :: xs) ys
msort ties these together:
fun msort [] = []
| msort [x] = [x]
| msort xs = let val (l, r) = split xs
in merge (msort l) (msort r) end
fun msort [] = []
| msort [x] = [x]
| msort xs = let val (l, r) = split xs
in merge (msort l) (msort r) end
Quicksort
Quicksort selects a pivot and partitions the list into elements less than or equal to and greater than the pivot, recursing on each part. In SML, partitioning uses higher-order functions likeList.filter, naturally handling duplicates by including equals in one subtree for simplicity.
Assuming the List structure's filter:
fun qsort [] = []
| qsort [x] = [x]
| qsort xs =
let val pivot = hd xs
val rest = tl xs
val lows = List.filter (fn y => y <= pivot) rest
val highs = List.filter (fn y => y > pivot) rest
in qsort lows @ pivot :: qsort highs end
fun qsort [] = []
| qsort [x] = [x]
| qsort xs =
let val pivot = hd xs
val rest = tl xs
val lows = List.filter (fn y => y <= pivot) rest
val highs = List.filter (fn y => y > pivot) rest
in qsort lows @ pivot :: qsort highs end
@ operator concatenates results, with hd and tl extracting pivot and tail. This version averages O(n log n) time but worst-cases to O(n²) on sorted inputs; randomized pivot selection (e.g., via List.nth (xs, Random.randInt (length xs))) can mitigate this, though not shown here for brevity. These implementations illustrate how SML's list operations foster concise, correct algorithms without side effects.
Advanced Language Features
Standard ML provides advanced mechanisms for abstraction and error management, including modules and functors for structuring code and exceptions for handling runtime errors. These features enable the creation of reusable, generic components and robust programs that gracefully manage exceptional conditions. The language's module system, defined in the revised standard, supports parametric polymorphism through functors, allowing structures to be instantiated with different types while preserving type safety.[8] Exceptions, also part of the core definition, permit the interruption of normal control flow to signal and recover from errors like arithmetic overflows or invalid operations.[8]Error Handling with Exceptions
Standard ML's exception system uses theraise keyword to signal errors and the handle clause to catch them, providing a form of lightweight error handling without altering the functional purity of expressions. For instance, the built-in division operator div raises the Div exception when dividing by zero, as specified in the Basis Library's Integer structure. To demonstrate recovery, consider a safe division function that catches this exception and returns a default value:
fun safeDiv (x, y) =
x div y handle Div => 0;
fun safeDiv (x, y) =
x div y handle Div => 0;
safeDiv (10, 0) yields 0 instead of terminating the program, illustrating how [handle](/page/Handle) matches the raised exception and substitutes an alternative computation. This approach is exhaustive in pattern matching on exceptions, ensuring all cases are addressed if the handler covers the relevant constructors.[8]
Expression Interpreter Using Datatypes and Pattern Matching
Datatypes in Standard ML allow the definition of algebraic data types for representing structured data, such as abstract syntax trees for expressions, combined with pattern matching in functions for concise evaluation. A common practical scenario is an interpreter for simple arithmetic expressions, where a datatype encodes constants, addition, subtraction, and division. Theeval function uses pattern matching to recursively compute the value, raising exceptions for invalid operations like division by zero via the underlying div primitive.[8]
Here is an example datatype and evaluator:
datatype exp = Constant of int
| Plus of exp * exp
| Minus of exp * exp
| Divide of exp * exp;
fun eval (Constant n) = n
| eval (Plus (e1, e2)) = eval e1 + eval e2
| eval (Minus (e1, e2)) = eval e1 - eval e2
| eval (Divide (e1, e2)) = eval e1 div eval e2;
datatype exp = Constant of int
| Plus of exp * exp
| Minus of exp * exp
| Divide of exp * exp;
fun eval (Constant n) = n
| eval (Plus (e1, e2)) = eval e1 + eval e2
| eval (Minus (e1, e2)) = eval e1 - eval e2
| eval (Divide (e1, e2)) = eval e1 div eval e2;
Divide (Constant 10, Constant 0), [eval](/page/Eval) raises Div, which can be caught externally with [handle](/page/Handle) for error propagation or recovery, such as raising a custom [EvalError](/page/Error) exception:
exception EvalError of string;
val result = eval (Divide (Constant 10, Constant 0))
handle Div => raise EvalError "Division by zero";
exception EvalError of string;
val result = eval (Divide (Constant 10, Constant 0))
handle Div => raise EvalError "Division by zero";
Arbitrary-Precision Integers with the IntInf Module
The Basis Library includes theIntInf structure for arbitrary-precision integers, supporting operations beyond the fixed-size int type, such as addition, multiplication, and conversion from smaller types. This module is essential for computations involving large numbers where overflow is a concern, as IntInf values can grow dynamically without bounds (limited only by memory).[67]
Key operations include fromInt for conversion and the overloaded arithmetic operators like +. For example, to compute the factorial of a large number:
local
open IntInf
in
fun fact 0 = 1
| fact n = n * fact (n - 1)
end;
val largeFact = fact (fromInt 1000); (* Results in a very large IntInf integer *)
local
open IntInf
in
fun fact 0 = 1
| fact n = n * fact (n - 1)
end;
val largeFact = fact (fromInt 1000); (* Results in a very large IntInf integer *)
IntInf overloads numeric literals to IntInf.int type and uses the module's arithmetic operators like *, enabling concise code without explicit fromInt conversions inside the function or qualified names. The argument to fact requires explicit conversion from int to IntInf, and * performs arbitrary-precision multiplication, ensuring no overflow occurs even for results exceeding machine word size. This demonstrates IntInf's utility in numerical applications requiring exact arithmetic.[67]
Generic Structures with Functors and Higher-Order Functions
Functors in Standard ML enable the creation of generic modules by parameterizing structures over signatures, facilitating reusable abstractions like sets. A functor can define higher-order functions that operate on the provided types, with partial application allowing flexible instantiation. For example, a functor for a generic set uses a higher-order comparison function partially applied to elements.[8] Consider a signature for ordered elements:signature ORD = sig
type t
val compare : t * t -> order
end;
signature ORD = sig
type t
val compare : t * t -> order
end;
GenericSet takes an ORD structure and implements a set with insertion using a partially applied fold:
functor GenericSet (Key : ORD) : sig
type set
val empty : set
val insert : Key.t -> set -> set
end = struct
type set = Key.t list (* Simplified as list for illustration *)
val empty = []
fun insert x s = if List.exists (fn y => Key.compare (x, y) = EQUAL) s
then s
else x :: s (* Higher-order List.exists with partial compare *)
end;
functor GenericSet (Key : ORD) : sig
type set
val empty : set
val insert : Key.t -> set -> set
end = struct
type set = Key.t list (* Simplified as list for illustration *)
val empty = []
fun insert x s = if List.exists (fn y => Key.compare (x, y) = EQUAL) s
then s
else x :: s (* Higher-order List.exists with partial compare *)
end;
IntOrd structure providing compare):
structure IntOrd = struct
type t = int
val compare = Int.compare
end;
structure IntSet = GenericSet (IntOrd);
val s = IntSet.insert 42 IntSet.empty; (* Partial application in insert via fold-like ops *)
structure IntOrd = struct
type t = int
val compare = Int.compare
end;
structure IntSet = GenericSet (IntOrd);
val s = IntSet.insert 42 IntSet.empty; (* Partial application in insert via fold-like ops *)
List.exists (partially applied with Key.compare) enabling generic behavior. The Basis Library's List structure provides such combinators for list manipulation within functors.[8]
