Hubbry Logo
Standard MLStandard MLMain
Open search
Standard ML
Community hub
Standard ML
logo
7 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Standard ML
Standard ML
from Wikipedia
Standard ML
ParadigmMulti-paradigm: functional, imperative, modular[1]
FamilyML
First appeared1983; 42 years ago (1983)[2]
Stable release
Standard ML '97[2] / 1997; 28 years ago (1997)
Typing disciplineInferred, static, strong
Filename extensions.sml
Websitesmlfamily.github.io
Major implementations
SML/NJ, MLton, Poly/ML
Dialects
Alice, Concurrent ML, Dependent ML
Influenced by
ML, Hope, Pascal
Influenced
Elm, F#, F*, Haskell, OCaml, Python,[3] Rust,[4] Scala

Standard ML (SML) is a general-purpose, high-level, modular, functional programming language with compile-time type checking and type inference. It is popular for writing compilers, for programming language research, and for developing theorem provers.

Standard ML is a modern dialect of ML, the language used in the Logic for Computable Functions (LCF) theorem-proving project. It is distinctive among widely used languages in that it has a formal specification, given as typing rules and operational semantics in The Definition of Standard ML.[5]

Language

[edit]

Standard ML is a functional programming language with some impure features. Programs written in Standard ML consist of expressions in contrast to statements or commands, although some expressions of type unit are only evaluated for their side-effects.

Functions

[edit]

Like all functional languages, a key feature of Standard ML is the function, which is used for abstraction. The factorial function can be expressed as follows:

fun factorial n = 
    if n = 0 then 1 else n * factorial (n - 1)

Type inference

[edit]

An SML compiler must infer the static type val factorial : int -> int without user-supplied type annotations. It has to deduce that n is only used with integer expressions, and must therefore itself be an integer, and that all terminal expressions are integer expressions.

Declarative definitions

[edit]

The same function can be expressed with clausal function definitions where the if-then-else conditional is replaced with templates of the factorial function evaluated for specific values:

fun factorial 0 = 1
  | factorial n = n * factorial (n - 1)

Imperative definitions

[edit]

or iteratively:

fun factorial n = let val i = ref n and acc = ref 1 in
    while !i > 0 do (acc := !acc * !i; i := !i - 1); !acc
end

Lambda functions

[edit]

or as a lambda function:

val rec factorial = fn 0 => 1 | n => n * factorial (n - 1)

Here, the keyword val introduces a binding of an identifier to a value, fn introduces an anonymous function, and rec allows the definition to be self-referential.

Local definitions

[edit]

The encapsulation of an invariant-preserving tail-recursive tight loop with one or more accumulator parameters within an invariant-free outer function, as seen here, is a common idiom in Standard ML.

Using a local function, it can be rewritten in a more efficient tail-recursive style:

local
    fun loop (0, acc) = acc
      | loop (m, acc) = loop (m - 1, m * acc)
in
    fun factorial n = loop (n, 1)
end

Type synonyms

[edit]

A type synonym is defined with the keyword type. Here is a type synonym for points on a plane, and functions computing the distances between two points, and the area of a triangle with the given corners as per Heron's formula. (These definitions will be used in subsequent examples).

type loc = real * real

fun square (x : real) = x * x

fun dist (x, y) (x', y') =
    Math.sqrt (square (x' - x) + square (y' - y))

fun heron (a, b, c) = let
    val x = dist a b
    val y = dist b c
    val z = dist a c
    val s = (x + y + z) / 2.0
    in
        Math.sqrt (s * (s - x) * (s - y) * (s - z))
    end

Algebraic datatypes

[edit]

Standard ML provides strong support for algebraic datatypes (ADT). A data type can be thought of as a disjoint union of tuples (or a "sum of products"). They are easy to define and easy to use, largely because of pattern matching, and most Standard ML implementations' pattern-exhaustiveness checking and pattern redundancy checking.

In object-oriented programming languages, a disjoint union can be expressed as class hierarchies. However, in contrast to class hierarchies, ADTs are closed. Thus, the extensibility of ADTs is orthogonal to the extensibility of class hierarchies. Class hierarchies can be extended with new subclasses which implement the same interface, while the functions of ADTs can be extended for the fixed set of constructors. See expression problem.

A datatype is defined with the keyword datatype, as in:

datatype shape
    = Circle   of loc * real      (* center and radius *)
    | Square   of loc * real      (* upper-left corner and side length; axis-aligned *)
    | Triangle of loc * loc * loc (* corners *)

Note that a type synonym cannot be recursive; datatypes are necessary to define recursive constructors. (This is not at issue in this example.)

Pattern matching

[edit]

Patterns are matched in the order in which they are defined. C programmers can use tagged unions, dispatching on tag values, to do what ML does with datatypes and pattern matching. Nevertheless, while a C program decorated with appropriate checks will, in a sense, be as robust as the corresponding ML program, those checks will of necessity be dynamic; ML's static checks provide strong guarantees about the correctness of the program at compile time.

Function arguments can be defined as patterns as follows:

fun area (Circle (_, r)) = Math.pi * square r
  | area (Square (_, s)) = square s
  | area (Triangle p) = heron p (* see above *)

The so-called "clausal form" of function definition, where arguments are defined as patterns, is merely syntactic sugar for a case expression:

fun area shape = case shape of
    Circle (_, r) => Math.pi * square r
  | Square (_, s) => square s
  | Triangle p => heron p

Exhaustiveness checking

[edit]

Pattern-exhaustiveness checking will make sure that each constructor of the datatype is matched by at least one pattern.

The following pattern is not exhaustive:

fun center (Circle (c, _)) = c
  | center (Square ((x, y), s)) = (x + s / 2.0, y + s / 2.0)

There is no pattern for the Triangle case in the center function. The compiler will issue a warning that the case expression is not exhaustive, and if a Triangle is passed to this function at runtime, exception Match will be raised.

Redundancy checking

[edit]

The pattern in the second clause of the following (meaningless) function is redundant:

fun f (Circle ((x, y), r)) = x + y
  | f (Circle _) = 1.0
  | f _ = 0.0

Any value that would match the pattern in the second clause would also match the pattern in the first clause, so the second clause is unreachable. Therefore, this definition as a whole exhibits redundancy, and causes a compile-time warning.

The following function definition is exhaustive and not redundant:

val hasCorners = fn (Circle _) => false | _ => true

If control gets past the first pattern (Circle), we know the shape must be either a Square or a Triangle. In either of those cases, we know the shape has corners, so we can return true without discerning the actual shape.

Higher-order functions

[edit]

Functions can consume functions as arguments:

fun map f (x, y) = (f x, f y)

Functions can produce functions as return values:

fun constant k = (fn _ => k)

Functions can also both consume and produce functions:

fun compose (f, g) = (fn x => f (g x))

The function List.map from the basis library is one of the most commonly used higher-order functions in Standard ML:

fun map _ [] = []
  | map f (x :: xs) = f x :: map f xs

A more efficient implementation with tail-recursive List.foldl:

fun map f = List.rev o List.foldl (fn (x, acc) => f x :: acc) []

Exceptions

[edit]

Exceptions are raised with the keyword raise and handled with the pattern matching handle construct. The exception system can implement non-local exit; this optimization technique is suitable for functions like the following.

local
    exception Zero;
    val p = fn (0, _) => raise Zero | (a, b) => a * b
in
    fun prod xs = List.foldl p 1 xs handle Zero => 0
end

When exception Zero is raised, control leaves the function List.foldl altogether. Consider the alternative: the value 0 would be returned, it would be multiplied by the next integer in the list, the resulting value (inevitably 0) would be returned, and so on. The raising of the exception allows control to skip over the entire chain of frames and avoid the associated computation. Note the use of the underscore (_) as a wildcard pattern.

The same optimization can be obtained with a tail call.

local
    fun p a (0 :: _) = 0
      | p a (x :: xs) = p (a * x) xs
      | p a [] = a
in
    val prod = p 1
end

Module system

[edit]

Standard ML's advanced module system allows programs to be decomposed into hierarchically organized structures of logically related type and value definitions. Modules provide not only namespace control but also abstraction, in the sense that they allow the definition of abstract data types. Three main syntactic constructs comprise the module system: signatures, structures and functors.

Signatures

[edit]

A signature is an interface, usually thought of as a type for a structure; it specifies the names of all entities provided by the structure, the arity of each type component, the type of each value component, and the signature of each substructure. The definitions of type components are optional; type components whose definitions are hidden are abstract types.

For example, the signature for a queue may be:

signature QUEUE = sig
    type 'a queue
    exception QueueError;
    val empty     : 'a queue
    val isEmpty   : 'a queue -> bool
    val singleton : 'a -> 'a queue
    val fromList  : 'a list -> 'a queue
    val insert    : 'a * 'a queue -> 'a queue
    val peek      : 'a queue -> 'a
    val remove    : 'a queue -> 'a * 'a queue
end

This signature describes a module that provides a polymorphic type 'a queue, exception QueueError, and values that define basic operations on queues.

Structures

[edit]

A structure is a module; it consists of a collection of types, exceptions, values and structures (called substructures) packaged together into a logical unit.

A queue structure can be implemented as follows:

structure TwoListQueue :> QUEUE = struct
    type 'a queue = 'a list * 'a list

    exception QueueError;

    val empty = ([], [])

    fun isEmpty ([], []) = true
      | isEmpty _ = false

    fun singleton a = ([], [a])

    fun fromList a = ([], a)

    fun insert (a, ([], [])) = singleton a
      | insert (a, (ins, outs)) = (a :: ins, outs)

    fun peek (_, []) = raise QueueError
      | peek (ins, outs) = List.hd outs

    fun remove (_, []) = raise QueueError
      | remove (ins, [a]) = (a, ([], List.rev ins))
      | remove (ins, a :: outs) = (a, (ins, outs))
end

This definition declares that structure TwoListQueue implements signature QUEUE. Furthermore, the opaque ascription denoted by :> states that any types which are not defined in the signature (i.e. type 'a queue) should be abstract, meaning that the definition of a queue as a pair of lists is not visible outside the module. The structure implements all of the definitions in the signature.

The types and values in a structure can be accessed with "dot notation":

val q : string TwoListQueue.queue = TwoListQueue.empty
val q' = TwoListQueue.insert (Real.toString Math.pi, q)

Functors

[edit]

A functor is a function from structures to structures; that is, a functor accepts one or more arguments, which are usually structures of a given signature, and produces a structure as its result. Functors are used to implement generic data structures and algorithms.

One popular algorithm[6] for breadth-first search of trees makes use of queues. Here is a version of that algorithm parameterized over an abstract queue structure:

(* after Okasaki, ICFP, 2000 *)
functor BFS (Q: QUEUE) = struct
  datatype 'a tree = E | T of 'a * 'a tree * 'a tree

  local
    fun bfsQ q = if Q.isEmpty q then [] else search (Q.remove q)
    and search (E, q) = bfsQ q
      | search (T (x, l, r), q) = x :: bfsQ (insert (insert q l) r)
    and insert q a = Q.insert (a, q)
  in
    fun bfs t = bfsQ (Q.singleton t)
  end
end

structure QueueBFS = BFS (TwoListQueue)

Within functor BFS, the representation of the queue is not visible. More concretely, there is no way to select the first list in the two-list queue, if that is indeed the representation being used. This data abstraction mechanism makes the breadth-first search truly agnostic to the queue's implementation. This is in general desirable; in this case, the queue structure can safely maintain any logical invariants on which its correctness depends behind the bulletproof wall of abstraction.

Code examples

[edit]

Snippets of SML code are most easily studied by entering them into an interactive top-level.

Hello, world!

[edit]

The following is a "Hello, World!" program:

hello.sml
print "Hello, world!\n";
sh
$ mlton hello.sml
$ ./hello
Hello, world!

Algorithms

[edit]

Insertion sort

[edit]

Insertion sort for int list (ascending) can be expressed concisely as follows:

fun insert (x, []) = [x] | insert (x, h :: t) = sort x (h, t)
and sort x (h, t) = if x < h then [x, h] @ t else h :: insert (x, t)
val insertionsort = List.foldl insert []

Mergesort

[edit]

Here, the classic mergesort algorithm is implemented in three functions: split, merge and mergesort. Also note the absence of types, with the exception of the syntax op :: and [] which signify lists. This code will sort lists of any type, so long as a consistent ordering function cmp is defined. Using Hindley–Milner type inference, the types of all variables can be inferred, even complicated types such as that of the function cmp.

Split

fun split is implemented with a stateful closure which alternates between true and false, ignoring the input:

fun alternator {} = let val state = ref true
    in fn a => !state before state := not (!state) end

(* Split a list into near-halves which will either be the same length,
 * or the first will have one more element than the other.
 * Runs in O(n) time, where n = |xs|.
 *)
fun split xs = List.partition (alternator {}) xs

Merge

Merge uses a local function loop for efficiency. The inner loop is defined in terms of cases: when both lists are non-empty (x :: xs) and when one list is empty ([]).

This function merges two sorted lists into one sorted list. Note how the accumulator acc is built backwards, then reversed before being returned. This is a common technique, since 'a list is represented as a linked list; this technique requires more clock time, but the asymptotics are not worse.

(* Merge two ordered lists using the order cmp.
 * Pre: each list must already be ordered per cmp.
 * Runs in O(n) time, where n = |xs| + |ys|.
 *)
fun merge cmp (xs, []) = xs
  | merge cmp (xs, y :: ys) = let
    fun loop (a, acc) (xs, []) = List.revAppend (a :: acc, xs)
      | loop (a, acc) (xs, y :: ys) =
        if cmp (a, y)
        then loop (y, a :: acc) (ys, xs)
        else loop (a, y :: acc) (xs, ys)
    in
        loop (y, []) (ys, xs)
    end

Mergesort

The main function:

fun ap f (x, y) = (f x, f y)

(* Sort a list in according to the given ordering operation cmp.
 * Runs in O(n log n) time, where n = |xs|.
 *)
fun mergesort cmp [] = []
  | mergesort cmp [x] = [x]
  | mergesort cmp xs = (merge cmp o ap (mergesort cmp) o split) xs

Quicksort

[edit]

Quicksort can be expressed as follows. fun part is a closure that consumes an order operator op <<.

infix <<

fun quicksort (op <<) = let
    fun part p = List.partition (fn x => x << p)
    fun sort [] = []
      | sort (p :: xs) = join p (part p xs)
    and join p (l, r) = sort l @ p :: sort r
    in
        sort
    end

Expression interpreter

[edit]

Note the relative ease with which a small expression language can be defined and processed:

exception TyErr;

datatype ty = IntTy | BoolTy

fun unify (IntTy, IntTy) = IntTy
  | unify (BoolTy, BoolTy) = BoolTy
  | unify (_, _) = raise TyErr

datatype exp
    = True
    | False
    | Int of int
    | Not of exp
    | Add of exp * exp
    | If  of exp * exp * exp

fun infer True = BoolTy
  | infer False = BoolTy
  | infer (Int _) = IntTy
  | infer (Not e) = (assert e BoolTy; BoolTy)
  | infer (Add (a, b)) = (assert a IntTy; assert b IntTy; IntTy)
  | infer (If (e, t, f)) = (assert e BoolTy; unify (infer t, infer f))
and assert e t = unify (infer e, t)

fun eval True = True
  | eval False = False
  | eval (Int n) = Int n
  | eval (Not e) = if eval e = True then False else True
  | eval (Add (a, b)) = (case (eval a, eval b) of (Int x, Int y) => Int (x + y))
  | eval (If (e, t, f)) = eval (if eval e = True then t else f)

fun run e = (infer e; SOME (eval e)) handle TyErr => NONE

Example usage on well-typed and ill-typed expressions:

val SOME (Int 3) = run (Add (Int 1, Int 2)) (* well-typed *)
val NONE = run (If (Not (Int 1), True, False)) (* ill-typed *)

Arbitrary-precision integers

[edit]

The IntInf module provides arbitrary-precision integer arithmetic. Moreover, integer literals may be used as arbitrary-precision integers without the programmer having to do anything.

The following program implements an arbitrary-precision factorial function:

fact.sml
fun fact n : IntInf.int = if n = 0 then 1 else n * fact (n - 1);

fun printLine str = TextIO.output (TextIO.stdOut, str ^ "\n");

val () = printLine (IntInf.toString (fact 120));
bash
$ mlton fact.sml
$ ./fact
6689502913449127057588118054090372586752746333138029810295671352301
6335572449629893668741652719849813081576378932140905525344085894081
21859898481114389650005964960521256960000000000000000000000000000

Partial application

[edit]

Curried functions have many applications, such as eliminating redundant code. For example, a module may require functions of type a -> b, but it is more convenient to write functions of type a * c -> b where there is a fixed relationship between the objects of type a and c. A function of type c -> (a * c -> b) -> a -> b can factor out this commonality. This is an example of the adapter pattern.[citation needed]

In this example, fun d computes the numerical derivative of a given function f at point x:

- fun d delta f x = (f (x + delta) - f (x - delta)) / (2.0 * delta)
val d = fn : real -> (real -> real) -> real -> real

The type of fun d indicates that it maps a "float" onto a function with the type (real -> real) -> real -> real. This allows us to partially apply arguments, known as currying. In this case, function d can be specialised by partially applying it with the argument delta. A good choice for delta when using this algorithm is the cube root of the machine epsilon.[citation needed]

- val d' = d 1E~8;
val d' = fn : (real -> real) -> real -> real

The inferred type indicates that d' expects a function with the type real -> real as its first argument. We can compute an approximation to the derivative of at . The correct answer is .

- d' (fn x => x * x * x - x - 1.0) 3.0;
val it = 25.9999996644 : real

Libraries

[edit]

Standard

[edit]

The Basis Library[7] has been standardized and ships with most implementations. It provides modules for trees, arrays, and other data structures, and input/output and system interfaces.

Third party

[edit]

For numerical computing, a Matrix module exists (but is currently broken), https://www.cs.cmu.edu/afs/cs/project/pscico/pscico/src/matrix/README.html.

For graphics, cairo-sml is an open source interface to the Cairo graphics library. For machine learning, a library for graphical models exists.

Implementations

[edit]

Implementations of Standard ML include the following:

Standard

  • HaMLet: a Standard ML interpreter that aims to be an accurate and accessible reference implementation of the standard
  • MLton (mlton.org): a whole-program optimizing compiler which strictly conforms to the Definition and produces very fast code compared to other ML implementations, including backends for LLVM and C
  • Moscow ML: a light-weight implementation, based on the Caml Light runtime engine which implements the full Standard ML language, including modules and much of the basis library
  • Poly/ML: a full implementation of Standard ML that produces fast code and supports multicore hardware (via Portable Operating System Interface (POSIX) threads); its runtime system performs parallel garbage collection and online sharing of immutable substructures.
  • Standard ML of New Jersey (smlnj.org): a full compiler, with associated libraries, tools, an interactive shell, and documentation with support for Concurrent ML
  • SML.NET: a Standard ML compiler for the Common Language Runtime with extensions for linking with other .NET framework code
  • ML Kit Archived 2016-01-07 at the Wayback Machine: an implementation based very closely on the Definition, integrating a garbage collector (which can be disabled) and region-based memory management with automatic inference of regions, aiming to support real-time applications

Derivative

Research

All of these implementations are open-source and freely available. Most are implemented themselves in Standard ML. There are no longer any commercial implementations; Harlequin, now defunct, once produced a commercial IDE and compiler called MLWorks which passed on to Xanalys and was later open-sourced after it was acquired by Ravenbrook Limited on April 26, 2013.

Major projects using SML

[edit]

The IT University of Copenhagen's entire enterprise architecture is implemented in around 100,000 lines of SML, including staff records, payroll, course administration and feedback, student project management, and web-based self-service interfaces.[8]

The proof assistants HOL4, Isabelle, LEGO, and Twelf are written in Standard ML. It is also used by compiler writers and integrated circuit designers such as ARM.[9]

See also

[edit]

References

[edit]
[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
Standard ML (SML) is a general-purpose, language that is statically typed with automatic , emphasizing safety through compile-time type checking and garbage collection, while supporting higher-order functions, polymorphic types, recursive datatypes with , exceptions, and a sophisticated module system for and separate compilation. Originating as the Meta Language (ML) in the late 1970s at the University of Edinburgh, SML was developed by Robin Milner and his colleagues as part of the LCF (Logic for Computable Functions) theorem prover to provide a reliable meta-language for formal proofs, drawing inspiration from earlier works like Peter Landin's ISWIM and John McCarthy's Lisp variants. The language evolved through collaborative efforts in the 1980s, with key contributions including David MacQueen's 1983 proposal for a module system and Mads Tofte's 1987 work on polymorphic references, leading to initial design meetings starting in 1982 and the first formal definition in 1986. Standardization efforts culminated in The Definition of Standard ML published in 1990 by Milner, Mads Tofte, Robert Harper, and David MacQueen, which provided a rigorous mathematical semantics with 196 inference rules for type checking, ensuring properties like type safety and deterministic evaluation. The revised edition in 1997 refined this to 189 rules, introducing the value restriction on type polymorphism to prevent unsoundness and improving module semantics with opaque signature matching, while also standardizing a Basis Library as part of the 1997 revision with 67 interfaces for portability across implementations. SML's type system uses Hindley-Milner polymorphism with unification-based , allowing concise code without explicit annotations, and supports imperative features like mutable references and arrays alongside pure functional constructs, making it suitable for both research and practical applications such as theorem proving, compilers, and . Notable implementations include SML/NJ (from , starting 1986), MLton (optimizing compiler), and Poly/ML, which have advanced features like proposed by Tofte and Jean-Pierre Talpin in 1994. Despite stalled efforts for a successor like ML2000 in the , SML remains influential in programming language research, influencing languages like , , and Scala through its innovations in and modularity.

History

Origins in the ML Language Family

The ML language family originated in the 1970s at the as the Meta Language (ML) for the LCF (Logic for Computable Functions) theorem prover, a system designed to support interactive proof construction with a focus on logical security and mechanized reasoning. Developed under the leadership of , the project began around 1973 when Milner, then at , adapted and extended an earlier LCF implementation from Stanford to create a more robust framework for proofs. The initial ML was an untyped language inspired by the and , serving primarily as a scripting tool for defining proof tactics within LCF. Key contributors to ML's early design included Milner, along with Michael J. C. Gordon, Christopher P. Wadsworth, and others in the group, who formalized ML's structure in their work on LCF. A pivotal innovation during 1973–1978 was the introduction of polymorphic , independently rediscovered by Milner and presented in his 1978 paper, which established the Hindley-Milner algorithm for principal type schemes and let-polymorphism. This advancement transformed ML from an untyped meta- into a statically typed functional language, emphasizing while preserving expressiveness for theorem-proving tasks. In the 1980s, ML evolved into standalone typed variants as its utility extended beyond LCF. Edinburgh ML emerged in 1981 through Luca Cardelli's development of a compiler for a free-standing version, initially called "ML under VMS," which decoupled ML from the LCF system and enabled broader experimentation. Concurrently, Cambridge ML was developed at the University of Cambridge, building on Edinburgh's foundations to support proof assistants like HOL and later influencing the creation of Caml in 1985. These implementations marked ML's transition to a general-purpose functional programming language. The rapid proliferation of ML dialects, including LCF/ML, Cardelli ML, Poly/ML (1988), and others, led to significant divergences in syntax, semantics, and features, resulting in portability issues that hindered collaborative development and widespread adoption. Without a unified specification, programs written for one implementation often failed in another, prompting initial efforts toward standardization in the early 1980s.

Standardization Process

The standardization of Standard ML emerged from efforts within the community to unify divergent implementations of the ML , with roots in discussions under the IFIP 2.1 as early as 1983. In April 1983, drafted an initial proposal for a "Standard ML," emphasizing a core language design that would serve as a portable foundation. This led to the formal formation of the SML Definition Committee following a pivotal meeting in from May 23–25, 1985, chaired by Milner and including key contributors such as Mads Tofte, Robert Harper, and David MacQueen. The committee operated under the auspices of IFIP 2.1, which focused on algorithmic languages and provided a structured framework for defining programming languages, drawing on the British research tradition centered in . Key milestones in the standardization process included the release of draft definitions that refined the language's semantics. In March 1986, the committee published a preliminary draft as ECS-LFCS-86-2 from the . This was followed in August 1987 by another draft, ECS-LFCS-87-36, which introduced formal static semantics and principal signatures to ensure and inference. These drafts facilitated community feedback and iteration, culminating in the final Definition of Standard ML, published in 1990 by and authored by Milner, Tofte, Harper, and MacQueen. The 1990 standard initially encompassed the core language, deliberately excluding modules to focus on foundational constructs like expressions, types, and functions, though the module system was integrated into the final document to support modular programming. This scope aimed to define a minimal, consistent subset that could be implemented portably across diverse systems, addressing fragmentation in earlier ML variants. The primary rationale was to promote interoperability and reliability, enabling developers to write code that would behave uniformly in implementations such as the Standard ML of New Jersey (SML/NJ) and others, thereby fostering wider adoption in research and education.

Revisions and Modern Developments

In 1997, the Standard ML community released a revised definition of the language, known as SML '97, which integrated the module system more deeply into the core language and provided clearer specifications of its semantics to address ambiguities in the original 1990 standard. This revision, detailed in The Definition of Standard ML (Revised) by Milner, Tofte, Harper, and MacQueen, simplified certain aspects of the language while maintaining , ensuring that existing programs could run unchanged. Accompanying the revised definition, the Standard ML Basis Library was formalized in 1997 as a comprehensive specification of predefined modules, promoting portability across implementations by standardizing essential functionality for systems and applications programming. This library, documented in The Standard ML Basis Library by Gansner and Reppy, includes modules for data structures, I/O, and time handling, among others, and has been integral to the language's ecosystem since its adoption. Since the 1997 revision, the core language has seen no major changes, with development focusing instead on implementation extensions and community-driven enhancements. Efforts to develop a successor like ML2000 in the late stalled, contributing to the stability of the core language since 1997. The ML Family Workshop series continues to foster this activity, with events in (held September 6 in , , co-located with ICFP) and 2025 (held October 16 in , co-located with ICFP/ 2025) discussing advancements in ML-family languages, including practical extensions for Standard ML. The SML community maintains the language through collaborative efforts, such as the SML Family GitHub repository, with ongoing support for implementations like SML/NJ. As of November 4, 2025, the latest SML/NJ release (version 110.99.9) is a bug-fix update, emphasizing stability over new features.

Language Design

Core Principles and

Standard ML (SML) is a general-purpose language designed with the foundational goals of achieving high expressiveness while ensuring safety and correctness through rigorous formal semantics. Its philosophy emphasizes a balance between theoretical foundations and practical utility, drawing directly from and to support higher-order functions and abstract mathematical structures. This approach, inspired by early systems like and LCF, prioritizes a declarative paradigm where programs are composed as expressions rather than imperative statements, promoting clarity and reducing unintended side effects. Central to SML's core principles is its strong static typing system, which leverages the Hindley-Milner framework to provide automatic and guarantee without runtime checks. Polymorphism, particularly , enables generic code that operates uniformly across types, enhancing reusability while the value restriction mechanism preserves soundness in the presence of imperative features. Referential transparency is a key tenet, ensuring that expressions evaluate predictably based on their values alone, which fosters correctness by minimizing mutable state dependencies and aiding in reasoning about program behavior. SML's design philosophy extends to modularity and abstraction, facilitating the construction of large-scale software through a sophisticated module system comprising signatures, structures, and functors. Functors act as parameterized modules, allowing reusable components that abstract over types and values, thereby supporting hierarchical organization and information hiding without compromising the language's type-safe expressiveness. Overall, these principles reflect SML's commitment to a secure, general-purpose language where formal verification aligns with intuitive programming practices.

Type System Fundamentals

Standard ML employs the Hindley-Milner type system, which enables automatic for expressions without requiring explicit type annotations, ensuring that every well-typed program has a principal type scheme that is the most general possible. This system, foundational to the language's design, guarantees that type checking is decidable and efficient, allowing compilation without runtime type errors. The core inference mechanism is based on Algorithm W, which computes principal types through a process of substitution and unification, as originally formalized for functional programs. Algorithm W performs type inference by recursively assigning fresh type variables to subexpressions and unifying them to resolve constraints arising from operations like function application. Unification, drawing from Robinson's algorithm, matches types by finding substitutions that make them identical, while incorporating an occurs check to prevent infinite type expansions, such as a type variable occurring within the type it is being unified with. For instance, in inferring the type of the identity function λx.x\lambda x. x, Algorithm W assigns a fresh variable α\alpha to xx, yielding the monotype αα\alpha \to \alpha, which is then generalized to the polymorphic scheme α.αα\forall \alpha. \alpha \to \alpha. This process ensures completeness and soundness, meaning every typable expression receives its principal type, and no incorrect typings are inferred. Polymorphic types in Standard ML are expressed through over type variables, forming type schemes that capture generality, such as α.αα\forall \alpha. \alpha \to \alpha for functions that work uniformly across types. Type variables, denoted by primes like 'a or 'b, represent unknowns that can be instantiated to concrete types during use, while a type scheme binds these variables universally to prevent ad-hoc polymorphism and enforce parametricity. In the static semantics, environments map identifiers to such schemes, allowing polymorphic values like list constructors to be reused with different element types without duplication. The provides static guarantees that eliminate runtime type errors, as all type inconsistencies are detected at through exhaustive and checking. Equality types, marked with an equality attribute (e.g., 'a with equality), restrict polymorphism to types supporting structural equality, such as integers or datatypes, enabling operations like only where defined. Abstract types, introduced via abstype declarations, hide implementation details by treating the type as opaque, exposing only specified operations and preventing direct construction or inspection outside the . These features ensure and safety without compromising the language's expressiveness. Type equivalence in Standard ML is structural for tuples and records, where two types are equal if their components match recursively by structure, allowing flexible composition without name dependencies. In contrast, datatypes employ nominal equivalence, where equality is determined by the datatype name and constructor definitions, preserving identity even if structures coincide, which supports distinct abstract interpretations of similar layouts. This hybrid approach balances expressivity with the need for identifiable types in polymorphic contexts.

Basic Constructs

Declarations and Expressions

In Standard ML, programs are constructed from declarations and expressions, forming the foundational building blocks of the language. Declarations introduce bindings for values and functions, while expressions represent computations that evaluate to values. Unlike imperative languages, Standard ML treats the entire program as a sequence of pure expressions without statements, emphasizing functional composition and . Value declarations use the val keyword to bind immutable identifiers to expressions, following the syntax val pat = exp, where pat is a (typically a simple identifier for basic bindings) and exp is an expression. These bindings are recursive if prefixed with rec, allowing mutual definitions within a single declaration, as in val rec x = e where e may reference x. Optional type variables can precede the binding, such as val 'a x = e, to explicitly scope polymorphic types to that declaration. Function declarations employ the fun keyword with syntax fun f pat1 ... patn = exp, defining a function f that takes arguments matching the patterns and returns the value of exp; multiple clauses can be chained with | for pattern alternatives, and the rec modifier enables recursive or mutually recursive functions. These forms ensure all bindings are statically checked for type consistency during elaboration. Expressions in Standard ML are categorized into atomic and compound forms. Atomic expressions include literals like integers (42), strings ("hello"), and identifiers (x), as well as constructs like record literals {field = value} or tuple constructions. Compound expressions build upon these through application (exp1 exp2), type ascription (exp : ty), or more complex forms such as conditional if exp1 then exp2 else exp3, which evaluates to exp2 if exp1 is true and exp3 otherwise. The let expression, let dec in exp end, introduces local declarations dec whose bindings are visible only within exp, providing a mechanism for scoped variable introduction without global pollution. Additionally, case exp of match dispatches on the value of exp using pattern matches, though basic usage here focuses on simple exhaustive cases without advanced destructuring. All expressions are pure, meaning they have no side effects unless explicitly involving mutable structures. Scoping in Standard ML follows lexical rules, where bindings introduced by declarations are visible from their point of definition onward, unless nested within a let expression, which creates a new environment for its body. In a let dec in exp end, the declarations in dec shadow outer bindings but do not affect the surrounding scope, enforcing hygiene and preventing unintended captures. Type variables in value or function bindings are explicitly scoped to that declaration, avoiding leakage into broader contexts. This design supports modular programming by isolating local computations. Standard ML programs are structured as sequences of top-level declarations, either in source files or interactively via a read-eval-print loop (REPL). A file consists of declarations separated implicitly by their forms, with the entire program evaluating as the sequence of bound values; there are no explicit statements or outside expressions. In the REPL, each declaration is elaborated and evaluated immediately, binding new identifiers into the persistent environment. This expression-centric model, devoid of imperative statements, aligns with the language's functional , where computation proceeds through value substitutions. For example, a simple program might declare a value and use it in an expression:

val pi = 3.14159; val circumference = fn r => 2.0 * pi * r; circumference 5.0;

val pi = 3.14159; val circumference = fn r => 2.0 * pi * r; circumference 5.0;

Here, pi is bound immutably, circumference defines a function, and the final application evaluates to approximately 31.4159, demonstrating how declarations feed into expressive computations. automatically deduces types like real for these reals, though explicit annotations can be added for clarity.

Functions and Lambda Expressions

In Standard ML, functions are first-class citizens, meaning they can be defined anonymously, passed as arguments, returned as results, and stored in data structures, enabling powerful functional abstractions. The core mechanism for creating functions is the abstraction, which allows for concise expression of computations without naming. Anonymous functions are defined using the syntax fn pat => exp, where pat is a (often a simple variable) and exp is the body expression; this evaluates to a value of function type, represented internally as a closure containing the pattern matches, the static environment, and an empty dynamic environment. For example, fn x => x * 2 defines a function that doubles its input, with inferred type int -> int. Standard ML employs by default, transforming multi-argument functions into a chain of single-argument functions of type τ₁ → (τ₂ → ... → τₙ → τ), which facilitates . Thus, a function like fun add x y = x + y has type int -> int -> int, and applying it partially as (add 3) yields a new function of type int -> int that adds 3 to its argument. This implicit supports flexible composition, where add 3 4 evaluates to 7, but add 3 can be bound to a variable for reuse. Recursive functions are introduced using the rec keyword in value bindings, allowing a function to refer to itself within its definition. The syntax fun rec f pat = exp (or equivalently val rec f = fn pat => exp) binds f to a recursive closure, where the expression must be a lambda abstraction to enable self-reference. For instance, fun rec factorial n = if n = 0 then 1 else n * factorial (n - 1) computes the with base case and recursive step, inferring type int -> int. Mutual recursion between multiple functions is supported by chaining bindings with and, such as val rec even = fn n => n = 0 orelse odd (n - 1) and odd = fn n => n <> 0 andalso even (n - 1), enabling definitions where functions call each other. Function application in Standard ML uses strict call-by-value semantics, where arguments are fully evaluated from left to right before substitution into the function body. This ensures deterministic behavior and side-effect ordering, as in (fun f x = x + 1) (2 + 3), where 2 + 3 evaluates to 5 prior to application, yielding 6. Unlike lazy evaluation, this strict strategy prevents unevaluated thunks, promoting efficiency for most computations while requiring explicit handling for infinite data structures if needed.

Built-in Types and Type Synonyms

Standard ML provides a set of primitive built-in types that form the foundation of its , including the monomorphic types int for integers, real for floating-point numbers, string for sequences of characters, bool for boolean values (true and false), char for individual characters, and unit as a singleton type with value (). These types have arity zero and are part of the initial static basis, with int, bool, string, char, and unit admitting equality comparisons via the = and <> operators, while real does not due to its non-discrete nature. Among the built-in types, lists and tuples are constructed using predefined type constructors and syntactic sugar for convenience. Lists are parameterized by the type constructor list of arity one, yielding types like 'a list for homogeneous sequences, constructed via the nil value for empty lists and the infix :: operator for cons cells, with derived syntax [e1; e2; ...; en] for list literals where all elements share the same type. Tuples, on the other hand, are derived from record types and expressed as (e1, e2, ..., en) for n ≥ 2, equivalent to labeled records {1=e1, 2=e2, ..., n=en} with fixed, anonymous labels, allowing heterogeneous components of arbitrary types. Both lists and tuples admit equality if all their component types do, enabling structural comparisons and pattern matching on equality types. Type synonyms in Standard ML allow developers to create aliases for existing types without introducing new type constructors, using the declaration type tyvarseq tycon = ty to name complex types transparently. For example, one might define type intlist = int list to abbreviate lists of integers, or more generally type 'a slist = 'a * 'a list for a pairing an element with a list; such synonyms are fully interchangeable with their underlying types and preserve properties like equality admissibility. This mechanism supports polymorphism by allowing parametric synonyms, such as type 'a seq = 'a list, facilitating readable code without altering the type system's . Equality types play a crucial role in Standard ML, denoting those types for which structural equality is defined and usable in expressions like e1 = e2 or in pattern matching, including all primitive types except real and built-in constructors like list and tuples when their arguments admit equality. This distinction ensures safe operations, as non-equality types like real avoid imprecise floating-point comparisons in patterns or equality tests. Built-in type constructors extend the primitives, notably ref of arity one for creating mutable references with type 'a ref, which admits equality only if the underlying type 'a does, enabling imperative features like assignment while integrating with the functional core. Other constructors like list are similarly predefined in the initial basis, providing essential polymorphic structures without requiring user definitions.

Advanced Features

Algebraic Data Types

Algebraic data types in Standard ML allow programmers to define custom sum and product types through datatype declarations, enabling the construction of complex, type-safe data structures such as trees or lists. A datatype declaration specifies a new along with one or more value constructors that define how values of the type are formed, combining variants (sums) via alternatives separated by bars and products via tuples or . For instance, a can be declared as datatype 'a [tree](/page/Tree) = [Leaf](/page/Leaf) of 'a | Node of 'a [tree](/page/Tree) * 'a [tree](/page/Tree), where Leaf wraps a single value and Node combines two subtrees. Each constructor in a datatype declaration generates a corresponding function that creates values of the type, with types inferred from the declaration; for the example, Leaf has type 'a -> 'a [tree](/page/Tree) and Node has type 'a [tree](/page/Tree) * 'a [tree](/page/Tree) -> 'a [tree](/page/Tree). These constructors are exception-safe, as they cannot produce invalid states like null pointers—instead, all values must be explicitly built using the defined constructors, preventing runtime errors from uninitialized . Standard ML supports recursive datatypes, where the type refers to itself within constructor definitions, facilitating structures like trees or the built-in type (though custom lists can also be defined for illustration). is enabled by chaining multiple datatype declarations with the and keyword, allowing types to reference each other, such as datatype even = Zero | Succ of odd and datatype odd = One | Succ of even. This mechanism ensures that is properly handled at the type level without requiring special annotations. The enforces safety by treating each datatype as a distinct new type, incompatible with others unless explicitly equated, which eliminates implicit conversions and promotes exhaustive construction via constructors. , covered elsewhere, provides a natural way to deconstruct these types, but the declaration itself guarantees that all values are well-formed.

Pattern Matching

in Standard ML provides a mechanism for destructuring composite values and binding variables to their components in a concise and type-safe manner, forming a of the language's expressive power for handling structured data. It appears primarily in expressions and function definitions, where an input value is tested against a sequence of patterns, and the body associated with the first successful is evaluated. This approach enables declarative specification of behavior based on data shape, avoiding explicit type tests or conditionals. The primary construct for is the expression, with syntax case exp of pat_1 => exp_1 | ... | pat_n => exp_n, where exp is evaluated to a value that is sequentially matched against the patterns pat_i. The first that binds any variables in it to corresponding subvalues from exp, after which the associated exp_i is evaluated in this extended environment. If no , the built-in Match exception is raised. A wildcard _ can be used to match any value without binding variables, often serving as a default case. For example, the following computes the length of a list:

fun length l = case l of [] => 0 | _ :: rest => 1 + length rest

fun length l = case l of [] => 0 | _ :: rest => 1 + length rest

Here, [] matches an empty list, while _ :: rest matches a non-empty list, binding rest to its tail. Standard ML supports a rich set of pattern forms to accommodate various data structures. Variables serve as simple patterns that bind to the entire matched value, such as x in case e of x => .... Constructor patterns apply a data constructor to subpatterns, like Cons(h, t) for a cons cell, destructuring the value into head h and tail t. Tuple patterns (pat_1, ..., pat_n) match n-tuples by binding each component to the corresponding subpattern, as in case p of (x, y) => x + y. List patterns include literals like [pat_1, ..., pat_n] for fixed-length lists or cons forms pat_1 :: pat_2 for recursive decomposition. As-patterns, written pat as vid, bind vid to the full matched value while simultaneously decomposing it via pat; for instance, p as (d, _) in a dictionary update function binds p to the pair and d to its first element. This form, also known as a layered pattern, allows access to both the whole and its parts without recomputation. To ensure robustness, Standard ML compilers perform static checks on match expressions and function definitions involving patterns. Exhaustiveness checking verifies that the patterns cover all possible values of the scrutinized expression's type, issuing a warning if incomplete; for algebraic data types, this involves analyzing constructor coverage. If the match is inexhaustive at runtime, the Match exception is raised non-exhaustively. Redundancy checking detects and warns about superfluous rules that cannot be reached due to prior patterns, such as a clause following a wildcard. These analyses promote reliable code by catching common errors early, though they are advisory rather than errors, allowing compilation to proceed.

Higher-Order Functions and Modules

Standard ML supports higher-order functions, which treat functions as first-class citizens that can be passed as arguments to other functions, returned as results, or stored in data structures. This capability enables powerful abstractions for composing computations and processing collections like lists. A canonical example is the map function from the Basis Library's List structure, which applies a given function to each element of a list, producing a new list of transformed elements. Its type is ('a -> 'b) -> 'a list -> 'b list, allowing polymorphic transformation without specifying the exact function in advance. For instance:

- List.map (fn x => x * 2) [1, 2, 3]; > val it = [2, 4, 6] : int list

- List.map (fn x => x * 2) [1, 2, 3]; > val it = [2, 4, 6] : int list

This promotes reusable code for processing. Reduction operations like foldl and foldr further exemplify higher-order functions by accumulating a value over a using a binary operator. The foldl function performs a left-to-right fold with type ('a * 'b -> 'b) -> 'b -> 'a [list](/page/List) -> 'b, starting from an initial accumulator and applying the operator to each element. Similarly, foldr folds right-to-left with the same type signature, enabling efficient computations such as summing a :

- List.foldl (op +) 0 [1, 2, 3]; > val it = 6 : int

- List.foldl (op +) 0 [1, 2, 3]; > val it = 6 : int

These functions are essential for expressing recursive patterns generically. The module system in Standard ML provides mechanisms for large-scale abstraction through structures, signatures, and functors, allowing code to be organized into modular units with controlled interfaces. Structures encapsulate a collection of value, type, and submodule bindings, forming a self-contained unit. For example, a structure might define a queue abstraction:

structure Queue = struct type 'a queue = int * 'a list * 'a list val empty = (0, [], []) (* additional bindings *) end

structure Queue = struct type 'a queue = int * 'a list * 'a list val empty = (0, [], []) (* additional bindings *) end

Signatures specify the interface of a structure, declaring the types, values, and substructures it must provide, ensuring type-safe modular composition. A corresponding signature might be:

signature QUEUE = sig type 'a queue val empty : 'a queue (* additional specifications *) end

signature QUEUE = sig type 'a queue val empty : 'a queue (* additional specifications *) end

A can be ascribed to a for verification, such as structure Q :> QUEUE = Queue. Functors extend by parameterizing over other matching a given , enabling . The syntax is functor F (S : SIG) = struct ... end, where SIG is the parameter and the body uses S. application instantiates the parameter, generating a new : F (structure Arg = SomeStruct). This produces a result whose bindings are elaborated based on the body, supporting reusable abstractions like generic dictionaries parameterized over ordered keys. To enforce , Standard ML uses opaque (sealed) signatures, which hide implementation details by preventing access to types beyond the interface. Opaque ascription, via :> SIG, matches a to a but renames non-specified types to abstract ones, breaking external equalities. Transparent ascription (: SIG) preserves more information. constraints in signatures or functor applications enforce equality of types or structures across modules, such as sharing type T1 = T2 to ensure consistent type identities. The open declaration imports all bindings from a structure into the current namespace, facilitating convenient access without qualification, as in open Queue to use empty directly. This aids readability while maintaining modularity.

Imperative Programming

Mutable State with References

Standard ML provides limited support for imperative programming through mutable references, which serve as the primary mechanism for introducing state into an otherwise purely functional language. A reference is created using the ref constructor applied to an expression e, resulting in a mutable cell initialized with the value of e; the type of this reference is 'a ref where e has type 'a. Dereferencing occurs with the prefix operator ! applied to a reference r, yielding the current value stored in the cell, of type 'a if r has type 'a ref. Assignment to a reference uses the infix operator :=, as in r := e, which updates the cell with the value of e (requiring e to have type 'a) and returns unit. These operations enable side effects while integrating seamlessly with the type system, ensuring that mutations are type-safe. References in Standard ML exhibit polymorphism in their content type, allowing a single reference type 'a ref to accommodate any instantiation of 'a. However, the value restriction imposes limitations to maintain , preventing polymorphic generalization of created in expansive expressions (those involving function applications or other computations). For instance, a declaration like val r = ref 42 yields a monomorphic type int ref, as the expression ref 42 is expansive and does not generalize to forall 'a. 'a ref. In contrast, lambda abstractions such as fn x => ref x can achieve polymorphic types like forall 'a. 'a -> 'a ref because they are non-expansive values. This restriction avoids unsoundness, such as mutating a polymorphic with incompatible types, and ensures that allocated on the heap only escape with concrete types upon function return. Imperative control structures in Standard ML are derived from functional primitives combined with references. Sequencing of expressions uses the infix operator ;, where e1; e2 evaluates e1 for its side effects (discarding its value if not unit) and then returns the value of e2, with the overall type matching that of e2. While loops, absent as a primitive in the core language, are implemented using recursive functions and references to maintain mutable state across iterations. For example, a counter-based loop might use a reference to track progress:

sml

let val count = ref 0 fun loop () = if !count < 10 then (count := !count + 1; loop ()) else () in loop () end

let val count = ref 0 fun loop () = if !count < 10 then (count := !count + 1; loop ()) else () in loop () end

Here, the recursive loop function tests the condition on the dereferenced reference, updates it via assignment, and recurses, leveraging tail-call optimization for efficiency. Such constructs simulate imperative loops while relying on the language's recursive binding form rec (or fun for top-level recursion) to define the looping function. This design positions references as a minimal extension to the pure functional core of Standard ML, confining mutations to explicit imperative constructs without polluting the declarative semantics. The type system tracks reference mutability precisely, preventing unintended aliasing or type inconsistencies, and ensures that pure expressions remain unaffected by distant side effects. As a result, most Standard ML programs can leverage functional purity for reasoning and compositionality, resorting to references only where imperative state is essential, such as in performance-critical sections or interfaces to external systems.

Exceptions and Error Handling

Standard ML provides an exception mechanism for handling runtime errors and enabling non-local control transfers, allowing programs to signal and recover from exceptional conditions without relying on return codes or global state. Exceptions are values of the built-in abstract datatype exn, which is extensible and cannot be constructed directly outside of exception declarations, ensuring type safety by preventing accidental confusion with ordinary values. This design supports polymorphic payloads, where an exception can carry data of arbitrary type, while maintaining a uniform handling interface through pattern matching. Exceptions are declared using the exception keyword, introducing a new variant constructor for the exn type. A simple exception without payload is declared as exception E, yielding a value of type exn. For exceptions carrying data, the syntax exception E of τ specifies a payload of type τ, resulting in an exception value of type τ -> exn; for example, exception Fail of string allows raising a string message. Multiple exceptions can be declared in a single statement using and, and exceptions may be aliased via exception E = E'. These declarations extend the global exception namespace and are visible throughout the program. To raise an exception, the raise expression is used: raise exp, where exp evaluates to an exn value, immediately terminating the current evaluation and propagating the exception upward through the call stack. For instance, raise Fail "error occurred" signals a failure with a descriptive message. Propagation continues until a handler matches the exception or the program terminates, providing a clean way to interrupt normal flow for error conditions. Handling occurs via the handle construct in expressions: exp handle match, where match consists of pattern-matching rules of the form pat => exp1 | .... The expression exp is evaluated first; if it succeeds, the handler is ignored. If it raises an exception e, the patterns are matched against e—using the same as in case expressions—and the first successful match evaluates to the corresponding exp_i. Unmatched exceptions propagate further up the stack. This mechanism allows selective recovery, such as logging an error and continuing with a default value. Standard ML includes several built-in exceptions in the initial basis, raised automatically by language constructs or library functions for common errors:
  • Bind: Raised on pattern-binding failures, such as in non-exhaustive val bindings.
  • Match: Raised on non-exhaustive pattern matches in case expressions or function clauses.
  • Overflow: Raised for arithmetic operations exceeding representable bounds.
  • Div: Raised for division by zero.
  • Interrupt: Raised on user interrupts, like Ctrl+C in interactive environments.
These exceptions are of type exn (or polymorphic variants where applicable) and can be handled like user-defined ones, with functions like exnName and exnMessage providing string representations for debugging. The polymorphic nature ensures flexibility—for example, Fail carries a string—while the distinct exn type prevents exceptions from being used as regular data, enforcing safe error handling. The following example demonstrates declaration, raising, and handling:

exception DivideByZero; fun safeDiv (x, y) = if y = 0 then raise DivideByZero else x div y; val result = safeDiv (10, 0) handle DivideByZero => 0;

exception DivideByZero; fun safeDiv (x, y) = if y = 0 then raise DivideByZero else x div y; val result = safeDiv (10, 0) handle DivideByZero => 0;

Here, safeDiv raises DivideByZero on zero division, but the handler catches it and returns 0, illustrating recovery without stack unwinding beyond the try scope (noting that handle is the SML equivalent of a try-catch).

Standard Library

Basis Library Overview

The Standard ML Basis Library is a standardized collection of modules that forms the core initial environment for SML programs, ensuring portability and consistency across implementations. Defined as part of the 1997 revision of the SML standard (SML'97), it specifies a minimal set of required modules and signatures to promote interoperable code that behaves predictably regardless of the underlying system. This library was developed to address the limitations of earlier SML versions by providing a rich, general-purpose foundation for applications, with an emphasis on idioms and type-safe operations. The library is organized hierarchically into structures and signatures, where structures implement the functionality and signatures define the interfaces to guarantee consistent behavior. Conformance to the standard requires implementations to provide all mandatory modules, such as those for basic data types and operations, while optional modules extend capabilities for specific environments like POSIX. Examples of core modules include Int for integer handling, List for list manipulations, and String for string operations, each adhering to predefined signatures like INTEGER, LIST, and STRING to enforce uniformity. This structure supports the portability goal by abstracting platform-specific details, allowing developers to write code that compiles and runs identically on compliant systems. Key modules cover essential areas, including general-purpose facilities like (I/O) and . For I/O, structures such as TextIO and BinIO provide text and binary stream operations, enabling reading from and writing to files or in a type-safe manner. The Time structure handles time intervals and timestamps, supporting operations like and comparison of time values for timing applications. In integer arithmetic, the Int module (implementing the INTEGER ) offers operations such as (+), multiplication (*), and division (div), with support for both signed and unsigned variants through related modules like LargeInt. For lists, the List structure includes higher-order functions like append (concatenating lists) and rev (reversing a list), facilitating efficient functional compositions without mutable state. Usage of the Basis Library typically involves implicit access to pervasive elements in the top-level environment, such as the int type or the length function, while specific modules are imported explicitly for clarity and namespace management. Developers can use open List; to bring all identifiers from the List structure into scope, or qualify names like List.map to avoid conflicts in larger programs. This approach aligns with SML's module system, allowing seamless integration of Basis components into user-defined modules for building portable applications.

Specialized Utility Modules

The Standard ML Basis Library includes several specialized modules that extend the core functionality for operations, mathematical computations, date and time handling, operating system interactions, character and manipulations, and low-level data structures. These modules provide domain-specific utilities that enable programmers to perform tasks such as file handling, trigonometric calculations, and bitwise operations while maintaining the language's and functional paradigm.

I/O Modules

The TextIO structure facilitates text-based input and output using streams of characters and . It defines types such as instream for input streams and outstream for output streams, supporting operations like openIn to open a file for reading, inputLine to read a newline-terminated line (returning SOME([string](/page/String)) or NONE at end-of-stream), and output to write a to an output stream. Additional functions include openOut for creating or truncating files, openAppend for appending to files, and print for outputting to the standard output with automatic flushing; streams are block-buffered by default for files and line-buffered for interactive devices. Complementing TextIO, the BinIO structure handles binary input and output of 8-bit bytes (Word8.word elements). It provides similar stream types (instream and outstream) and functions such as openIn, openOut, and openAppend, inheriting imperative I/O semantics for reading (input, inputN) and writing (output, outputSubVec) . This module is essential for low-level file operations where byte-level precision is required, with error handling via the Io exception for failures like non-existent files.

Math and Utility Modules

The Math structure offers fundamental mathematical constants and functions operating on the real type, adhering to IEEE 754-1985 floating-point semantics. Key constants include pi (approximately 3.14159) and e (approximately 2.71828), while functions encompass sin, cos, and tan for trigonometric operations in radians, sqrt for square roots (returning NaN for negative inputs), exp for exponentiation, ln and log10 for logarithms (NaN for non-positive arguments), and pow for general exponentiation, returning NaN for negative bases with non-integer exponents and handling other edge cases per IEEE 754 semantics. These operations support scientific computing and numerical algorithms within SML programs. For calendar and time management, the Date structure provides tools for representing and manipulating dates in a specific time zone. It defines an abstract date type constructed via the date function from components like year, month, day, hour, minute, second, and optional offset; conversion functions include fromTimeLocal and fromTimeUniv to derive dates from UTC times, and toTime for the reverse. Formatting is handled by fmt using specifiers (e.g., %Y for four-digit year) and toString for a standard 24-character representation, while scanning parses dates via scan from streams or fromString from strings, returning options for error cases. This module integrates with the Time structure for comprehensive datetime operations. The OS structure serves as a facade for operating system interactions, encapsulating substructures like OS.FileSys for file and directory , OS.IO for polling, OS.Path for pathname syntax, and OS.Process for process control and environment access. It declares the SysErr exception for runtime errors and provides utilities such as errorMsg to retrieve descriptive strings for system errors, errorName for unique identifiers, and syserror for converting error strings to optional codes. These facilities enable portable syscalls, such as file attribute queries in OS.FileSys or process termination in OS.Process, abstracting platform-specific details.

Char and String Modules

The Char structure supplies predicates and conversions for individual characters, treating them as 8-bit values with operations like ord and chr for mappings (raising Chr for out-of-range inputs), toLower and toUpper for case conversion, and toString for escaped representations. Predicates include isDigit to check for digits ('0' to '9'), isAlpha for letters, isAlphaNum for alphanumeric characters, isSpace for whitespace, and isCntrl for control characters, facilitating text validation and processing. Building on Char, the String structure manages immutable sequences of characters with functions for conversion and extraction. Conversions include str to create a single-character string and implode to concatenate a character list into a string, while explode reverses this by producing a char list. Substring operations feature sub to access the i-th character (zero-based), substring to extract a contiguous segment of length j starting at i, and extract for substrings from i to the end or a specified length; these support efficient string manipulation without mutation.

Extensions in Basis

The CharArray structure extends array facilities for mutable sequences of characters, providing low-level data handling akin to the general but specialized for chars. It includes array to create an array of given length initialized with a char, fromList for list-to-array conversion, sub for indexed access, update for modification, and vector to convert to an immutable vector, useful for performance-critical text buffering. Similarly, the Word structure supports unsigned machine-word integers with fixed precision (wordSize bits), enabling bitwise and arithmetic operations for low-level data. Key features include modular arithmetic via +, -, *, div, and mod; bitwise functions like andb, orb, xorb, and notb; shifts with << (left), >> (logical right), and ~>> (arithmetic right); and conversions such as toInt/fromInt and toLarge/fromLarge for interfacing with other numeric types, ideal for and .

Implementations

SML/NJ

Standard ML of New Jersey (SML/NJ) is a prominent implementation of the Standard ML '97 programming language, originally developed in spring 1986 by David MacQueen at Bell Laboratories and Andrew Appel at Princeton University. It emerged as a dedicated compiler aimed at producing highly optimized native code, evolving from earlier ML systems and serving as the primary reference implementation due to its comprehensive support for the language definition and ongoing maintenance by a core group including MacQueen and John Reppy. Key features of SML/NJ include its interactive read-eval-print loop (REPL), accessible via the sml command, which enables incremental development, of modules, and immediate execution of expressions in an environment with full and garbage collection. The system supports compilation to native through the Compilation Manager (CM), which uses dependency analysis to minimize recompilation and leverages the MLRISC backend for targets like AMD64. Beyond strict conformance to Standard ML '97, SML/NJ offers extensions such as first-class continuations via the Cont structure and suspend/resume mechanisms provided by the Susp structure, facilitating advanced control abstractions like coroutines. For performance, SML/NJ employs a two-level bootstrap compiler process, where an initial lightweight builds the full , enabling efficient self-hosting and generation of high-quality code across platforms. The latest release, version 110.99.9 from November 4, 2025, focuses on bug fixes and stability improvements, including enhancements to 64-bit support on macOS and . SML/NJ is distributed as free, under the BSD 3-Clause License and is cross-platform, with official support for systems (including ), Windows, and macOS, including recent additions for Arm64 architectures. Source distributions and installers are available via and the official website, ensuring accessibility for developers and researchers.

MLton

MLton is a whole-program for the Standard ML programming language, designed to generate small, standalone executables with high runtime performance and no dependency on an interactive . Unlike interactive environments, MLton performs a complete and optimization of the entire program before code generation, enabling aggressive optimizations that eliminate overhead from higher-order features and produce native . This approach results in executables that are self-contained, leveraging techniques such as unboxed native representations for integers, reals, and arrays, as well as fast via GMP integration. The compiler's optimization pipeline includes several advanced transformations to improve efficiency. Key steps involve defunctionalization and closure conversion, which translate higher-order functional code into a first-order static single assignment (SSA) form, eliminating closures and enabling subsequent optimizations like constant propagation and dead code elimination. Additional optimizations encompass multiple code generation backends and garbage collection strategies, allowing selection based on target architecture for minimal memory and execution overhead. MLton also supports built-in profiling capabilities, including allocation, time, and stack profiling via the mlprof tool, which instruments code to measure function-level metrics without significant performance degradation in non-profiled builds. MLton adheres strictly to the SML'97 standard, ensuring full conformance to the revised Definition of Standard ML while providing extensions such as a C (FFI) for integrating with system libraries and utilities like lexer and parser generators (mllex and mlyacc). The project originated in 1997 as an independent SML compiler effort, with the first release under the name MLton in 1999, and has been open-source throughout its development. As of 2025, MLton remains actively maintained, with the latest release (20241230) in December 2024 and ongoing updates to support modern platforms and enhancements for applications.

Poly/ML

Poly/ML is a high-performance implementation of Standard ML, originally developed by David Matthews at the University of 's Computer Laboratory. It derives its name from an experimental language called Poly, in which the initial version was written, and builds on earlier efforts like Cambridge ML to provide efficient compilation and runtime support for large-scale programming. Designed with a focus on , Poly/ML emphasizes for demanding applications through advanced runtime features. A core strength of Poly/ML lies in its support for parallelism, achieved via native operating system threads that leverage primitives such as forks, mutexes, and condition variables. This enables full multiprocessor utilization, with the thread library and garbage collector both optimized for concurrent execution on shared-memory systems. While is not natively built-in, the implementation facilitates high-level parallelism abstractions, such as values in tools like Isabelle/ML, for implicit scheduling and parallel evaluation. Additionally, its garbage collector is engineered for efficiency with large heaps, accommodating memory-intensive workloads up to the terabyte scale through 64-bit addressing and dynamic heap adjustment. Poly/ML achieves full conformance to the Standard ML '97 specification, including the complete Basis Library, ensuring portability of Standard ML code across implementations. Beyond the standard, it introduces extensions primarily within the PolyML structure, such as the compiler for foreign calls that generate executable code from SML sources and export to object files, and the Profiling submodule for measuring execution time and memory allocation. Other enhancements include explicit garbage collection triggers via fullGC, heap analysis functions like objSize, and thread-safe compilation to support parallel code generation. These features extend the language without compromising core compatibility, enabling integration with external systems and performance tuning. As of 2025, Poly/ML remains actively maintained, with version 5.9.2 released in August 2025 and development continuing on through community contributions. It is particularly valued in scientific computing for its robustness in handling complex, resource-heavy tasks, serving as the backend for systems like Isabelle and HOL4, where parallel proof checking and large-scale theorem proving demand efficient memory and concurrency management.

Other Notable Implementations

Moscow ML is a lightweight implementation of Standard ML, designed primarily for educational and research purposes, featuring an interactive interpreter and a compiler that generates standalone executables. It fully conforms to the 1997 revision of Standard ML, including the modules language, and implements most of the Standard ML Basis Library, though it omits functional stream I/O. Originally developed by Peter Sestoft at the , it is portable across systems and Windows, with a small footprint that facilitates quick startup times in interactive sessions, making it suitable for teaching concepts. Moscow ML includes extensions such as higher-order functors and first-class modules, which enhance its expressiveness beyond the strict standard while maintaining compatibility. MLKit is a research-oriented compiler toolkit for Standard ML, originating from the University of Aarhus and later developed at the , emphasizing as an alternative to traditional garbage collection. This approach uses region inference to allocate objects in hierarchical regions that are deallocated explicitly or automatically at region boundaries, reducing runtime overhead and enabling predictable memory usage in performance-critical applications. MLKit supports the full SML'97 language definition and a large portion of the Basis Library, compiling to native code for architectures via a . Its design prioritizes static analysis for region lifetimes, allowing programmers to influence memory behavior through annotations while preserving the language's safety guarantees. CakeML provides a verified implementation of a significant subset of Standard ML, tailored for and certified software development, with its entire compiler pipeline mechanically proven correct in the HOL4 theorem prover. This subset encompasses core features like datatypes, , higher-order functions, references, exceptions, polymorphism, modules, signatures, and arbitrary-precision integers, but excludes records, I/O operations, and functors to simplify verification. Developed collaboratively by researchers at institutions including the and the , CakeML generates certified , enabling end-to-end guarantees from source code to execution, including a bootstrapping compiler and interactive REPL. Its focus on supports applications in trustworthy systems, such as certified operating system components or theorem provers. These implementations exhibit trade-offs in performance, features, and conformance to the 1997 Standard ML definition. Moscow ML prioritizes lightweight interactivity and educational accessibility, offering full language conformance with rapid startup but less aggressive optimization compared to whole-program compilers, achieving reasonable execution speeds for interpretive use. MLKit balances research innovation in —regions can yield lower latency than garbage collection in structured programs—with complete SML'97 support, though it requires familiarity with region annotations for optimal efficiency. CakeML, while limited to a verified (approximately 80% of core SML features), ensures highest assurance levels at the cost of broader feature omission and potential performance overhead from its proof obligations, making it ideal for safety-critical domains rather than general-purpose coding. Overall, all three maintain high fidelity to the 1997 standard where applicable, with varying Basis Library coverage: Moscow ML and MLKit near-complete, and CakeML partial due to its scoped design.

Code Examples

Simple Programs

Standard ML's simple programs highlight its functional nature, concise syntax, and reliance on the Basis Library for common operations like and processing. These examples illustrate basic expression evaluation, function definitions via , and the use of higher-order functions, all without explicit type declarations thanks to the language's mechanism. A fundamental introductory program is the "Hello, World!" example, which demonstrates output using the TextIO module from the Basis Library. The following code prints a greeting to the standard output stream:

sml

TextIO.print("Hello, world!\n");

TextIO.print("Hello, world!\n");

Upon execution in an interactive environment like SML/NJ, this evaluates to unit (the type for void-like expressions) and displays "Hello, world!" followed by a . To suppress the printing of the unit result in the REPL, the expression can be bound to an underscore:

sml

val _ = TextIO.print("Hello, world!\n");

val _ = TextIO.print("Hello, world!\n");

The TextIO.print function has the signature string -> unit, applying the given directly to the output stream. Another basic example is a recursive function for computing the of a non-negative , defined using on the argument as per the core language syntax. The code is:

sml

fun fact 0 = 1 | fact n = n * fact (n - 1);

fun fact 0 = 1 | fact n = n * fact (n - 1);

This defines fact with a base case where fact 0 returns 1, and a recursive case multiplying n by fact (n - 1) for n > 0. The type is inferred as int -> int. For instance, evaluating fact 5 yields 120, demonstrating the language's support for straightforward recursion without explicit loops. Simple list operations showcase Standard ML's built-in support for functional transformations via the List structure in the Basis Library. Consider mapping a function to double each element in an integer list:

sml

val doubled = List.map (fn x => x * 2) [1, 2, 3, 4];

val doubled = List.map (fn x => x * 2) [1, 2, 3, 4];

The List.map function, with signature ('a -> 'b) -> 'a list -> 'b list, applies the anonymous function (fn x => x * 2) (of type int -> int) to each element from left to right, producing [2, 4, 6, 8] of type int list. Similarly, filtering a list to retain only even numbers uses List.filter:

sml

val evens = List.filter (fn x => x mod 2 = 0) [1, 2, 3, 4, 5, 6];

val evens = List.filter (fn x => x mod 2 = 0) [1, 2, 3, 4, 5, 6];

With signature ('a -> bool) -> 'a list -> 'a list, this applies the predicate (fn x => x mod 2 = 0) (type int -> bool) and returns [2, 4, 6] of type int list, preserving the original order of matching elements. These operations exemplify immutable list handling central to the language.

Algorithm Implementations

Standard ML's functional lends itself naturally to expressing classic sorting algorithms using , , and higher-order functions on lists. These implementations highlight the language's emphasis on immutability and declarative style, avoiding mutable state or loops. Below, representative examples of , , and are provided, demonstrating core idioms such as list cons (::) and . Efficiency considerations, including tail-recursive optimizations via accumulators, are also discussed to align with the language's support for safe, performant .

Insertion Sort

Insertion sort builds a sorted list incrementally by inserting elements into their correct positions. This showcases on lists, where the empty list serves as the base case and the operator allows recursive insertion. The insert function places a value x into a sorted list ys, comparing x with the head y of ys:

fun insert x [] = [x] | insert x (y :: ys) = if x <= y then x :: y :: ys else y :: insert x ys

fun insert x [] = [x] | insert x (y :: ys) = if x <= y then x :: y :: ys else y :: insert x ys

The full isort function then iterates over the input list, inserting each element into an accumulating sorted result:

fun isort [] = [] | isort (x :: xs) = insert x (isort xs)

fun isort [] = [] | isort (x :: xs) = insert x (isort xs)

This recursive definition is intuitive but not tail-recursive, potentially leading to stack overflow for large lists due to the pending insert call. To optimize, a tail-recursive variant uses an accumulator to build the result:

fun isort_tail xs = let fun insort acc [] = acc | insort acc (x :: xs) = insort (insert x acc) xs in insort [] xs end

fun isort_tail xs = let fun insort acc [] = acc | insort acc (x :: xs) = insort (insert x acc) xs in insort [] xs end

This approach maintains ascending order and exemplifies SML's balance of expressiveness and efficiency through higher-order functions like insert. With proper tail-call optimization in compliant implementations, it avoids stack overflow while achieving O(n²) time.

Merge Sort

Merge sort employs a divide-and-conquer strategy, splitting lists recursively and merging sorted halves. It is stable and guarantees O(n log n) time, making it suitable for demonstrating recursive decomposition in SML. First, the split function divides a list into two roughly equal parts using even-odd indexing:

fun split [] = ([], []) | split [x] = ([x], []) | split (x :: y :: xs) = let val (l, r) = split xs in (x :: l, y :: r) end

fun split [] = ([], []) | split [x] = ([x], []) | split (x :: y :: xs) = let val (l, r) = split xs in (x :: l, y :: r) end

The merge function combines two sorted lists, using pattern matching to compare heads and recurse on tails:

fun merge [] ys = ys | merge xs [] = xs | merge (x :: xs) (y :: ys) = if x <= y then x :: merge xs (y :: ys) else y :: merge (x :: xs) ys

fun merge [] ys = ys | merge xs [] = xs | merge (x :: xs) (y :: ys) = if x <= y then x :: merge xs (y :: ys) else y :: merge (x :: xs) ys

The complete msort ties these together:

fun msort [] = [] | msort [x] = [x] | msort xs = let val (l, r) = split xs in merge (msort l) (msort r) end

fun msort [] = [] | msort [x] = [x] | msort xs = let val (l, r) = split xs in merge (msort l) (msort r) end

For tail recursion, an accumulator can be introduced in the merge step, though the divide phase remains inherently non-tail-recursive. Implementations like MLton optimize this via deforestation, reducing intermediate list allocations. This example underscores SML's strength in composing pure functions for parallelizable algorithms.

Quicksort

Quicksort selects a pivot and partitions the list into elements less than or equal to and greater than the pivot, recursing on each part. In SML, partitioning uses higher-order functions like List.filter, naturally handling duplicates by including equals in one subtree for simplicity. Assuming the List structure's filter:

fun qsort [] = [] | qsort [x] = [x] | qsort xs = let val pivot = hd xs val rest = tl xs val lows = List.filter (fn y => y <= pivot) rest val highs = List.filter (fn y => y > pivot) rest in qsort lows @ pivot :: qsort highs end

fun qsort [] = [] | qsort [x] = [x] | qsort xs = let val pivot = hd xs val rest = tl xs val lows = List.filter (fn y => y <= pivot) rest val highs = List.filter (fn y => y > pivot) rest in qsort lows @ pivot :: qsort highs end

The @ operator concatenates results, with hd and tl extracting pivot and tail. This version averages O(n log n) time but worst-cases to O(n²) on sorted inputs; randomized pivot selection (e.g., via List.nth (xs, Random.randInt (length xs))) can mitigate this, though not shown here for brevity. These implementations illustrate how SML's list operations foster concise, correct algorithms without side effects.

Advanced Language Features

Standard ML provides advanced mechanisms for abstraction and error management, including modules and functors for structuring code and exceptions for handling runtime errors. These features enable the creation of reusable, generic components and robust programs that gracefully manage exceptional conditions. The language's module system, defined in the revised standard, supports through functors, allowing structures to be instantiated with different types while preserving . Exceptions, also part of the core definition, permit the interruption of normal to signal and recover from errors like arithmetic overflows or invalid operations.

Error Handling with Exceptions

Standard ML's exception system uses the raise keyword to signal errors and the handle clause to catch them, providing a form of lightweight error handling without altering the functional purity of expressions. For instance, the built-in division operator div raises the Div exception when dividing by zero, as specified in the Basis Library's structure. To demonstrate recovery, consider a safe division function that catches this exception and returns a default value:

sml

fun safeDiv (x, y) = x div y handle Div => 0;

fun safeDiv (x, y) = x div y handle Div => 0;

Evaluating safeDiv (10, 0) yields 0 instead of terminating the program, illustrating how [handle](/page/Handle) matches the raised exception and substitutes an alternative computation. This approach is exhaustive in on exceptions, ensuring all cases are addressed if the handler covers the relevant constructors.

Expression Interpreter Using Datatypes and Pattern Matching

Datatypes in Standard ML allow the definition of algebraic data types for representing structured data, such as abstract syntax trees for expressions, combined with pattern matching in functions for concise evaluation. A common practical scenario is an interpreter for simple arithmetic expressions, where a datatype encodes constants, , subtraction, and division. The eval function uses pattern matching to recursively compute the value, raising exceptions for invalid operations like via the underlying div primitive. Here is an example datatype and evaluator:

sml

datatype exp = Constant of int | Plus of exp * exp | Minus of exp * exp | Divide of exp * exp; fun eval (Constant n) = n | eval (Plus (e1, e2)) = eval e1 + eval e2 | eval (Minus (e1, e2)) = eval e1 - eval e2 | eval (Divide (e1, e2)) = eval e1 div eval e2;

datatype exp = Constant of int | Plus of exp * exp | Minus of exp * exp | Divide of exp * exp; fun eval (Constant n) = n | eval (Plus (e1, e2)) = eval e1 + eval e2 | eval (Minus (e1, e2)) = eval e1 - eval e2 | eval (Divide (e1, e2)) = eval e1 div eval e2;

For an expression like Divide (Constant 10, Constant 0), [eval](/page/Eval) raises Div, which can be caught externally with [handle](/page/Handle) for error propagation or recovery, such as raising a custom [EvalError](/page/Error) exception:

sml

exception EvalError of string; val result = eval (Divide (Constant 10, Constant 0)) handle Div => raise EvalError "Division by zero";

exception EvalError of string; val result = eval (Divide (Constant 10, Constant 0)) handle Div => raise EvalError "Division by zero";

This setup leverages matching's exhaustiveness checking at to ensure all expression variants are handled, while exceptions provide runtime .

Arbitrary-Precision Integers with the IntInf Module

The Basis Library includes the IntInf structure for arbitrary-precision integers, supporting operations beyond the fixed-size int type, such as , , and conversion from smaller types. This module is essential for computations involving where overflow is a concern, as IntInf values can grow dynamically without bounds (limited only by memory). Key operations include fromInt for conversion and the overloaded arithmetic operators like +. For example, to compute the factorial of a large number:

sml

local open IntInf in fun fact 0 = 1 | fact n = n * fact (n - 1) end; val largeFact = fact (fromInt 1000); (* Results in a very large IntInf integer *)

local open IntInf in fun fact 0 = 1 | fact n = n * fact (n - 1) end; val largeFact = fact (fromInt 1000); (* Results in a very large IntInf integer *)

Here, the local opening of IntInf overloads numeric literals to IntInf.int type and uses the module's arithmetic operators like *, enabling concise code without explicit fromInt conversions inside the function or qualified names. The argument to fact requires explicit conversion from int to IntInf, and * performs arbitrary-precision , ensuring no overflow occurs even for results exceeding word . This demonstrates IntInf's utility in numerical applications requiring exact arithmetic.

Generic Structures with Functors and Higher-Order Functions

Functors in Standard ML enable the creation of generic modules by parameterizing structures over signatures, facilitating reusable abstractions like sets. A can define higher-order functions that operate on the provided types, with allowing flexible instantiation. For example, a for a generic set uses a higher-order comparison function partially applied to elements. Consider a signature for ordered elements:

sml

signature ORD = sig type t val compare : t * t -> order end;

signature ORD = sig type t val compare : t * t -> order end;

A functor GenericSet takes an ORD structure and implements a set with insertion using a partially applied fold:

sml

functor GenericSet (Key : ORD) : sig type set val empty : set val insert : Key.t -> set -> set end = struct type set = Key.t list (* Simplified as list for illustration *) val empty = [] fun insert x s = if List.exists (fn y => Key.compare (x, y) = EQUAL) s then s else x :: s (* Higher-order List.exists with partial compare *) end;

functor GenericSet (Key : ORD) : sig type set val empty : set val insert : Key.t -> set -> set end = struct type set = Key.t list (* Simplified as list for illustration *) val empty = [] fun insert x s = if List.exists (fn y => Key.compare (x, y) = EQUAL) s then s else x :: s (* Higher-order List.exists with partial compare *) end;

Instantiating with integers (assuming an IntOrd structure providing compare):

sml

structure IntOrd = struct type t = int val compare = Int.compare end; structure IntSet = GenericSet (IntOrd); val s = IntSet.insert 42 IntSet.empty; (* Partial application in insert via fold-like ops *)

structure IntOrd = struct type t = int val compare = Int.compare end; structure IntSet = GenericSet (IntOrd); val s = IntSet.insert 42 IntSet.empty; (* Partial application in insert via fold-like ops *)

This example shows how functors parameterize over types and operations, with higher-order functions like List.exists (partially applied with Key.compare) enabling generic behavior. The Basis Library's List structure provides such combinators for list manipulation within functors.

Applications

Major Software Projects

Standard ML has been instrumental in developing several major software projects, particularly in formal verification and theorem proving, where its strong type system and functional paradigm support reliable, modular implementations. Poly/ML, a high-performance implementation of Standard ML, serves as the primary backend for the Isabelle theorem prover, enabling its concurrent proof development environment and integration with higher-order logic components. Isabelle's synchronization mechanisms rely on Poly/ML primitives adapted for parallel processing in proof scripts. Similarly, components of the HOL4 theorem prover leverage Standard ML implementations like Poly/ML for efficient execution of verified proofs in higher-order logic. The SML/NJ implementation underpins key development tools, including the MLWorks (IDE), which bootstraps from SML/NJ to provide interactive editing, debugging, and compilation for Standard ML programs. The CakeML project develops a verified for a subset of Standard ML, with its implementation and proofs conducted in HOL4, a theorem prover implemented in Standard ML. MLton, known for whole-program optimization, powers high-performance applications such as the sml-server web , which handles HTTP requests efficiently in production environments. These projects highlight Standard ML's impact on verified systems, including theorem provers and self-hosting compilers, where the language's formal semantics enable rigorous correctness guarantees without runtime errors.

Research and Educational Use

Standard ML has been widely adopted in academic settings for teaching functional programming principles due to its clean syntax, strong type system, and support for modular programming. Universities around the world have incorporated it as an introductory language for computer science courses, emphasizing its role in fostering understanding of higher-order functions, recursion, and type inference without the complexities of side effects. Moscow ML, a of Standard ML, is particularly suited for educational environments because of its and portability, making it ideal for exercises and student projects in . It supports the full Standard ML '97 specification while requiring minimal resources, allowing easy integration into teaching materials for topics like data structures and algorithm design. Key textbooks have further solidified Standard ML's pedagogical value. Larry Paulson's ML for the Working Programmer (2nd edition, 1996) provides a comprehensive guide to programming in Standard ML, covering core concepts such as , modules, and through practical examples and exercises, and it remains a staple in university curricula for its balance of theory and application. In research, Standard ML has advanced by serving as a foundation for exploring polymorphic and modular type systems, with seminal work including type-theoretic interpretations that embed its semantics into richer logical frameworks. This has enabled deeper investigations into the of programming languages, influencing developments in dependent types and proof assistants. A prominent example is the CakeML project, which develops a verified implementation of a substantial subset of Standard ML, including its parser, type checker, and compiler, all proven correct in the HOL4 theorem prover. This work demonstrates Standard ML's utility in , allowing researchers to build and certify trustworthy software systems while highlighting the language's expressive power for mechanized proofs. The ML Family Workshop series continues to foster research on Standard ML and its extensions. The edition, held in conjunction with ICFP, featured discussions on enhancements and within the ML ecosystem, while the 2025 workshop, held on October 16 in , featured discussions on enhancements such as freezing bidirectional typing, via projects like LunarML, and language extensions including implicit modules and extended . These events promote collaboration on theoretical advancements and practical innovations in the extended ML family. Standard ML's adoption extends to functional programming curricula at institutions like , where Robert Harper's Programming in Standard ML supports courses on programming language foundations. Its influence is also evident in the design of , which inherits Standard ML's core and module concepts, adapting them for broader applicability while building on the standardization efforts of SML.

References

Add your contribution
Related Hubs
User Avatar
No comments yet.