Recent from talks
Nothing was collected or created yet.
Type system
View on WikipediaThis article is written like a personal reflection, personal essay, or argumentative essay that states a Wikipedia editor's personal feelings or presents an original argument about a topic. (July 2016) |
| Type systems |
|---|
| General concepts |
| Major categories |
|
| Minor categories |
A programming language consists of a system of allowed sequences of symbols (constructs) together with rules that define how each construct is interpreted. For example, a language might allow expressions representing various types of data, expressions that provide structuring rules for data, expressions representing various operations on data, and constructs that provide sequencing rules for the order in which to perform operations.
A simple type system for a programming language is a set of rules that associates a data type (for example, integer, floating point, string) with each term (data-valued expression) in a computer program. In more ambitious type systems, a variety of constructs, such as variables, expressions, functions, and modules, may be assigned types.[1]
Type systems formalize and enforce the otherwise implicit categories the programmer uses for algebraic data types, data structures, or other data types, such as "string", "array of float", "function returning boolean".
The main purpose of a type system in a programming language is to reduce possibilities for bugs in computer programs due to mismatches in how values are interpreted in different parts of a program. The aim is to prevent operations expecting a certain kind of value from being applied to values for which that operation does not make sense (validity errors).[2] A type system can detect and prevent some of these mismatches. When a type mismatch is detected, it is called a type error.
The type of a term constrains the contexts in which it may be used. For a variable, the type system determines the allowed values of that variable. For that variable be presented as a parameter to an operation, the operation must be able to accept in that parameter any value that the type of the variable allows.
Type systems are typically specified as part of programming language design. They are built into interpreters and compilers for the language. In some languages, the type system can be extended by optional tools that perform added checks using the language's original type syntax and grammar.
Type systems allow defining interfaces between different parts of a computer program, and then checking that the parts have been connected in a consistent way. This checking can happen statically (at compile time), dynamically (at run time), or as a combination of both.
Type systems have other purposes as well, such as expressing business rules, enabling certain compiler optimizations, allowing for multiple dispatch, and providing a form of documentation.
Usage overview
[edit]An example of a simple type system is that of the C language. The portions of a C program are the function definitions. One function is invoked by another function.
The interface of a function states the name of the function and a list of parameters that are passed to the function's code. The code of an invoking function states the name of the invoked, along with the names of variables that hold values to pass to it.
During a computer program's execution, the values are placed into temporary storage, then execution jumps to the code of the invoked function. The invoked function's code accesses the values and makes use of them.
If the instructions inside the function are written with the assumption of receiving an integer value, but the calling code passed a floating-point value, then the wrong result will be computed by the invoked function.
The C compiler checks the types of the arguments passed to a function when it is called against the types of the parameters declared in the function's definition. If the types do not match, the compiler throws a compile-time error or warning.
A compiler may also use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. In many C compilers the float data type, for example, is represented in 32 bits, in accord with the IEEE specification for single-precision floating point numbers. They will thus use floating-point-specific microprocessor operations on those values (floating-point addition, multiplication, etc.).
The depth of type constraints and the manner of their evaluation affect the typing of the language. A programming language may further associate an operation with various resolutions for each type, in the case of type polymorphism. Type theory is the study of type systems. The concrete types of some programming languages, such as integers and strings, depend on practical issues of computer architecture, compiler implementation, and language design.
Fundamentals
[edit]Formally, type theory studies type systems. A programming language must have the opportunity to type check using the type system whether at compile time or runtime, manually annotated or automatically inferred. As Mark Manasse concisely put it:[3]
The fundamental problem addressed by a type theory is to ensure that programs have meaning. The fundamental problem caused by a type theory is that meaningful programs may not have meanings ascribed to them. The quest for richer type systems results from this tension.
Assigning a data type, termed typing, gives meaning to a sequence of bits such as a value in memory or some object such as a variable. The hardware of a general purpose computer is unable to discriminate between for example a memory address and an instruction code, or between a character, an integer, or a floating-point number, because it makes no intrinsic distinction between any of the possible values that a sequence of bits might mean.[note 1] Associating a sequence of bits with a type conveys that meaning to the programmable hardware to form a symbolic system composed of that hardware and some program.
A program associates each value with at least one specific type, but it also can occur that one value is associated with many subtypes. Other entities, such as objects, modules, communication channels, and dependencies can become associated with a type. Even a type can become associated with a type. An implementation of a type system could in theory associate identifications called data type (a type of a value), class (a type of an object), and kind (a type of a type, or metatype). These are the abstractions that typing can go through, on a hierarchy of levels contained in a system.
When a programming language evolves a more elaborate type system, it gains a more finely grained rule set than basic type checking, but this comes at a price when the type inferences (and other properties) become undecidable, and when more attention must be paid by the programmer to annotate code or to consider computer-related operations and functioning. It is challenging to find a sufficiently expressive type system that satisfies all programming practices in a type safe manner.
A programming language compiler can also implement a dependent type or an effect system, which enables even more program specifications to be verified by a type checker. Beyond simple value-type pairs, a virtual "region" of code is associated with an "effect" component describing what is being done with what, and enabling for example to "throw" an error report. Thus the symbolic system may be a type and effect system, which endows it with more safety checking than type checking alone.
Whether automated by the compiler or specified by a programmer, a type system renders program behavior illegal if it falls outside the type-system rules. Advantages provided by programmer-specified type systems include:
- Abstraction (or modularity) – Types enable programmers to think at a higher level than the bit or byte, not bothering with low-level implementation. For example, programmers can begin to think of a string as a set of character values instead of as an array of bytes. Higher still, types enable programmers to think about and express interfaces between two of any-sized subsystems. This enables more levels of localization so that the definitions required for interoperability of the subsystems remain consistent when those two subsystems communicate.
- Documentation – In more expressive type systems, types can serve as a form of documentation clarifying the intent of the programmer. For example, if a programmer declares a function as returning a timestamp type, this documents the function when the timestamp type can be explicitly declared deeper in the code to be an integer type.
Advantages provided by compiler-specified type systems include:
- Optimization – Static type-checking may provide useful compile-time information. For example, if a type requires that a value must align in memory at a multiple of four bytes, the compiler may be able to use more efficient machine instructions.
- Safety – A type system enables the compiler to detect meaningless or invalid code. For example, we can identify an expression
3 / "Hello, World"as invalid, when the rules do not specify how to divide an integer by a string. Strong typing offers more safety, but cannot guarantee complete type safety.
Type errors
[edit]A type error occurs when an operation receives a different type of data than it expected.[4] For example, a type error would happen if a line of code divides two integers, and is passed a string of letters instead of an integer.[4] It is an unintended condition[note 2] which might manifest in multiple stages of a program's development. Thus a facility for detection of the error is needed in the type system. In some languages, such as Haskell, for which type inference is automated, lint might be available to its compiler to aid in the detection of error.
Type safety contributes to program correctness, but might only guarantee correctness at the cost of making the type checking itself an undecidable problem (as in the halting problem). In a type system with automated type checking, a program may prove to run incorrectly yet produce no compiler errors. Division by zero is an unsafe and incorrect operation, but a type checker which only runs at compile time does not scan for division by zero in most languages; that division would surface as a runtime error. To prove the absence of these defects, other kinds of formal methods, collectively known as program analyses, are in common use. Alternatively, a sufficiently expressive type system, such as in dependently typed languages, can prevent these kinds of errors (for example, expressing the type of non-zero numbers). In addition, software testing is an empirical method for finding errors that such a type checker would not detect.
Type checking
[edit]The process of verifying and enforcing the constraints of types—type checking—may occur at compile time (a static check) or at run-time (a dynamic check).
If a language specification requires its typing rules strongly, more or less allowing only those automatic type conversions that do not lose information, one can refer to the process as strongly typed; if not, as weakly typed.
The terms are not usually used in a strict sense.
Static type checking
[edit]Static type checking is the process of verifying the type safety of a program based on analysis of a program's text (source code). If a program passes a static type checker, then the program is guaranteed to satisfy some set of type safety properties for all possible inputs.
Static type checking can be considered a limited form of program verification (see type safety), and in a type-safe language, can also be considered an optimization. If a compiler can prove that a program is well-typed, then it does not need to emit dynamic safety checks, allowing the resulting compiled binary to run faster and to be smaller.
Static type checking for Turing-complete languages is inherently conservative. That is, if a type system is both sound (meaning that it rejects all incorrect programs) and decidable (meaning that it is possible to write an algorithm that determines whether a program is well-typed), then it must be incomplete (meaning there are correct programs, which are also rejected, even though they do not encounter runtime errors).[7] For example, consider a program containing the code:
if <complex test> then <do something> else <signal that there is a type error>
Even if the expression <complex test> always evaluates to true at run-time, most type checkers will reject the program as ill-typed, because it is difficult (if not impossible) for a static analyzer to determine that the else branch will not be taken.[8] Consequently, a static type checker will quickly detect type errors in rarely used code paths. Without static type checking, even code coverage tests with 100% coverage may be unable to find such type errors. The tests may fail to detect such type errors, because the combination of all places where values are created and all places where a certain value is used must be taken into account.
A number of useful and common programming language features cannot be checked statically, such as downcasting. Thus, many languages will have both static and dynamic type checking; the static type checker verifies what it can, and dynamic checks verify the rest.
Many languages with static type checking provide a way to bypass the type checker. Some languages allow programmers to choose between static and dynamic type safety. For example, historically C# declares variables statically,[9]: 77, Section 3.2 but C# 4.0 introduces the dynamic keyword, which is used to declare variables to be checked dynamically at runtime.[9]: 117, Section 4.1 Other languages allow writing code that is not type-safe; for example, in C, programmers can freely cast a value between any two types that have the same size, effectively subverting the type concept.
Dynamic type checking and runtime type information
[edit]Dynamic type checking is the process of verifying the type safety of a program at runtime. Implementations of dynamically type-checked languages generally associate each runtime object with a type tag (i.e., a reference to a type) containing its type information. This runtime type information (RTTI) can also be used to implement dynamic dispatch, late binding, downcasting, reflective programming (reflection), and similar features.
Most type-safe languages include some form of dynamic type checking, even if they also have a static type checker.[10] The reason for this is that many useful features or properties are difficult or impossible to verify statically. For example, suppose that a program defines two types, A and B, where B is a subtype of A. If the program tries to convert a value of type A to type B, which is known as downcasting, then the operation is legal only if the value being converted is actually a value of type B. Thus, a dynamic check is needed to verify that the operation is safe. This requirement is one of the criticisms of downcasting.
By definition, dynamic type checking may cause a program to fail at runtime. In some programming languages, it is possible to anticipate and recover from these failures. In others, type-checking errors are considered fatal.
Programming languages that include dynamic type checking but not static type checking are often called "dynamically typed programming languages".
Combining static and dynamic type checking
[edit]Certain languages allow both static and dynamic typing. For example, Java and some other ostensibly statically typed languages support downcasting types to their subtypes, querying an object to discover its dynamic type and other type operations that depend on runtime type information. Another example is C++ RTTI. More generally, most programming languages include mechanisms for dispatching over different 'kinds' of data, such as disjoint unions, runtime polymorphism, and variant types. Even when not interacting with type annotations or type checking, such mechanisms are materially similar to dynamic typing implementations.
Objects in object-oriented languages are usually accessed by a reference whose static target type (or manifest type) is equal to either the object's run-time type (its latent type) or a supertype thereof. This is conformant with the Liskov substitution principle, which states that all operations performed on an instance of a given type can also be performed on an instance of a subtype. This concept is also known as subsumption or subtype polymorphism. In some languages subtypes may also possess covariant or contravariant return types and argument types respectively.
Certain languages, for example Clojure, Common Lisp, or Cython are dynamically type checked by default, but allow programs to opt into static type checking by providing optional annotations. One reason to use such hints would be to optimize the performance of critical sections of a program. This is formalized by gradual typing. The programming environment DrRacket, a pedagogic environment based on Lisp, and a precursor of the language Racket is also soft-typed.[11]
Conversely, as of version 4.0, the C# language provides a way to indicate that a variable should not be statically type checked. A variable whose type is dynamic will not be subject to static type checking. Instead, the program relies on runtime type information to determine how the variable may be used.[12][9]: 113–119
In Rust, the dyn std::any::Any type provides dynamic typing of 'static types.[13]
Static and dynamic type checking in practice
[edit]The choice between static and dynamic typing requires certain trade-offs.
Static typing can find type errors reliably at compile time, which increases the reliability of the delivered program. However, programmers disagree over how commonly type errors occur, resulting in further disagreements over the proportion of those bugs that are coded that would be caught by appropriately representing the designed types in code.[14][15] Static typing advocates[who?] believe programs are more reliable when they have been well type-checked, whereas dynamic-typing advocates[who?] point to distributed code that has proven reliable and to small bug databases.[citation needed] The value of static typing increases as the strength of the type system is increased. Advocates of dependent typing,[who?] implemented in languages such as Dependent ML and Epigram, have suggested that almost all bugs can be considered type errors, if the types used in a program are properly declared by the programmer or correctly inferred by the compiler.[16]
Static typing usually results in compiled code that executes faster. When the compiler knows the exact data types that are in use (which is necessary for static verification, either through declaration or inference) it can produce optimized machine code. Some dynamically typed languages such as Common Lisp allow optional type declarations for optimization for this reason.
By contrast, dynamic typing may allow compilers to run faster and interpreters to dynamically load new code, because changes to source code in dynamically typed languages may result in less checking to perform and less code to revisit.[clarification needed] This too may reduce the edit-compile-test-debug cycle.
Statically typed languages that lack type inference (such as C and Java prior to version 10) require that programmers declare the types that a method or function must use. This can serve as added program documentation, that is active and dynamic, instead of static. This allows a compiler to prevent it from drifting out of synchrony, and from being ignored by programmers. However, a language can be statically typed without requiring type declarations (examples include Haskell, Scala, OCaml, F#, Swift, and to a lesser extent C# and C++), so explicit type declaration is not a necessary requirement for static typing in all languages.
Dynamic typing allows constructs that some (simple) static type checking would reject as illegal. For example, eval functions, which execute arbitrary data as code, become possible. An eval function is possible with static typing, but requires advanced uses of algebraic data types. Further, dynamic typing better accommodates transitional code and prototyping, such as allowing a placeholder data structure (mock object) to be transparently used in place of a full data structure (usually for the purposes of experimentation and testing).
Dynamic typing typically allows duck typing (which enables easier code reuse). Many[specify] languages with static typing also feature duck typing or other mechanisms like generic programming that also enable easier code reuse.
Dynamic typing typically makes metaprogramming easier to use. For example, C++ templates are typically more cumbersome to write than the equivalent Ruby or Python code since C++ has stronger rules regarding type definitions (for both functions and variables). This forces a developer to write more boilerplate code for a template than a Python developer would need to. More advanced run-time constructs such as metaclasses and introspection are often harder to use in statically typed languages. In some languages, such features may also be used e.g. to generate new types and behaviors on the fly, based on run-time data. Such advanced constructs are often provided by dynamic programming languages; many of these are dynamically typed, although dynamic typing need not be related to dynamic programming languages.
Strong and weak type systems
[edit]Languages are often colloquially referred to as strongly typed or weakly typed. In fact, there is no universally accepted definition of what these terms mean. In general, there are more precise terms to represent the differences between type systems that lead people to call them "strong" or "weak".
Type safety and memory safety
[edit]A third way of categorizing the type system of a programming language is by the safety of typed operations and conversions. Computer scientists use the term type-safe language to describe languages that do not allow operations or conversions that violate the rules of the type system.
Computer scientists use the term memory-safe language (or just safe language) to describe languages that do not allow programs to access memory that has not been assigned for their use. For example, a memory-safe language will check array bounds, or else statically guarantee (i.e., at compile time before execution) that array accesses out of the array boundaries will cause compile-time and perhaps runtime errors.
Consider the following program of a language that is both type-safe and memory-safe:[17]
var x := 5;
var y := "37";
var z := x + y;
In this example, the variable z will have the value 42. Although this may not be what the programmer anticipated, it is a well-defined result. If y were a different string, one that could not be converted to a number (e.g. "Hello World"), the result would be well-defined as well. Note that a program can be type-safe or memory-safe and still crash on an invalid operation. This is for languages where the type system is not sufficiently advanced to precisely specify the validity of operations on all possible operands. But if a program encounters an operation that is not type-safe, terminating the program is often the only option.
Now consider a similar example in C:
int x = 5;
char y[] = "37";
char* z = x + y;
printf("%c\n", *z);
In this example z will point to a memory address five characters beyond y, equivalent to three characters after the terminating zero character of the string pointed to by y. This is memory that the program is not expected to access. In C terms this is simply undefined behaviour and the program may do anything; with a simple compiler it might actually print whatever byte is stored after the string "37". As this example shows, C is not memory-safe. As arbitrary data was assumed to be a character, it is also not a type-safe language.
In general, type-safety and memory-safety go hand in hand. For example, a language that supports pointer arithmetic and number-to-pointer conversions (like C) is neither memory-safe nor type-safe, because it allows arbitrary memory to be accessed as if it were valid memory of any type.
Variable levels of type checking
[edit]Some languages allow different levels of checking to apply to different regions of code. Examples include:
- The
use strictdirective in JavaScript[18][19][20] and Perl applies stronger checking. - The
declare(strict_types=1)in PHP[21] on a per-file basis allows only a variable of exact type of the type declaration will be accepted, or aTypeErrorwill be thrown. - The
Option Strict Onin VB.NET allows the compiler to require a conversion between objects.
Additional tools such as lint and IBM Rational Purify can also be used to achieve a higher level of strictness.
Optional type systems
[edit]It has been proposed, chiefly by Gilad Bracha, that the choice of type system be made independent of choice of language; that a type system should be a module that can be plugged into a language as needed. He believes this is advantageous, because what he calls mandatory type systems make languages less expressive and code more fragile.[22] The requirement that the type system does not affect the semantics of the language is difficult to fulfill.
Optional typing is related to, but distinct from, gradual typing. While both typing disciplines can be used to perform static analysis of code (static typing), optional type systems do not enforce type safety at runtime (dynamic typing).[22][23]
Polymorphism and types
[edit]The term polymorphism refers to the ability of code (especially, functions or classes) to act on values of multiple types, or to the ability of different instances of the same data structure to contain elements of different types. Type systems that allow polymorphism generally do so in order to improve the potential for code re-use: in a language with polymorphism, programmers need only implement a data structure such as a list or an associative array once, rather than once for each type of element with which they plan to use it. For this reason computer scientists sometimes call the use of certain forms of polymorphism generic programming. The type-theoretic foundations of polymorphism are closely related to those of abstraction, modularity and (in some cases) subtyping.
Specialized type systems
[edit]Many type systems have been created that are specialized for use in certain environments with certain types of data, or for out-of-band static program analysis. Frequently, these are based on ideas from formal type theory and are only available as part of prototype research systems.
The following table gives an overview over type theoretic concepts that are used in specialized type systems. The names M, N, O range over terms and the names range over types. The following notation will be used:
- means that has type ;
- is that application of on ;
- (resp. ) describes the type which results from replacing all occurrences of the type variable α (resp. term variable x) in by the type σ (resp. term N).
| Type notion | Notation | Meaning |
|---|---|---|
| Function | If and , then . | |
| Product | If , then is a pair s.t. and . | |
| Sum | If , then is the first injection s.t. , or is the second injection s.t. . | |
| Intersection | If , then and . | |
| Union | If , then or . | |
| Record | If , then M has a member . | |
| Polymorphic | If , then for any type σ. | |
| Existential | If , then for some type σ. | |
| Recursive | If , then . | |
| Dependent function[a] | If and , then . | |
| Dependent pair[b] | If , then is a pair s.t. and . | |
| Dependent intersection[24] | If , then and . | |
| Familial intersection[24] | If , then for any term . | |
| Familial union[24] | If , then for some term . |
Dependent types
[edit]Dependent types are based on the idea of using scalars or values to more precisely describe the type of some other value. For example, might be the type of a matrix. We can then define typing rules such as the following rule for matrix multiplication:
where k, m, n are arbitrary positive integer values. A variant of ML called Dependent ML has been created based on this type system, but because type checking for conventional dependent types is undecidable, not all programs using them can be type-checked without some kind of limits. Dependent ML limits the sort of equality it can decide to Presburger arithmetic.
Other languages such as Epigram make the value of all expressions in the language decidable so that type checking can be decidable. However, in general proof of decidability is undecidable, so many programs require hand-written annotations that may be very non-trivial. As this impedes the development process, many language implementations provide an easy way out in the form of an option to disable this condition. This, however, comes at the cost of making the type-checker run in an infinite loop when fed programs that do not type-check, causing the compilation to fail.
Linear types
[edit]Linear types, based on the theory of linear logic, and closely related to uniqueness types, are types assigned to values having the property that they have one and only one reference to them at all times. These are valuable for describing large immutable values such as files, strings, and so on, because any operation that simultaneously destroys a linear object and creates a similar object (such as str = str + "a") can be optimized "under the hood" into an in-place mutation. Normally this is not possible, as such mutations could cause side effects on parts of the program holding other references to the object, violating referential transparency. They are also used in the prototype operating system Singularity for interprocess communication, statically ensuring that processes cannot share objects in shared memory in order to prevent race conditions. The Clean language (a Haskell-like language) uses this type system in order to gain a lot of speed (compared to performing a deep copy) while remaining safe.
Intersection types
[edit]Intersection types are types describing values that belong to both of two other given types with overlapping value sets. For example, in most implementations of C the signed char has range -128 to 127 and the unsigned char has range 0 to 255, so the intersection type of these two types would have range 0 to 127. Such an intersection type could be safely passed into functions expecting either signed or unsigned chars, because it is compatible with both types.
Intersection types are useful for describing overloaded function types: for example, if "int → int" is the type of functions taking an integer argument and returning an integer, and "float → float" is the type of functions taking a float argument and returning a float, then the intersection of these two types can be used to describe functions that do one or the other, based on what type of input they are given. Such a function could be passed into another function expecting an "int → int" function safely; it simply would not use the "float → float" functionality.
In a subclassing hierarchy, the intersection of a type and an ancestor type (such as its parent) is the most derived type. The intersection of sibling types is empty.
The Forsythe language includes a general implementation of intersection types. A restricted form is refinement types.
Union types
[edit]Union types are types describing values that belong to either of two types. For example, in C, the signed char has a -128 to 127 range, and the unsigned char has a 0 to 255 range, so the union of these two types would have an overall "virtual" range of -128 to 255 that may be used partially depending on which union member is accessed. Any function handling this union type would have to deal with integers in this complete range. More generally, the only valid operations on a union type are operations that are valid on both types being unioned. C's "union" concept is similar to union types, but is not typesafe, as it permits operations that are valid on either type, rather than both. Union types are important in program analysis, where they are used to represent symbolic values whose exact nature (e.g., value or type) is not known.
In a subclassing hierarchy, the union of a type and an ancestor type (such as its parent) is the ancestor type. The union of sibling types is a subtype of their common ancestor (that is, all operations permitted on their common ancestor are permitted on the union type, but they may also have other valid operations in common).
Existential types
[edit]Existential types are frequently used in connection with record types to represent modules and abstract data types, due to their ability to separate implementation from interface. For example, the type "T = ∃X { a: X; f: (X → int); }" describes a module interface that has a data member named a of type X and a function named f that takes a parameter of the same type X and returns an integer. This could be implemented in different ways; for example:
- intT = { a: int; f: (int → int); }
- floatT = { a: float; f: (float → int); }
These types are both subtypes of the more general existential type T and correspond to concrete implementation types, so any value of one of these types is a value of type T. Given a value "t" of type "T", we know that "t.f(t.a)" is well-typed, regardless of what the abstract type X is. This gives flexibility for choosing types suited to a particular implementation, while clients that use only values of the interface type—the existential type—are isolated from these choices.
In general it's impossible for the typechecker to infer which existential type a given module belongs to. In the above example intT { a: int; f: (int → int); } could also have the type ∃X { a: X; f: (X → int); }. The simplest solution is to annotate every module with its intended type, e.g.:
- intT = { a: int; f: (int → int); } as ∃X { a: X; f: (X → int); }
Although abstract data types and modules had been implemented in programming languages for quite some time, it wasn't until 1988 that John C. Mitchell and Gordon Plotkin established the formal theory under the slogan: "Abstract [data] types have existential type".[25] The theory is a second-order typed lambda calculus similar to System F, but with existential instead of universal quantification.
Gradual typing
[edit]In a type system with Gradual typing, variables may be assigned a type either at compile-time (which is static typing), or at run-time (which is dynamic typing).[26] This allows software developers to choose either type paradigm as appropriate, from within a single language.[26] Gradual typing uses a special type named dynamic to represent statically unknown types; gradual typing replaces the notion of type equality with a new relation called consistency that relates the dynamic type to every other type. The consistency relation is symmetric but not transitive.[27]
Explicit or implicit declaration and inference
[edit]Many static type systems, such as those of C and Java, require type declarations: the programmer must explicitly associate each variable with a specific type. Others, such as Haskell's, perform type inference: the compiler draws conclusions about the types of variables based on how programmers use those variables. For example, given a function f(x, y) that adds x and y together, the compiler can infer that x and y must be numbers—since addition is only defined for numbers. Thus, any call to f elsewhere in the program that specifies a non-numeric type (such as a string or list) as an argument would signal an error.
Numerical and string constants and expressions in code can and often do imply type in a particular context. For example, an expression 3.14 might imply a type of floating-point, while [1, 2, 3] might imply a list of integers—typically an array.
Type inference is in general possible, if it is computable in the type system in question. Moreover, even if inference is not computable in general for a given type system, inference is often possible for a large subset of real-world programs. Haskell's type system, a version of Hindley–Milner, is a restriction of System Fω to so-called rank-1 polymorphic types, in which type inference is computable. Most Haskell compilers allow arbitrary-rank polymorphism as an extension, but this makes type inference not computable. (Type checking is decidable, however, and rank-1 programs still have type inference; higher rank polymorphic programs are rejected unless given explicit type annotations.)
Decision problems
[edit]A type system that assigns types to terms in type environments using typing rules is naturally associated with the decision problems of type checking, typability, and type inhabitation.[28]
- Given a type environment , a term , and a type , decide whether the term can be assigned the type in the type environment.
- Given a term , decide whether there exists a type environment and a type such that the term can be assigned the type in the type environment .
- Given a type environment and a type , decide whether there exists a term that can be assigned the type in the type environment.
Unified type system
[edit]Some languages like C# or Scala have a unified type system.[29] This means that all C# types including primitive types inherit from a single root object. Every type in C# inherits from the Object class. Some languages, like Java and Raku, have a root type but also have primitive types that are not objects.[30] Java provides wrapper object types that exist together with the primitive types so developers can use either the wrapper object types or the simpler non-object primitive types. Raku automatically converts primitive types to objects when their methods are accessed.[31]
Compatibility: equivalence and subtyping
[edit]A type checker for a statically typed language must verify that the type of any expression is consistent with the type expected by the context in which that expression appears. For example, in an assignment statement of the form x := e,
the inferred type of the expression e must be consistent with the declared or inferred type of the variable x. This notion of consistency, called compatibility, is specific to each programming language.
If the type of e and the type of x are the same, and assignment is allowed for that type, then this is a valid expression. Thus, in the simplest type systems, the question of whether two types are compatible reduces to that of whether they are equal (or equivalent). Different languages, however, have different criteria for when two type expressions are understood to denote the same type. These different equational theories of types vary widely, two extreme cases being structural type systems, in which any two types that describe values with the same structure are equivalent, and nominative type systems, in which no two syntactically distinct type expressions denote the same type (i.e., types must have the same "name" in order to be equal).
In languages with subtyping, the compatibility relation is more complex: If B is a subtype of A, then a value of type B can be used in a context where one of type A is expected (covariant), even if the reverse is not true. Like equivalence, the subtype relation is defined differently for each programming language, with many variations possible. The presence of parametric or ad hoc polymorphism in a language may also have implications for type compatibility.
See also
[edit]Notes
[edit]- ^ The Burroughs ALGOL computer line determined a memory location's contents by its flag bits. Flag bits specify the contents of a memory location. Instruction, data type, and functions are specified by a 3 bit code in addition to its 48 bit contents. Only the MCP (Master Control Program) could write to the flag code bits.
- ^ For example, a leaky abstraction might surface during development, which may show that more type development is needed. —"The evaluation of a well-typed program always terminates".—B. Nordström, K. Petersson, and J. M. Smith[5] A systematic change in variables to avoid capture of a free variable can introduce error, in a functional programming language where functions are first class citizens.[6] —From the lambda calculus article.
References
[edit]- ^ Pierce 2002, p. 1: "A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute."
- ^ Cardelli 2004, p. 1: "The fundamental purpose of a type system is to prevent the occurrence of execution errors during the running of a program."
- ^ Pierce 2002, p. 208.
- ^ a b Sethi, R. (1996). Programming languages: Concepts and constructs (2nd ed.). Addison-Wesley. p. 142. ISBN 978-0-201-59065-4. OCLC 604732680.
- ^ Nordström, B.; Petersson, K.; Smith, J.M. (2001). "Martin-Löf's Type Theory". Algebraic and Logical Structures. Handbook of Logic in Computer Science. Vol. 5. Oxford University Press. p. 2. ISBN 978-0-19-154627-3.
- ^ Turner, D.A. (12 June 2012). "Some History of Functional Programming Languages" (PDF). invited lecture at TFP12, at St Andrews University. See the section on Algol 60.
- ^ "... any sound, decidable type system must be incomplete" —D. Remy (2017). p. 29, Remy, Didier. "Type systems for programming languages" (PDF). Archived from the original (PDF) on 14 November 2017. Retrieved 26 May 2013.
- ^ Pierce 2002.
- ^ a b c Skeet, Jon (2019). C# in Depth (4 ed.). Manning. ISBN 978-1617294532.
- ^ Miglani, Gaurav (2018). "Dynamic Method Dispatch or Runtime Polymorphism in Java". Archived from the original on 2020-12-07. Retrieved 2021-03-28.
- ^ Wright, Andrew K. (1995). Practical Soft Typing (PhD). Rice University. hdl:1911/16900.
- ^ "dynamic (C# Reference)". MSDN Library. Microsoft. Retrieved 14 January 2014.
- ^ "std::any — Rust". doc.rust-lang.org. Retrieved 2021-07-07.
- ^ Meijer, Erik; Drayton, Peter. "Static Typing Where Possible, Dynamic Typing When Needed: The End of the Cold War Between Programming Languages" (PDF). Microsoft Corporation.
- ^ Laucher, Amanda; Snively, Paul (2012). "Types vs Tests". InfoQ.
- ^
Xi, Hongwei (1998). Dependent Types in Practical Programming (PhD). Department of Mathematical Sciences, Carnegie Mellon University. CiteSeerX 10.1.1.41.548.
Xi, Hongwei; Pfenning, Frank (1999). "Dependent Types in Practical Programming". Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM. pp. 214–227. CiteSeerX 10.1.1.69.2042. doi:10.1145/292540.292560. ISBN 1581130953. S2CID 245490. - ^ Visual Basic is an example of a language that is both type-safe and memory-safe.
- ^ "4.2.2 The Strict Variant of ECMAScript". ECMAScript® 2020 Language Specification (11th ed.). ECMA. June 2020. ECMA-262.
- ^ "Strict mode – JavaScript". MDN. Developer.mozilla.org. 2013-07-03. Retrieved 2013-07-17.
- ^ "Strict Mode (JavaScript)". MSDN. Microsoft. Retrieved 2013-07-17.
- ^ "Strict typing". PHP Manual: Language Reference: Functions.
- ^ a b Bracha, G. "Pluggable Types" (PDF).
- ^ "Sure. It's called "gradual typing", and I would qualify it as trendy. ..." Is there a language that allows both static and dynamic typing?. stackoverflow. 2012.
- ^ a b c Kopylov, Alexei (2003). "Dependent intersection: A new way of defining records in type theory". 18th IEEE Symposium on Logic in Computer Science. LICS 2003. IEEE Computer Society. pp. 86–95. CiteSeerX 10.1.1.89.4223. doi:10.1109/LICS.2003.1210048.
- ^ Mitchell, John C.; Plotkin, Gordon D. (July 1988). "Abstract Types Have Existential Type" (PDF). ACM Trans. Program. Lang. Syst. 10 (3): 470–502. doi:10.1145/44501.45065. S2CID 1222153.
- ^ a b Siek, Jeremy (24 March 2014). "What is gradual typing?".
- ^ Siek, Jeremy; Taha, Walid (September 2006). Gradual Typing for Functional Languages (PDF). Scheme and Functional Programming 2006. University of Chicago. pp. 81–92.
- ^ Barendregt, Henk; Dekkers, Wil; Statman, Richard (20 June 2013). Lambda Calculus with Types. Cambridge University Press. p. 66. ISBN 978-0-521-76614-2.
- ^ "8.2.4 Type system unification". C# Language Specification (5th ed.). ECMA. December 2017. ECMA-334.
- ^ "Native Types". Perl 6 Documentation.
- ^ "Numerics, § Auto-boxing". Perl 6 Documentation.
Further reading
[edit]- Cardelli, Luca; Wegner, Peter (December 1985). "On Understanding Types, Data Abstraction, and Polymorphism" (PDF). ACM Computing Surveys. 17 (4): 471–523. CiteSeerX 10.1.1.117.695. doi:10.1145/6041.6042. S2CID 2921816.
- Pierce, Benjamin C. (2002). Types and Programming Languages. MIT Press. ISBN 978-0-262-16209-8.
- Cardelli, Luca (2004). "Type systems" (PDF). In Allen B. Tucker (ed.). CRC Handbook of Computer Science and Engineering (2nd ed.). CRC Press. ISBN 978-1584883609.
- Tratt, Laurence (July 2009). "5. Dynamically Typed Languages". Advances in Computers. Vol. 77. Elsevier. pp. 149–184. doi:10.1016/S0065-2458(09)01205-4. ISBN 978-0-12-374812-6.
External links
[edit]
Media related to Type systems at Wikimedia Commons- Smith, Chris (2011). "What to Know Before Debating Type Systems".
Type system
View on GrokipediaOverview and Fundamentals
Definition and Purpose
A type system in programming languages is a collection of rules that assigns types to various syntactic constructs, such as variables, expressions, and functions, to classify the values they can hold and the operations applicable to them.[6] This framework enforces constraints to prevent type errors, where incompatible operations are applied to values, such as adding a string to an integer.[7] By associating types with program elements, the system ensures that computations remain well-defined and avoids undefined behaviors like interpreting memory incorrectly.[8] The primary purpose of a type system is to promote program correctness by detecting and ruling out potential errors early, often at compile time, thereby reducing the risk of runtime failures and enhancing overall reliability.[3] It achieves this through type checking, which verifies compliance with typing rules before execution.[7] Additionally, type systems enable code optimization by providing compilers with precise knowledge of data representations, allowing for efficient memory allocation, specialized code generation, and the elimination of unnecessary runtime checks.[3] They also support abstraction by defining interfaces and modular structures that hide implementation details while ensuring safe interactions between components.[9] In practice, type systems distinguish between primitive types, such as integers (int) for numerical values and booleans (bool) for true/false states, and composite types like arrays for collections of elements or classes for object-oriented structures.[7] These classifications guide memory allocation by determining storage requirements— for instance, an int typically reserves a fixed number of bytes— and data representation by specifying how bit patterns are interpreted to avoid misuses like treating a function pointer as an integer.[8] Overall, the benefits include improved program reliability through error prevention, performance gains from informed optimizations, and the facilitation of generic programming via type-based parameterization for reusable code.[3]Historical Context
The theoretical foundations of type systems trace back to Alonzo Church's development of lambda calculus in the 1930s, which provided a formal model for computation and function abstraction, initially without types but later extended to include typed variants for ensuring well-formed expressions. Category theory, emerging in the 1940s through the work of Samuel Eilenberg and Saunders Mac Lane, further influenced type systems by offering abstract frameworks for understanding structures like functors and monads, which underpin modern polymorphic and higher-kinded types in programming languages.[10] Early practical type systems appeared in the 1950s with Fortran, designed primarily for numerical computation on IBM machines, where variables were implicitly typed based on their names—starting with 'I' through 'N' for integers (fixed-point) and others for floating-point—to optimize scientific calculations without explicit type declarations.[11] This was followed by ALGOL 58 in 1958, which introduced more structured data handling with explicit types including integers, reals, Booleans, and arrays, aiming for a universal algebraic language to facilitate portable algorithmic descriptions across machines.[12] Key milestones in the 1970s advanced type safety and expressiveness: Pascal, released in 1970 by Niklaus Wirth, pioneered strong static typing with features like user-defined types, records, and strict type compatibility to promote structured programming and error prevention in educational and systems contexts.[13] Shortly after, ML in 1973, developed by Robin Milner at the University of Edinburgh, introduced parametric polymorphism via the Hindley-Milner type inference system, enabling generic functions without explicit annotations while maintaining type safety.[14] The 1980s saw further innovation with Coq, initiated in 1984 by Thierry Coquand and Gérard Huet at INRIA, incorporating dependent types in the Calculus of Constructions to link types directly to program values, supporting formal verification and proof assistants.[15] Modern developments reflect a shift toward flexibility, with the rise of dynamically typed languages like Python in 1991, created by Guido van Rossum as an interpretable successor to ABC, emphasizing runtime type checking for rapid prototyping in scripting and data tasks without compile-time enforcement.[16] This trend complemented the introduction of gradual typing in TypeScript in 2012 by Microsoft, which adds optional static types to JavaScript via structural typing and type annotations, allowing seamless integration of typed and untyped code to enhance large-scale web development reliability.[17]Basic Type Concepts
In programming languages, a type is fundamentally defined as a set of possible values along with the operations that can be performed on those values. For instance, the integer type encompasses all integer values and permits operations such as addition and subtraction, while the boolean type includes only true and false values and supports logical operations like conjunction and disjunction.[18][19] The assignment of types to values occurs either statically at compile-time, where the type is determined and fixed before execution based on declarations or inferences, or dynamically at runtime, where types are checked and resolved as the program executes. This distinction affects when type-related errors are detected and how values are handled during computation.[20][21] Types propagate through expressions by combining the types of subexpressions according to the rules of the language's operations, ultimately yielding an output type; for example, in a function signature, the types of input parameters determine the expected type of the result through the function's body. This propagation ensures that operations are applied only to compatible values, maintaining consistency in type usage.[22] Type compatibility, which governs whether two types can be used interchangeably, is assessed either nominally—based on explicit name declarations and declared relationships—or structurally, by comparing the internal composition of types regardless of names. Nominal typing requires types to share the same identifier or a declared subtype relation, whereas structural typing allows compatibility if the types possess matching components, such as fields or methods.[23][24]Type Checking Mechanisms
Static Type Checking
Static type checking is a verification process performed by a compiler at compile time to ensure that a program's operations are applied to values of compatible types, thereby preventing type-related errors before execution. The compiler examines the syntactic structure of the source code, assigns types to variables, expressions, and functions—either through explicit declarations or inference—and enforces type rules to detect mismatches, such as attempting to add an integer to a string.[25] If the analysis reveals any ill-typed constructs, the compilation fails, and the programmer receives diagnostic messages pinpointing the issues.[26] One primary advantage of static type checking is the early detection of type errors, which reduces debugging time and avoids runtime failures that could crash the program.[25] Additionally, the type information gathered during this phase enables compiler optimizations, such as dead code elimination, where unused branches or variables identified through type analysis are removed to improve performance.[26] These benefits contribute to more reliable software development, as studies have shown that static typing can reduce the time required to fix errors in programs.[27] In languages like Java, static type checking requires explicit type declarations for variables and enforces compatibility in assignments, method invocations, and inheritance hierarchies during compilation. For instance, attempting to pass a string to a method expecting an integer results in a compile-time error, ensuring type safety from the outset. Haskell exemplifies advanced static type checking through its powerful type inference system, which automatically deduces types without annotations while verifying polymorphic functions and higher-kinded types at compile time.[28] Despite these strengths, static type checking has limitations, as it cannot detect errors dependent on runtime values or behaviors not expressible in the type system.[29] For example, in Java, null dereferences often evade static checks because the type system treats references as potentially null without additional annotations, leading to possible NullPointerExceptions at runtime.[30] Furthermore, conservative type rules may reject valid programs that would execute correctly, requiring workarounds like type casts that weaken guarantees.[29]Dynamic Type Checking
Dynamic type checking is a mechanism in programming languages where type validation and enforcement occur during program execution, rather than at compile time. This process relies on runtime type information (RTTI), which associates type metadata with values or objects to enable on-the-fly checks for type compatibility during operations like assignments, function calls, or method invocations. In practice, RTTI often involves tagging data structures with type descriptors, allowing the runtime environment to inspect and verify types as needed.[31] A common implementation tags every value with its type information at allocation, enabling the interpreter or virtual machine to perform validations dynamically. For instance, in languages like Lisp, tags are used for generic arithmetic and type-safe operations at runtime. This approach contrasts with static checking by deferring type resolution until execution, accommodating scenarios where types depend on runtime conditions, such as user input or dynamic loading.[32] Examples of dynamic type checking include Python'sisinstance() function, which returns True if an object is an instance of a specified class or subclass, allowing runtime verification of type hierarchies.[33] Similarly, JavaScript's typeof operator evaluates the type of a value at runtime, returning a string like "number" or "object" to facilitate conditional logic based on dynamic types.[34] These operators exemplify how dynamic checking integrates into language features for introspection, where programs can query and respond to types during execution.
One key advantage of dynamic type checking is its flexibility in dynamic languages, enabling rapid prototyping and code reuse without rigid type declarations upfront.[32] It supports introspection, allowing programs to examine and manipulate types at runtime, which is essential for reflective features like serialization or debugging tools.[33] Additionally, it facilitates metaprogramming by permitting runtime type modifications and duck typing, where object compatibility is determined by behavior rather than explicit type matching, enhancing expressiveness in scripting and domain-specific languages.[34]
However, dynamic type checking introduces performance overhead due to the repeated runtime inspections, which can significantly increase execution time compared to compile-time alternatives.[35] This overhead arises from tag manipulations and validation logic executed for each relevant operation.[36] Furthermore, errors such as type mismatches are only discovered late, potentially during production runs, leading to harder-to-debug failures rather than early detection.
Hybrid and Optional Type Checking
Hybrid type checking integrates static and dynamic mechanisms within the same language, allowing developers to leverage compile-time verification for most code while permitting runtime flexibility where needed. In C#, thedynamic keyword, introduced in version 4.0, exemplifies this approach by enabling objects to bypass static type checking, treating them as having type object but deferring resolution to runtime.[37] This hybrid model supports interoperability with dynamic languages and COM objects, reducing the need for explicit casts and enhancing adaptability in mixed environments.[37]
Optional typing extends this flexibility by allowing untyped or loosely typed code to coexist in predominantly static systems, facilitating incremental adoption of type annotations. TypeScript's any type serves as a primary mechanism for this, permitting values to escape type checking and interact with existing JavaScript code without errors during compilation.[38] By assigning any, developers can gradually opt into stricter typing, easing migration from untyped JavaScript while maintaining compatibility, though it disables further static analysis on those values.[38]
The evolution of gradual typing builds on these ideas, introducing sound guarantees for safety across typed and untyped boundaries through runtime checks. Originating from foundational work on integrating static and dynamic typing in functional languages, gradual typing allows optional annotations with a dynamic type (often denoted as ?) to control checking levels.[39] In Typed Racket, this manifests as a mature implementation using behavioral contracts at module boundaries to enforce type invariants, ensuring meaningful error messages and preventing uncaught violations.[40] Soundness is achieved via pervasive runtime monitoring, distinguishing it from optional typing by providing formal safety assurances rather than mere convenience.[40]
Recent developments in gradual typing as of 2025 include machine learning techniques to predict the performance impact of type annotations, enabling better optimization of mixed-type codebases.[41] Other advances encompass staged gradual typing calculi for more expressive type systems, robust dynamic embeddings to improve interoperability, and applications such as guard analysis for safe erasure in dynamically typed languages like Elixir.[42][43]
These hybrid and gradual approaches offer significant benefits, particularly in easing code migration and evolution, as untyped components can be added or refactored without rewriting entire systems.[44] For instance, transient semantics in gradual typing enable safe embedding of typed code in untyped contexts, supporting incremental typing without client-side changes.[44] However, challenges arise at type boundaries, where runtime checks introduce performance overheads—up to 168 times slowdown in some Typed Racket benchmarks—and potential errors if invariants are violated, necessitating careful boundary management.[40]
Type System Properties
Strong and Weak Typing
Strong typing refers to a type system in which the language prohibits implicit conversions between incompatible types, ensuring that operations on values respect their declared or inferred types without automatic adjustments that could lead to unintended behavior. This strict enforcement helps prevent type-related errors at compile time or runtime by requiring explicit conversions when mixing types. For instance, in OCaml, a strongly typed language, the expression1 + "a" results in a type error because the integer and string types are incompatible, and no implicit coercion is permitted.
In contrast, weak typing permits automatic type coercions, where the language runtime or compiler implicitly converts operands to compatible types to allow an operation to proceed, often prioritizing usability over strictness. A classic example is JavaScript, where "1" + 1 evaluates to the string "11" because the numeric 1 is coerced to a string for concatenation. This approach simplifies coding for certain scenarios but can introduce subtle bugs if the coercion does not align with programmer intent.[45]
Programming languages exist on a spectrum between strong and weak typing rather than fitting neatly into binary categories. Perl exemplifies weak typing through its extensive use of coercions, such as treating a string like "2abc" as the number 2 in numeric contexts (e.g., 1 + "2abc" yields 3), which enhances flexibility for scripting tasks. Conversely, OCaml represents the strong end, with its type system rejecting any implicit mixing of unrelated types to maintain precision.
The choice between strong and weak typing involves key trade-offs in expressiveness and error-proneness. Strong typing promotes reliability by catching mismatches early, with empirical evidence indicating that strongly typed languages exhibit lower defect densities compared to weakly typed ones in large-scale GitHub analyses. This comes at the cost of reduced flexibility, as developers must handle conversions explicitly, potentially increasing code verbosity. Weak typing, while enabling concise and adaptable code for prototyping or dynamic data handling, heightens the risk of runtime errors from unforeseen coercions, making debugging more challenging.[46]
Type Safety and Security
Type safety refers to the extent to which a programming language prevents type errors, ensuring that well-typed programs do not perform invalid operations on data, such as accessing array elements out of bounds or misinterpreting data representations.[47] In a type-safe language, the type system guarantees that operations respect type invariants, thereby avoiding undefined behavior and enabling reliable program execution without runtime type mismatches.[48] This property is foundational for reasoning about program correctness, as it confines potential errors to compile-time or well-defined runtime checks rather than silent failures.[7] Memory safety constitutes a critical subset of type safety, focusing on protections against memory-related errors like buffer overflows, dangling pointers, and use-after-free conditions that can compromise program integrity.[49] In languages like Rust, memory safety is achieved through an ownership model that enforces unique ownership of heap-allocated data, preventing multiple mutable references and ensuring automatic deallocation upon scope exit without a garbage collector.[50] This system tracks lifetimes and borrowing rules at compile time, eliminating common memory vulnerabilities while maintaining performance comparable to unsafe languages.[51] Despite these safeguards, type safety can be violated in languages permitting unsafe code, leading to exploits such as type confusion attacks where an object is treated as an incompatible type, potentially allowing arbitrary memory reads or writes.[52] For instance, in C++ programs, type confusion arises from polymorphic inheritance misuse, enabling attackers to hijack control flow or corrupt data structures.[53] Such violations underscore the risks in systems with opt-in safety features, where bypassing type checks exposes underlying memory representations to manipulation. Formal verification of type safety often relies on properties like subject reduction in typed lambda calculi, which proves that reduction steps preserve types: if a term is well-typed, its reduct remains well-typed with the same type.[54] This preservation theorem, established in the simply typed lambda calculus, demonstrates that type systems maintain invariants throughout evaluation, providing a theoretical basis for safety guarantees in practical languages.[55] Languages exemplify varying degrees of type safety implementation; Haskell achieves comprehensive safety through its static type system, which rejects ill-typed programs at compile time and enforces purity to prevent side-effect-induced type violations.[56] Its Safe Haskell extension further restricts unsafe operations, ensuring trusted codebases with strict type enforcement.[57] In contrast, C++ provides partial memory safety via smart pointers like std::unique_ptr and std::shared_ptr, which automate resource management and prevent leaks or double-free errors when used correctly, though raw pointers in unsafe code can still introduce vulnerabilities.[58] Strong typing contributes to these protections by minimizing implicit conversions that could lead to safety lapses.[59]Polymorphism in Types
Polymorphism in type systems enables the reuse of code across different types by allowing functions, operators, or classes to operate uniformly on multiple type instances, thereby promoting abstraction and modularity in programming languages.[60] This capability is essential for writing generic algorithms that avoid duplication while maintaining type safety, and it manifests in several forms: ad-hoc, parametric, and subtype polymorphism.[61] Ad-hoc polymorphism allows a single function or operator to behave differently based on the input types, without requiring a common supertype or parametric uniformity. A classic implementation is operator overloading, where the same operator, such as addition (+), is redefined for distinct types like integers and strings; for example, in C++, 1 + 2 performs arithmetic addition, while "hello" + " world" concatenates strings.[61] This form of polymorphism, often achieved through type classes or overloading mechanisms, provides flexibility for domain-specific behaviors but can complicate type checking if not constrained.[62]
Parametric polymorphism supports generic code that operates identically regardless of the concrete types involved, using type parameters to instantiate reusable structures like collections.[60] In languages like Java, this is realized through generics, such as a List<T> class where T is a placeholder for any type, enabling the same list implementation to handle integers or strings without code duplication; the type parameter is substituted at compile time, preserving uniformity.[61] Templates in C++ similarly provide parametric polymorphism, compiling specialized versions for each type instantiation to ensure efficiency.[61]
Subtype polymorphism, also known as inclusion or universal polymorphism, leverages type hierarchies to allow objects of a subtype to be used wherever the supertype is expected, enabling dynamic dispatch.[61] In object-oriented languages, this is commonly implemented via inheritance and interfaces, where virtual methods permit runtime selection of the appropriate implementation; for instance, a Shape supertype with a draw() method can invoke the specific drawing logic of Circle or Rectangle subclasses.[63] While inheritance often facilitates subtyping, the two are distinct: subtyping ensures behavioral compatibility through type compatibility rules, independent of code reuse mechanisms.[63]
Parametric polymorphism can be unbounded, allowing arbitrary types without restrictions, or bounded, imposing constraints to ensure operations are valid on the type parameters.[64] Unbounded generics, as in ML's parametric polymorphism, treat types opaquely without requiring specific capabilities.[61] Bounded variants, such as F-bounded polymorphism in object-oriented settings, restrict type parameters to subtypes of a bound involving the parameter itself, like class Comparable<T extends Comparable<T>> in Java, which ensures mutual comparability while supporting recursive type definitions.[65] These constraints enhance expressiveness in hierarchical types but introduce complexity in type inference and checking.[65]
Advanced Typing Constructs
Type Inference and Declaration
In type systems, explicit type declaration requires programmers to manually specify the types of variables, functions, and other entities in the source code, often through type annotations. This approach ensures that the compiler or interpreter can immediately verify type compatibility during static analysis, promoting early error detection. For instance, in the C programming language, a variable is declared with its type prefixing the identifier, such asint x = 5;, where int explicitly denotes that x holds integer values.[7] Similarly, in Java, explicit declarations are required for method parameters and fields, but since Java 10 (2018), local variables can use type inference with the var keyword, as seen in static int g(boolean x, int y) { return (x ? 1 : y); }, where types like boolean and int are annotated for parameters to define the expected data structures and enable compile-time checks.[7][66] Languages employing explicit declarations, such as C and Java, prioritize this mechanism to enforce type safety in statically typed environments, reducing runtime surprises at the cost of added verbosity in code.[7]
In contrast, type inference automates the deduction of types from contextual usage, allowing programmers to omit explicit annotations while the compiler infers them algorithmically. The seminal Hindley-Milner algorithm, introduced by J. Roger Hindley in 1969 and extended by Robin Milner in 1978, enables this by assigning principal type schemes to expressions in the polymorphic lambda calculus through unification of type variables.[67][68] For example, in Haskell, which implements Hindley-Milner via the Damas-Milner variant, the function map can be inferred as having the polymorphic type ((α → β) × α list) → β list without annotations, where α and β are type variables unified based on application contexts.[68] This inference supports parametric polymorphism, ensuring that well-typed programs avoid type violations, and was originally developed for the ML metalanguage in the LCF theorem prover to trap errors in structure-processing code.[68]
Type inference, however, has inherent limits depending on its scope and the complexity of the type system. Local inference operates within the immediate syntactic context, such as inferring types for adjacent abstract syntax tree nodes, which simplifies implementation but requires more annotations for distant dependencies; for example, bidirectional checking in languages like Scala infers parameter types for anonymous functions locally but may fail for interdependent polymorphic bounds.[69][70] Whole-program inference, as in Haskell's Hindley-Milner implementation, generates and solves constraints across the entire codebase for complete deduction, minimizing annotations but complicating error localization since issues may propagate from remote sources.[69] Challenges arise particularly with polymorphism: standard Hindley-Milner is restricted to rank-1 polymorphism, where polymorphic types are second-class and cannot be nested deeply without annotations, and extending to polymorphic recursion renders inference undecidable due to the need for semi-unification instead of simple unification.[70][71]
The choice between explicit declarations and type inference involves key trade-offs in developer productivity and system design. Inference reduces boilerplate code—studies of ML programs show it eliminates 13–39 annotations per 100 lines for polymorphic instantiations—enhancing readability and easing maintenance, especially for novices learning functional paradigms.[70][72] However, it increases compiler complexity, as non-local inference demands tracing type dependencies across modules, potentially obscuring errors and slowing compilation compared to the straightforward verification of explicit types.[72] Explicit declarations, while verbose, serve as self-documenting aids that accelerate debugging and API comprehension, particularly in large-scale object-oriented codebases where inference might hide subtle mismatches.[72] Overall, local inference balances these by minimizing overhead while preserving explicitness for critical interfaces, though whole-program approaches excel in purely functional settings at the expense of inference tractability.[69][72]
Subtyping and Equivalence
In type systems, equivalence between types determines when two types can be considered identical for purposes such as assignment or function application. Nominal type equivalence relies on explicit declarations or names to identify types as equal; for instance, in C, two struct types with identical field layouts but different tags are not equivalent unless explicitly unified via typedef.[73] In contrast, structural type equivalence assesses compatibility based on the shape or components of the types, disregarding names; Go's interfaces exemplify this, where a type satisfies an interface if it implements the required methods, regardless of explicit declaration.[74] Subtyping extends equivalence by defining a partial order where a subtype can safely substitute for its supertype in any context, preserving program behavior. The Liskov substitution principle formalizes this by requiring that if S is a subtype of T, then objects of type S must be substitutable for objects of type T without altering the program's observable properties, including preconditions, postconditions, and invariants.[75] For record types, subtyping often incorporates width and depth rules: width subtyping permits a subtype to include additional fields beyond those of the supertype (as the extra fields can be ignored), while depth subtyping allows fields in the subtype to have types that are themselves subtypes of the corresponding supertype fields, enabling recursive compatibility.[76] Function types exhibit more nuanced subtyping rules due to their input and output roles. Inputs are contravariant— if S is a subtype of T, then the function type T → U is a subtype of S → U, as a function expecting a supertype can accept a subtype argument safely. Outputs are covariant— if V is a subtype of W, then S → V is a subtype of S → W, since a function producing a subtype can fulfill expectations for the supertype. These variance rules, rooted in early type theory discussions, ensure type safety in higher-order contexts.[77] Subtyping finds key applications in object-oriented programming, particularly in inheritance hierarchies where derived classes act as subtypes of base classes, allowing polymorphic use of objects. Function overriding leverages these rules by permitting subtype methods to accept supertype arguments (contravariant parameters) and return subtype results (covariant returns), maintaining compatibility with the overridden signature.[75] This relational framework underpins polymorphism by enabling subtype objects to fulfill supertype roles seamlessly.[78]Unified Type Systems
A unified type system organizes all data types within a programming language or runtime environment into a single, cohesive hierarchy, ensuring that every value can be treated uniformly under a common root type. This approach contrasts with fragmented type systems by providing a consistent framework for type declarations, operations, and interactions, often rooting all types in a universal superclass or representation.[79][80] In languages like Java, this unification manifests through thejava.lang.Object class, which serves as the root of the class hierarchy; every class, including primitives when boxed, descends from Object, enabling polymorphic treatment of all objects. Similarly, the .NET Common Type System (CTS) enforces a unified model where all types—value types like integers and reference types like classes—ultimately derive from System.Object, facilitating seamless integration across languages in the .NET ecosystem. In Lisp dialects such as Common Lisp, unification arises from a homogeneous representation where all data and code are expressed as symbolic expressions (S-expressions), primarily lists, allowing uniform manipulation via list-processing primitives.[81][79][82]
The primary benefits of unified type systems include simplified uniform operations, such as applying common methods like equality checks or serialization to any value, and enhanced interoperability, particularly in multi-language environments where components must share data without type mismatches. For instance, the CTS in .NET allows code written in C# to invoke assemblies in Visual Basic or F# without explicit type conversions, promoting modular development. However, these systems incur drawbacks, notably performance overhead from boxing primitive types into objects; in Java and .NET, operations on unboxed primitives like int are efficient, but unification requires wrapping them as Integer or Int32 objects for hierarchy access, leading to memory allocation and garbage collection costs.[79][83]
Within unified hierarchies, variants often incorporate a top type, which acts as the universal supertype encompassing all possible values (e.g., Java's Object or TypeScript's any for generics), and a bottom type, which serves as the universal subtype with no inhabitable values (e.g., Never for non-terminating computations or error signaling), providing bounds for type lattices and aiding in error handling or exhaustive pattern matching.[84]
Specialized Type Systems
Dependent and Linear Types
Dependent types allow types to be parameterized by values rather than solely by other types, enabling the expression of properties that depend on program data. This construct treats types as propositions in the sense of the Curry-Howard isomorphism, where a value of a dependent type serves as a proof of that proposition. For instance, in languages like Agda and Idris, one can define a typeVec A n representing vectors of length n (where n is a natural number value), ensuring that operations on such vectors respect the specified length at compile time.[85][86]
A classic example is the append function for vectors: given xs : Vec A m and ys : Vec A n, the result has type Vec A (m + n), where the addition occurs at the type level to guarantee the output length. This approach enriches type systems, such as in Xi and Pfenning's DML, by integrating restricted dependent types over constraint domains (e.g., linear inequalities on naturals) to support practical programming features like polymorphism and higher-order functions while maintaining decidable type checking through constraint solving.[87] In Agda, the type Vec is defined inductively, with constructors like [] : Vec A zero and (_∷_) : {n : Nat} → A → Vec A n → Vec A (suc n), allowing the type checker to verify length-preserving operations such as concatenation or mapping.[85]
Linear types, originating from Jean-Yves Girard's linear logic, assign types to values that must be consumed exactly once—neither duplicated nor discarded—modeling resources with strict usage discipline. In linear logic, this is captured by multiplicatives like tensor (⊗) for parallel resource use and linear implication (⊸) for consumption to produce a new resource, contrasting with classical logic's reusable assumptions. Philip Wadler's work demonstrates how linear types extend functional languages, enabling destructive updates (e.g., array modifications) without garbage collection by tracking single-use references, as in an update function that consumes an array and produces a modified one.[88][89]
Rust's ownership model approximates linear types through affine usage (allowing discard but not duplication), where each value has a unique owner, and borrowing enforces temporary access without aliasing to prevent data races. For example, transferring ownership of a value x : T moves it to a new scope, consuming the original binding, which aligns with linear logic's resource linearity to ensure memory safety at compile time.[90]
Applications of dependent types include proof-carrying code, where programs embed proofs of safety properties (e.g., array bounds or invariants) directly in types, verifiable via type checking without runtime overhead. In systems like Epigram, dependent types certify partial correctness, such as sorted lists, by requiring proofs as values that can be erased post-verification. Linear types find use in session types for protocols, where linear logic propositions encode communication sequences; for instance, a session type !A ⊸ B describes a server offering repeated sessions of type A before proceeding to B, ensuring deadlock-free concurrency in π-calculus processes.[91][92]
These systems introduce challenges, particularly in complexity and decidability. Full dependent types often lead to undecidable type checking when combined with recursion, as it reduces to solving arbitrary constraints in expressive domains, necessitating interactive theorem proving or restricted forms for practicality. Linear types, while aiding resource management, can complicate programming by "infecting" data structures (e.g., any container of linear values becomes linear), increasing annotation burden and limiting composability in large codebases.[93]
Union, Intersection, and Existential Types
Union types, also known as sum types or variant types, allow a value to belong to one of several alternative types, denoted as or . This construct enables the representation of heterogeneous data structures where the exact type is determined at runtime or through explicit tagging, facilitating safe handling of multiple possibilities without resorting to dynamic typing. In type theory, union types form the dual of product types and support operations like type-case expressions for exhaustive pattern matching. For instance, in languages like Rust, enums serve as tagged unions, where each variant carries associated data, ensuring compile-time checks for all cases in match expressions. Seminal work on union types in programming languages emphasizes their role in typing branching constructs and pattern matching, as seen in the CDuce calculus where unions enable precise typing of functions like flatten over trees of arbitrary element types: .[94] Union types are particularly useful for modeling semistructured data, such as XML processing, where values may conform to varying schemas, allowing operations to construct and deconstruct untagged unions while maintaining type safety.[95] Intersection types, denoted as or , represent values that satisfy multiple type constraints simultaneously, effectively combining the behaviors of several types into one. Originating from extensions to the lambda calculus like , intersections provide a greatest lower bound in the type lattice, enabling coherent overloading where a single function can be used polymorphically across different input domains. In Benjamin Pierce's foundational thesis, intersections are integrated with bounded polymorphism in systems like Fω, allowing expressions like a doubling function typed as , which supports flexible reuse without explicit type annotations.[96] In modern languages such as TypeScript, intersection types combine interfaces or object types, as in , permitting objects to fulfill multiple roles like both identifiable and ageable entities. This construct aids in capability systems by ensuring values possess all required traits for a context, such as a function that acts as both a mapper over integers and a filter over booleans. Theoretical properties include decidable subtyping for intersections with bounded variables, enhancing static analysis for program equivalence and strictness.[96] Existential types, written as , encapsulate an unknown type within a bound, hiding the concrete instantiation while exposing a common interface. Introduced in type theory to model abstract data types, they allow packaging of implementations where the hidden type is existentially quantified, ensuring clients interact only through the provided operations without leaking details. The seminal paper by Mitchell and Plotkin establishes that abstract type declarations in languages like CLU and ML correspond to existential types, providing a logical foundation via the Curry-Howard isomorphism where existentials align with disjunctive proofs.[97] In object-oriented settings, Java's wildcard types, such as , implement bounded existentials to handle variance in generics, enabling covariant subtyping for producers (e.g., reading from a list of fruits) while preserving safety. Formal models confirm that these wildcards are sound extensions of existential types, with pack and unpack operations mirroring the introduction and elimination rules. This supports use cases like generic algorithms over bounded collections without requiring full universal quantification.[98] Collectively, union, intersection, and existential types address heterogeneous data and capability needs in type systems. Unions handle alternatives in data structures like error results (success or failure), intersections enable multifaceted functions in overloaded APIs, and existentials support modular abstraction in libraries. These constructs extend polymorphic systems by allowing precise static guarantees for dynamic-like behaviors, as in session types where intersections and unions model branching protocols. In capability systems, they enforce least privilege by intersecting required permissions or existentially hiding sensitive implementations, improving security without runtime overhead.[94][99]Gradual and Refined Typing
Gradual typing integrates static and dynamic typing within a single language, enabling programmers to gradually introduce type annotations to existing dynamically typed codebases without requiring full rewrites. This approach uses a special unknown type, often denoted as ?T or simply any, to represent values whose types are not yet specified, allowing seamless interoperability between typed and untyped components. The foundational theory, introduced by Siek and Taha, ensures type safety through runtime checks that monitor interactions and assign "blame" to the source of type errors, typically the untyped code, preventing unsafe operations from propagating unchecked.[39] In practice, gradual typing has been implemented in languages like TypeScript and Flow, where the any type in TypeScript permits flexible code evolution by opting into static checks incrementally, while Flow emphasizes sound checking with blame assignment for large-scale JavaScript adoption at organizations like Facebook. Soundness is maintained via contract-based enforcement, where proxies wrap untyped values to validate operations at boundaries, a mechanism refined in post-2010 developments to handle performance overhead through optimized monitoring. These systems build briefly on earlier optional typing foundations by adding dynamic guarantees for mixed-code execution.[100][101][102] Refined typing extends base types with logical predicates to express precise properties, such as non-nullability or bounds, enabling verification of program behaviors beyond simple structural compatibility. In Liquid Haskell, refinements like{x: Int | x > 0} denote positive integers, checked automatically using SMT solvers integrated into the type system for decidable verification of functional properties in Haskell code. This approach achieves soundness by refining contracts at type boundaries, ensuring predicates hold during execution while supporting modular reasoning about refined interfaces.
Applications of gradual and refined typing include migrating legacy dynamically typed code to safer variants, as seen in TypeScript's adoption for JavaScript ecosystems, and verifying advanced properties like absence of runtime errors or resource bounds in refined systems like Liquid Haskell, which has proven effective for real-world Haskell libraries by catching issues early without exhaustive proofs.[103][104]
Theoretical Foundations
Decision Problems in Typing
In type systems, decision problems concern the computability of key analyses such as type inference and subtyping, which determine whether a program can be assigned a valid type and whether types are compatible under a given relation. These problems are foundational to ensuring the reliability of type checkers in programming languages. For instance, in the simply-typed lambda calculus, type inference is decidable and can be performed in linear time using first-order unification algorithms. However, extending the system with features like polymorphism or recursion often renders inference undecidable, highlighting the trade-offs between expressiveness and algorithmic tractability. Type inference becomes undecidable in the presence of unrestricted recursion combined with higher-order features, such as in the second-order polymorphic lambda calculus (System F), where typability and type checking are proven undecidable via reductions to higher-order unification problems.[105] In contrast, the Hindley-Milner type system, which supports first-order polymorphism through let-bound generalizations, admits decidable type inference via a complete and efficient algorithm that relies on unification without higher-order variables. Full second-order polymorphism, however, requires approximations or restrictions in practical implementations, such as those in languages like OCaml, where recursive modules or impredicative instantiations are limited to preserve decidability.[106] Subtyping relations, which enable type compatibility for operations like function application or inheritance, exhibit varying decidability based on the type system's structure. In structural subtyping systems without recursive types, the subtyping problem is decidable, as demonstrated by automata-theoretic reductions that solve the first-order theory of subtyping constraints in exponential time.[107] Structural systems, common in languages like OCaml for records, benefit from this efficiency, allowing straightforward algorithmic checks via recursive descent on type structures. However, in dependent type systems, where types can depend on values, subtyping is generally more complex and often undecidable without restrictions, due to the interplay between propositional equalities and type dependencies that can encode undecidable problems like higher-order matching.[108] Decidable variants exist by imposing syntactic bounds, such as in path-dependent types, where subtyping is resolved through normalized representations that avoid infinite descent.[109] These decidability results have significant implications for compiler design and language implementation. Undecidability in advanced systems necessitates approximation techniques, such as partial inference or user annotations, to make type checking feasible in practice; for example, ML compilers approximate higher-rank polymorphism to avoid exponential blowup or non-termination.[71] Such limitations underscore the need for careful system design, balancing theoretical completeness with practical performance, and motivate ongoing research into decidable extensions that capture more expressive typing behaviors without sacrificing automation.Type Errors and Debugging
Type errors occur when a program's code violates the rules of its type system, typically detected during static analysis in statically typed languages, preventing many runtime failures before execution. These errors arise from mismatches between expected and actual types in expressions, function applications, or variable usages, and are a common challenge for developers, particularly in languages with complex type inference like Haskell or Rust.[110] Common type errors include mismatched arguments, where a function receives an input of an incompatible type, such as passing a string to a numeric operation expecting an integer. For instance, in Python with static type checking, this manifests as an "incompatible parameter type" error, often resolved by swapping arguments or casting types. Uninitialized variables represent another frequent issue, occurring when code attempts to use a variable before assignment, leading to undefined or default type assumptions that fail checks; in Rust, the compiler enforces initialization, flagging such uses as errors to ensure memory safety. Infinite type recursions, prevalent in higher-order functional languages like Haskell, happen when a type variable recursively expands without termination, such as in a self-applied function likelet f x = x x, resulting in an "occurs check" failure where the type a ~ a -> b cannot be constructed. These categories, including incompatible return types and unsupported operands, account for the majority of static type issues in real-world codebases.[111]
Diagnosis of type errors relies on compiler-generated messages that pinpoint the mismatch, such as TypeScript's detailed output stating "Expected number, but got string" with sub-explanations of incompatibility. Integrated development environment (IDE) tools enhance this by providing real-time hints, like auto-suggestions in rust-analyzer for unresolved types or Visual Studio Code extensions for Haskell that highlight inference ambiguities. These diagnostics help isolate the erroneous expression, often including context like expected versus actual types to guide corrections.[112]
Debugging techniques emphasize incremental refinement, starting with explicit type annotations to clarify ambiguous inferences and narrow the error scope; in Haskell, adding a signature like f :: a -> a to a function can resolve monomorphism restrictions or reveal downstream mismatches. Gradual typing systems, which blend static and dynamic checking, aid isolation by allowing partial annotations where untyped regions propagate dynamically until a contract violation, using blame assignment to trace errors to specific code parts without halting the entire program.[113][114]
Best practices for managing type errors incorporate unit testing that leverages the type system to verify boundaries, such as defining test cases with precise input types to catch mismatches early, ensuring tests remain isolated and deterministic. Refinement types further enhance prevention by attaching logical predicates to base types, like {x: Int | x > 0} for positive integers, which statically rule out invalid usages such as division by zero before compilation. These approaches, combined with static checking, minimize error occurrence and improve code reliability.[115][116]