Symbol (programming)
A symbol in computer programming is a primitive data type whose instances have a human-readable form. Symbols can be used as identifiers. In some programming languages, they are called atoms.[1] Uniqueness is enforced by holding them in a symbol table. Programmers most often use symbols directly for language reflection (particularly for callbacks); the most common indirect use is in creating object linkages.
In the most trivial implementation, they are essentially named integers; e.g., the enumerated type in C.
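As a rough illustration of the named-integer view, the following Python sketch uses the standard library's enum.IntEnum. The Color type and its members are made up for this example and do not come from any of the languages discussed here.

```python
from enum import IntEnum


class Color(IntEnum):
    """Each member is a named integer, much like a C enum constant."""
    RED = 0
    GREEN = 1
    BLUE = 2


# Looking a member up by name returns the one shared instance.
print(Color["RED"] is Color.RED)  # True
# The underlying value is a plain integer.
print(int(Color.GREEN))           # 1
# The name can be recovered from the integer.
print(Color(2).name)              # BLUE
```

Unlike a bare integer constant, each member keeps its human-readable name available at run time, which is the property the article attributes to symbols.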
Support
The following programming languages provide runtime support for symbols:
| language | type name(s) | example literal(s) |
|---|---|---|
| ANSI Common Lisp | symbol, keyword | symbol, :keyword |
| Clojure | symbol,[2] keyword[3] | symbol, :keyword |
| Dart | Symbol[4] | #sym |
| Elixir | atom, symbol | :sym |
| Erlang | atom | sym or 'sym' |
| JavaScript (ES6 and later) | Symbol | Symbol("sym"); |
| Julia | Symbol | :sym |
| K | symbol | `sym |
| Objective-C | SEL | @selector(sym) |
| PICAXE BASIC | symbol | symbol let name = variable |
| PostScript | name | /sym or sym |
| Prolog | atom, symbol | sym or 'sym' |
| Ruby | Symbol | :sym or :'sym' |
| Scala | scala.Symbol | 'symbol |
| Scheme | symbol | symbol |
| Smalltalk | Symbol | #sym or #'sym' |
| SML/NJ | Atom.atom | |
| Wolfram Language | Symbol | Symbol["sym"] or sym |
Julia
Symbols in Julia are interned strings used to represent identifiers in parsed Julia code (ASTs) and as names or labels to identify entities (for example as keys in a dictionary).[5]
Lisp
A symbol in Lisp is unique in a namespace (or package in Common Lisp). Symbols can be tested for equality with the function EQ. Lisp programs can generate new symbols at runtime. When Lisp reads data that contains textually represented symbols, existing symbols are referenced. If a symbol is unknown, the Lisp reader creates a new symbol.
In Common Lisp, symbols have the following attributes: a name, a value, a function, a list of properties and a package.[6]
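The attribute list above can be pictured as a record with one slot per attribute. The following Python sketch is only an illustration under simplifying assumptions: the class name LispSymbol and its field names are hypothetical, and a real Common Lisp implementation distinguishes an unbound cell from one holding NIL, which this sketch does not.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass(eq=False)  # eq=False keeps identity-based equality, similar to EQ
class LispSymbol:
    name: str                                      # print name
    package: Optional[str] = None                  # home package name, None if uninterned
    value: Any = None                              # value cell
    function: Optional[Callable[..., Any]] = None  # function cell
    plist: dict = field(default_factory=dict)      # property list


square = LispSymbol("SQUARE", package="COMMON-LISP-USER")
square.value = 42                        # the symbol used as a variable
square.function = lambda x: x * x        # the same symbol used as a function name
square.plist["doc"] = "squares its argument"

print(square.value)        # 42
print(square.function(7))  # 49
```

Keeping the value and function in separate slots is what lets one symbol name both a variable and a function at the same time.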
In Common Lisp it is also possible that a symbol is not interned in a package. Such symbols can be printed, but when read back, a new symbol needs to be created. Since it is not interned, the original symbol cannot be retrieved from a package.
In Common Lisp symbols may use any characters, including whitespace, such as spaces and newlines. If a symbol contains a whitespace character, it needs to be written as |this is a symbol|. Symbols can be used as identifiers for any kind of named programming constructs: variables, functions, macros, classes, types, goto tags and more.
Symbols can be interned in a package.[7] Keyword symbols are self-evaluating,[8] and interned in the package named KEYWORD.
Examples
The following is a simple external representation of a Common Lisp symbol:
this-is-a-symbol
Symbols can contain whitespace (and all other characters):
|This is a symbol with whitespace|
In Common Lisp symbols with a leading colon in their printed representations are keyword symbols. These are interned in the keyword package.
:keyword-symbol
A printed representation of a symbol may include a package name. Two colons are written between the name of the package and the name of the symbol.
package-name::symbol-name
Packages can export symbols. Then only one colon is written between the name of the package and the name of the symbol.
package:exported-symbol
Symbols that are not interned in a package can also be created, and they have their own notation:
#:uninterned-symbol
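To make the notations above concrete, here is a minimal Python sketch of packages as per-namespace symbol tables, with uninterned symbols created outside any table. The Sym and Package classes are hypothetical, and the printer always writes two colons because the sketch does not model exported symbols.

```python
class Sym:
    """A symbol with a name and an optional home package."""

    def __init__(self, name, package=None):
        self.name = name
        self.package = package

    def __repr__(self):
        if self.package is None:
            return f"#:{self.name}"      # uninterned notation
        if self.package.name == "KEYWORD":
            return f":{self.name}"       # keyword notation
        return f"{self.package.name}::{self.name}"


class Package:
    """A namespace that holds at most one symbol per name."""

    def __init__(self, name):
        self.name = name
        self._table = {}

    def intern(self, name):
        # Return the existing symbol for this name, or create and register one.
        sym = self._table.get(name)
        if sym is None:
            sym = self._table[name] = Sym(name, self)
        return sym


user = Package("USER")
keyword = Package("KEYWORD")

a = user.intern("FOO")
b = user.intern("FOO")
g1, g2 = Sym("FOO"), Sym("FOO")   # two uninterned symbols with the same name

print(a is b)                          # True
print(g1 is g2)                        # False
print(a, keyword.intern("FOO"), g1)    # USER::FOO :FOO #:FOO
```

The two uninterned symbols print identically but remain distinct objects, which mirrors why reading a printed uninterned symbol back cannot recover the original.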
PostScript
In PostScript, references to name objects can be either literal or executable, influencing the behaviour of the interpreter when encountering them. The cvx and cvlit operators can be used to convert between the two forms. When names are constructed from strings by means of the cvn operator, the set of allowed characters is unrestricted.
Prolog
In Prolog, symbols (or atoms) are the main primitive data types, similar to numbers.[9] The exact notation may differ in different Prolog dialects. However, it is always quite simple (no quotations or special beginning characters are necessary).
Contrary to many other languages, it is possible to give symbols a meaning by creating some Prolog facts and/or rules.
Examples
The following example demonstrates two facts (describing what father is) and one rule (describing the meaning of sibling). These three sentences use symbols (father, zeus, hermes, perseus and sibling) and some abstract variables (X, Y and Z). The mother relationship is omitted for clarity.
father(zeus, hermes).
father(zeus, perseus).
sibling(X, Y) :- father(Z, X), father(Z, Y).
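For readers more familiar with imperative code, the two facts and the rule can be approximated in Python. This is a loose sketch, not how Prolog evaluates queries: atoms are modelled as interned strings, the fact base as a set of tuples (FATHER is a name chosen for this example), and the rule as a nested loop. As in the rule as written, every person with a father is also reported as their own sibling; a Prolog program would usually add X \= Y to exclude that.

```python
from sys import intern

# Facts: father(zeus, hermes). father(zeus, perseus).
FATHER = {
    (intern("zeus"), intern("hermes")),
    (intern("zeus"), intern("perseus")),
}


def siblings():
    """All (X, Y) such that father(Z, X) and father(Z, Y) hold for some Z."""
    return sorted(
        (x, y)
        for (z1, x) in FATHER
        for (z2, y) in FATHER
        if z1 is z2  # atoms are interned, so an identity check is enough
    )


for pair in siblings():
    print(pair)
# ('hermes', 'hermes')
# ('hermes', 'perseus')
# ('perseus', 'hermes')
# ('perseus', 'perseus')
```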
Ruby
In Ruby, symbols can be created with a literal form, or by converting a string.[1] They can be used as an identifier or an interned string.[10] Two symbols with the same contents will always refer to the same object.[11] It is considered a best practice to use symbols as keys to an associative array in Ruby.[10][12]
Examples
The following is a simple example of a symbol literal in Ruby:[1]
my_symbol = :a
my_symbol = :"an identifier"
Strings can be coerced into symbols, and vice versa:
irb(main):001:0> my_symbol = "Hello, world!".intern
=> :"Hello, world!"
irb(main):002:0> my_symbol = "Hello, world!".to_sym
=> :"Hello, world!"
irb(main):003:0> my_string = :hello.to_s
=> "hello"
Symbols are objects of the Symbol class in Ruby:[13]
irb(main):004:0> my_symbol = :hello_world
=> :hello_world
irb(main):005:0> my_symbol.length
=> 11
irb(main):006:0> my_symbol.class
=> Symbol
Symbols are commonly used to dynamically send messages to (call methods on) objects:
irb(main):007:0> "aoboc".split("o")
=> ["a", "b", "c"]
irb(main):008:0> "aoboc".send(:split, "o") # same result
=> ["a", "b", "c"]
Symbols as keys of an associative array:
irb(main):009:0> my_hash = { a: "apple", b: "banana" }
=> {:a=>"apple", :b=>"banana"}
irb(main):010:0> my_hash[:a]
=> "apple"
irb(main):011:0> my_hash[:b]
=> "banana"
Smalltalk
In Smalltalk, symbols can be created with a literal form, or by converting a string. They can be used as an identifier or an interned string. Two symbols with the same contents will always refer to the same object.[14] In most Smalltalk implementations, selectors (method names) are implemented as symbols.
Examples
The following is a simple example of a symbol literal in Smalltalk:
my_symbol := #'an identifier' " Symbol literal "
my_symbol := #a " Technically, this is a selector literal. In most implementations, "
" selectors are symbols, so this is also a symbol literal "
Strings can be coerced into symbols, and vice versa:
my_symbol := 'Hello, world!' asSymbol " => #'Hello, world!' "
my_string := #hello: asString " => 'hello:' "
Symbols conform to the symbol protocol, and their class is called Symbol in most implementations:
my_symbol := #hello_world
my_symbol class " => Symbol "
Symbols are commonly used to dynamically send messages to (call methods on) objects:
" same as 'foo' at: 2 "
'foo' perform: #at: with: 2 " => $o "
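The dynamic message sends shown for Ruby and Smalltalk have a rough analogue in Python, which has no symbol type and names the method with an ordinary string instead. The sketch below uses only the built-in getattr; it is offered for comparison and is not a claim that Python strings behave like symbols.

```python
# Rough analogue of Ruby's "aoboc".send(:split, "o"):
# the method is looked up by name at run time and then called.
method_name = "split"
result = getattr("aoboc", method_name)("o")
print(result)  # ['a', 'b', 'c']

# Rough analogue of Smalltalk's 'foo' perform: #at: with: 2.
# Python indexes from 0, so position 2 in Smalltalk is index 1 here.
second = getattr("foo", "__getitem__")(1)
print(second)  # o
```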
References
1. Thomas, Dave; Fowler, Chad; Hunt, Andy (2001). Programming Ruby the pragmatic programmers' guide; [includes Ruby 1.8] (2nd, 10 print. ed.). Raleigh, North Carolina: The Pragmatic Bookshelf. ISBN 978-0-9745140-5-5.
2. Symbols on the page on Data Structures
3. Keywords on the page on Data Structures
4. "A tour of the Dart language | Symbols". Dart programming language. Retrieved 17 January 2021.
5. "Julia Core.Symbol". Julia Documentation. Retrieved 31 May 2022.
6. "CLHS: System Class SYMBOL". www.lispworks.com.
7. "CLHS: System Class PACKAGE". www.lispworks.com.
8. Peter Norvig: Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp, Morgan Kaufmann, 1991, ISBN 1-55860-191-0, Web
9. Bratko, Ivan (2001). Prolog programming for artificial intelligence. Harlow, England; New York: Addison Wesley. ISBN 978-0-201-40375-6.
10. Kidd, Eric (20 January 2007). "13 Ways of Looking at a Ruby Symbol". Random Hacks. Retrieved 10 July 2011.
11. "Programming Ruby: The Pragmatic Programmer's Guide".
12. "Using Symbols for the Wrong Reason". Gnomic Notes.
13. "Symbol". Ruby Documentation. Retrieved 10 July 2011.
14. http://wiki.squeak.org/squeak/uploads/172/standard_v1_9-indexed.pdf ANSI Smalltalk standard.
Symbol (programming)
In JavaScript, Symbol is a built-in primitive type whose values are unique and immutable. Symbols are primarily used as property keys that ordinary enumeration (such as for...in and Object.keys) skips, which allows private-like attributes that do not collide with other code. Symbols in JavaScript can be created via the Symbol() function, which returns a new unique value on every call, or via Symbol.for(), which returns a shared instance from a global registry; the language also defines well-known symbols (e.g., Symbol.iterator) that customize built-in behaviors. Ruby also features symbols as lightweight, immutable identifiers written with a : prefix (e.g., :name); because they are interned, they are well suited to hash keys and method invocations, in contrast to strings, which are intended for textual data. These implementations highlight symbols' role in enhancing code safety, performance, and expressiveness across diverse programming paradigms.[3][4]
Introduction
Definition
In programming, a symbol is a primitive data type that serves as a unique, immutable identifier, typically implemented as an atomic value or interned string to represent names or concepts without alteration.[5] Unlike mutable data structures, symbols maintain their integrity throughout a program's execution, functioning as lightweight objects that can be hashed and used in collections for quick lookups.[6]
Symbols facilitate efficient equality comparisons by leveraging reference equality: identical symbols point to the same memory location, which avoids the computational overhead of string matching or content-based checks.[6] This makes them ideal for tasks requiring frequent identity verification, such as tagging data or serving as keys in associative structures, where performance gains are significant over equivalent string operations.[7]
Historically, symbols emerged to enable symbolic computation, allowing names to directly represent abstract concepts or entities that could be manipulated as data, as pioneered in early list-processing systems.[5] In pseudocode, symbols are often created using notation like :keyword for predefined identifiers or symbol("name") for dynamic generation from a string description.[8] This contrasts with strings, which are character sequences (mutable in some languages) intended primarily for textual data rather than unique identification.[6]
Distinction from Strings and Identifiers
In programming languages that support symbols as a primitive data type, such as those influenced by Lisp, symbols differ fundamentally from strings in their structure and behavior. A symbol is an atomic object uniquely identified by its print name, which is a string, but the symbol itself is a single, interned entity where all instances with the same name refer to the identical object in memory.[1] In contrast, strings are mutable or immutable sequences of characters that can be duplicated independently, with each creation resulting in a separate object even if their contents match exactly.[9] This interning mechanism for symbols ensures that equality comparisons (such as pointer equality) are efficient, avoiding the need for content-based comparisons required for strings, which can involve scanning each character.[10]
Symbols also stand apart from identifiers, which are syntactic constructs in source code used to name variables, functions, or other entities during parsing and compilation. Identifiers are resolved into symbols by the language's reader or evaluator, transforming compile-time names into runtime values that can be manipulated as data.[11] For instance, an unquoted identifier like foo in code is treated as a reference to a bound value, whereas the quoted form 'foo denotes the symbol itself as a literal data object.[12] This distinction allows symbols to serve dual roles (as identifiers for naming and as atomic values in data structures), while identifiers remain tied to the language's lexical rules without independent runtime existence.[10]
The primary motivations for distinguishing symbols from strings and identifiers stem from efficiency and reliability in symbolic computation. By treating symbols as unique by value, languages avoid the runtime overhead of hashing or comparing variable-length strings, enabling faster lookups in environments, hash tables, or property lists where symbols act as keys.[9] This design also mitigates errors from typographical mismatches, as identical symbol names always resolve to the same object, unlike strings that might differ subtly in casing or whitespace. For example, symbols can function as enum-like constants (e.g., 'RED and another 'RED are the same entity), ideal for internal representations in metaprogramming, whereas dynamically parsed strings suit variable text input from users.[1]
However, these distinctions introduce trade-offs: symbols prioritize computational efficiency over flexibility, lacking the decomposability of strings for text manipulation and the contextual binding of identifiers for variable scoping.[11] While conversions between symbols and strings are possible (e.g., via functions like symbol->string), symbols are less suitable for human-readable output or dynamic content generation, reinforcing their role as optimized, opaque tokens rather than general-purpose text.[9]
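The difference between content equality and identity can be observed in Python, which has no symbol type but exposes string interning through sys.intern. The sketch below is an analogy only. Whether two freshly built equal strings start out as distinct objects is an implementation detail, so the example relies solely on the documented guarantee that interning equal strings yields one shared object.

```python
import sys

# Build two equal strings at run time rather than writing the literal twice.
a = "".join(["sym", "bol"])
b = "".join(["sym", "bol"])

# Content comparison: may need to look at every character.
print(a == b)                          # True

# After interning, both names map to one shared object,
# so an identity check is enough.
print(sys.intern(a) is sys.intern(b))  # True
```

Interned strings remain ordinary strings, so they can still be sliced or concatenated; a true symbol type gives up that flexibility in exchange for being a distinct, opaque kind of value.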
Historical Development
Origins in Lisp
The concept of symbols in programming originated with John McCarthy's design of Lisp in 1958, where they served as fundamental atomic elements for list processing and the manipulation of symbolic expressions, known as S-expressions.[13] McCarthy introduced symbols as part of Lisp's core syntax to handle structured data in a way that facilitated recursive computations on lists, distinguishing them from numerical or literal values by their role as named entities without internal structure.[5] This design was motivated by the need for a language capable of processing symbolic information efficiently, drawing from earlier ideas in algebraic list processing languages like IPL.[13]
In the context of early artificial intelligence research, symbols played a pivotal role as atoms within S-expressions, enabling the representation of knowledge and logical structures essential for AI tasks such as theorem proving and pattern matching.[13] By treating symbols as building blocks of expressions like (PLUS X Y), Lisp achieved homoiconicity, where source code could be directly represented and manipulated as data, blurring the lines between programs and the information they process.[5] This feature was crucial for McCarthy's vision of AI systems that could reason symbolically, as seen in his early work on the Advice Taker program.[13]
Symbols in Lisp were implemented as first-class objects, possessing key properties that supported dynamic evaluation and extensibility. Each symbol maintained a print name for input/output representation, a value cell to store its associated value (such as a variable binding), and a function cell to hold a function definition, allowing symbols to serve multifaceted roles in computation.[13] These properties were integral to Lisp's interpreter, enabling operations like substitution and evaluation on symbolic expressions.[5]
The foundational ideas were formally articulated in McCarthy's seminal 1960 paper, "Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I," which defined symbols as atomic S-expressions and outlined their use in recursive functions and predicates like atom and eq.[5] This publication established Lisp's theoretical basis, influencing subsequent developments in symbolic computation.[13]
Spread to Other Languages
The concept of symbols, originating in Lisp, began influencing other languages in the 1970s as dynamic and object-oriented paradigms gained traction. Smalltalk, developed in 1972 at Xerox PARC, adopted symbols as unique, immutable identifiers for message selectors in its object messaging system, drawing inspiration from Lisp's symbolic computation and self-describing interpreter design.[14] Similarly, Prolog, introduced in 1972, incorporated atoms (functionally equivalent to Lisp symbols) as constant terms for pattern matching and logic representation, enabling efficient symbolic processing in logic programming.[15] PostScript, developed by Adobe in the early 1980s, featured name objects that functioned analogously to symbols, serving as atomic identifiers for graphics operations and dictionary keys in its stack-based, interpreted model for document rendering.[16]
By the 1990s and 2000s, the idea saw a revival in languages emphasizing expressiveness and efficiency. Ruby, created in 1995 by Yukihiro Matsumoto, integrated symbols as lightweight, interned constants for use as hash keys and method names, inspired by Lisp's and Smalltalk's handling of identifiers to optimize performance in dynamic environments.
In the 2010s, symbols reemerged in languages tailored for specialized domains. Julia, launched in 2012, employs symbols primarily for metaprogramming in scientific computing, allowing efficient representation of variable names and expressions in high-performance numerical code. Elixir, released in 2011, inherited atoms directly from Erlang, whose virtual machine it runs on, using them as compact constants for process messaging and state representation to support functional programming and high-concurrency applications like telecommunications systems.
This spread was propelled by the rising popularity of dynamic languages, where symbols facilitated metaprogramming through programmatic code manipulation and delivered performance gains via interning in interpreted settings. However, mainstream languages like Python and Java largely eschewed dedicated symbols, relying instead on strings for similar roles due to their sufficient flexibility in general-purpose contexts.
Core Properties
Interning Mechanism
The interning mechanism for symbols in programming languages, particularly those influenced by Lisp, involves maintaining registries (such as obarrays in Emacs Lisp or packages in Common Lisp, each often implemented as a hash table or symbol table) to enforce uniqueness by name within their respective scopes. Upon the first encounter of a symbol name, the system allocates a single immutable object representing that symbol and stores it in the registry; all subsequent references to the same name retrieve this existing object rather than creating a duplicate. This process ensures that symbols with identical names are guaranteed to be the same object across the program's execution, preventing redundancy and enabling efficient lookups.[17][18]
The interning process typically occurs automatically during symbol creation, such as when reading source code or explicitly via functions like intern in Lisp dialects. To create or retrieve a symbol, the runtime checks the registry for the given name (case-sensitively or normalized as per the language's conventions). If the name is absent, a new symbol object is instantiated with the name as its print representation, added to the registry, and returned as a reference. If present, the stored reference is returned immediately. This algorithmic approach, often using hash-based storage for O(1) average-case lookup, is exemplified in the following pseudocode:
function intern(name):
    if registry.contains(name):
        return registry.get(name)
    else:
        sym = create_symbol(name)  // Allocates a new immutable Symbol object
        registry.put(name, sym)
        return sym
Because every reference to a given name resolves to the same object, symbols can be compared by identity (for example, with Lisp's eq predicate) rather than by expensive content or string comparisons, achieving constant O(1) time complexity even for long names. This efficiency stems from the shared object identity enforced by the registry, making symbols particularly suitable for frequent comparisons in interpreters and compilers. The mechanism presupposes symbol immutability, as alterations to a shared symbol would corrupt all references to it.[20][18]
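The registry lookup described in this section can be sketched in runnable form. The following Python version is a minimal illustration under stated assumptions: the Symbol class, the module-level _registry dictionary, and the intern function are names invented for this example, and the sketch ignores packages, thread safety, and garbage collection.

```python
class Symbol:
    """An immutable name; instances are meant to be created only via intern()."""

    __slots__ = ("_name",)

    def __init__(self, name):
        # Bypass our own __setattr__ once, during construction.
        object.__setattr__(self, "_name", name)

    @property
    def name(self):
        return self._name

    def __setattr__(self, attr, value):
        raise AttributeError("symbols are immutable")

    def __repr__(self):
        return f"Symbol({self._name!r})"


_registry = {}  # name -> the single Symbol object for that name


def intern(name):
    """Return the unique Symbol for name, creating it on first use."""
    sym = _registry.get(name)
    if sym is None:
        sym = Symbol(name)
        _registry[name] = sym
    return sym


a = intern("foo")
b = intern("foo")
print(a is b)              # True: both names refer to one object
print(a is intern("bar"))  # False: different names, different objects
```

The identity test `a is b` costs the same regardless of how long the name is, whereas comparing the names as strings may have to examine every character.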
Immutability and Performance Benefits
One key property of symbols is their immutability: once created, a symbol's name cannot be modified, which promotes referential transparency by ensuring that references to the same symbol always denote the identical object without risk of side effects from unintended changes. This design prevents errors arising from mutable identifiers, as the core identity of the symbol remains fixed throughout its lifetime, facilitating reliable symbolic manipulation in programs.
The immutability of symbols, combined with their interning mechanism, yields significant performance benefits. By sharing a single instance across all references to the same name, memory usage is substantially reduced compared to duplicating string-like representations, avoiding the overhead of multiple allocations for equivalent identifiers.[21] Equality checks become efficient, often reducible to simple object identity comparisons (such as eq in Lisp dialects), which are faster than content-based string comparisons, and hashing operations can leverage precomputed or pointer-based values for quick lookups.[22] Additionally, this sharing minimizes garbage collection pressure, as fewer transient objects are created from repeated symbol usage, leading to improved runtime efficiency in symbol-intensive computations.
In multi-threaded environments, the immutability of symbols enhances thread-safety, allowing concurrent access to shared symbol instances without requiring synchronization locks, since no modifications can introduce race conditions or data corruption.[23] However, this design carries potential drawbacks: the central registry maintaining interned symbols can expand indefinitely with unique names, risking memory leaks if unused symbols persist. Some implementations address this through weak references in the symbol table, enabling garbage collection of symbols lacking strong external references while preserving equality semantics.[24]
Use Cases
In Metaprogramming and Reflection
In metaprogramming, symbols function as unique, interned identifiers that represent program elements like variable names, functions, and syntactic structures, enabling efficient code generation and transformation. They serve as keys for method dispatch in dynamic systems and are fundamental in macro definitions, where symbols allow unevaluated code fragments to be quoted and manipulated as data. This capability supports the development of domain-specific languages (DSLs) by treating code as manipulable lists or trees, with symbols ensuring consistent reference to identifiers across expansions.[7]
In reflection, symbols enable runtime introspection by serving as keys to access object properties and metadata. For example, in JavaScript, symbol-keyed properties are skipped by standard enumeration but can still be retrieved explicitly (for instance with Object.getOwnPropertySymbols), which gives frameworks targeted access for runtime analysis.[3]
Compared to strings, symbols provide advantages in type safety, as they form a distinct primitive type that prevents misuse of arbitrary text in code-handling operations, and their interning promotes efficient equality checks via pointer comparison rather than content scanning. This uniqueness mitigates risks in dynamic dispatch by ensuring only predefined or trusted symbols are used, avoiding potential injection from unvetted inputs in eval-like contexts. Additionally, symbols' immutability supports reliable caching in metaprogramming pipelines, enhancing performance without the overhead of string hashing.[3][25]
In Symbolic AI and Logic Programming
In symbolic artificial intelligence (AI), symbols function as the core building blocks for constructing knowledge bases, where they represent abstract objects, concepts, and relationships in a discrete, manipulable form. This enables the creation of rules and inference engines that perform logical reasoning without relying on numerical computations. The physical symbol systems hypothesis, articulated by Allen Newell and Herbert A. Simon, underscores this foundation, arguing that intelligent behavior emerges from the syntactic manipulation of such symbols according to formal rules. In expert systems, symbols specifically denote facts, predicates, and production rules; for instance, a symbol like infection might represent a medical condition, allowing the inference engine to chain rules for diagnosis and treatment recommendations, as seen in early systems that encoded domain-specific expertise.
In logic programming, symbols play a pivotal role as functor names and constants within terms, facilitating unification (the process of matching patterns to achieve variable substitutions) and backtracking search to resolve queries against a knowledge base. Constants, such as atoms like alice, unify only with identical instances, while functors (e.g., parent(alice, bob)) enable structured representation of relations, allowing the system to explore proof trees declaratively. This symbolic machinery supports automated reasoning by treating programs as logical statements, where success depends on finding substitutions that satisfy all clauses.[26]
Historically, symbols enabled early applications in theorem provers, such as the Logic Theorist developed by Newell and Simon in 1956, which manipulated symbolic logical expressions to derive mathematical proofs through heuristic search.
In natural language processing, symbolic approaches in the 1960s and 1970s, exemplified by systems like ELIZA and SHRDLU, used symbols for pattern matching and semantic representation, treating sentences as symbolic structures to simulate understanding in constrained domains like dialogue or block manipulation, without probabilistic models.[27]
In modern declarative paradigms, symbols retain relevance by underpinning query languages and constraint solvers, where they abstract complex data relations into logical expressions for efficient retrieval and optimization. For example, in languages like Datalog or extensions such as Rel, symbols denote relations and entities, enabling scalable inference over large datasets in areas like knowledge graphs and database querying, thus bridging traditional symbolic methods with contemporary data-intensive applications.
Language Support
Lisp Family
In the Lisp family of programming languages, symbols serve as a foundational data type originating from the earliest implementations of Lisp, representing named entities that can denote variables, functions, or literal data. In core dialects such as Common Lisp and Scheme, symbols are unique objects identified by their print names, with interning ensuring that symbols with identical names are eq? (or eq in Common Lisp) to the same object, promoting efficient equality checks and storage.[28][29]
Common Lisp extends this with packages as namespaces to organize symbols and avoid name conflicts; for instance, the function intern creates or retrieves a symbol within a specified package, so (intern "FOO" "MY-PACKAGE") returns the symbol named FOO in the package MY-PACKAGE (a placeholder name here), creating it if it does not already exist. Scheme, per the R7RS standard, lacks built-in packages but maintains symbol uniqueness by name across the environment, with no standard mechanism for namespaces, emphasizing simplicity in its minimalistic design.[29]
A key feature in Common Lisp is the property list (plist), a list of alternating keys and values attached to each symbol for storing metadata, read with get and updated with setf of get. For example, (setf (get 'foo 'property) 'bar) associates the value bar with the property property on the symbol foo, after which (get 'foo 'property) returns BAR; this lets symbols carry arbitrary data without altering the core language semantics. (The putprop function found in some older Lisp dialects is not part of Common Lisp.) The special form QUOTE prevents evaluation of its argument, allowing symbols to be treated as literal data rather than code; syntactically, 'foo is equivalent to (quote foo), returning the unevaluated symbol FOO, which is essential for constructing code programmatically.[30] In Scheme, quoting behaves similarly with quote, ensuring symbols like 'foo remain unevaluated data objects.[29]
Derivatives of Lisp introduce variations on these concepts. Clojure complements symbols with keywords, self-evaluating identifiers prefixed by a colon that may carry a namespace (e.g., :user/foo). Keywords are commonly used as map keys or identifiers, and a double colon resolves the keyword in the current namespace (e.g., ::bar reads as :current-ns/bar).[31] They combine symbol-like naming with immutability, making them ideal for data-oriented programming. Emacs Lisp, another derivative, uses an obarray (a vector-based hash table) for interning symbols into the global namespace; the function intern adds a symbol to the obarray if absent, as in (intern "foo") yielding the symbol foo, while make-symbol creates uninterned symbols not entered into any obarray.[17]
Illustrative examples highlight symbol operations across these dialects. In Common Lisp, symbol creation and equality are demonstrated as follows:
(intern "FOO")  ; Returns the interned symbol FOO (intern keeps case as given, so "foo" would yield |foo|)
(eq 'a 'a)      ; Returns T (identity equality for same symbol)
(eq 'a "a")     ; Returns NIL (different types)
A symbol's global value can be set using setf on symbol-value: (setf (symbol-value 'var) 42) assigns 42 to the dynamic variable var. In Emacs Lisp, similar equality holds with eq, and interning ensures uniqueness within the obarray.[17]
Lisp's homoiconicity, where code is represented as data structures composed of symbols and lists, uniquely enables metaprogramming; for instance, a quoted form like '(defun add (x y) (+ x y)) is a list whose first element is the symbol defun, manipulable as data to generate executable code.[30] Additionally, in modern implementations like LispWorks for Common Lisp, unused symbols—those without references in values, functions, or property lists—can be garbage collected to reclaim memory, though interned symbols persist in packages until explicitly uninterned.[32]
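The idea that a program is a list whose elements are symbols can be imitated in Python with nested lists. The sketch below is a toy under stated assumptions: Symbol is a hypothetical str subclass used only to tell names apart from string data, ENV holds two arithmetic bindings chosen for the example, and evaluate handles nothing beyond function application on numbers.

```python
import operator


class Symbol(str):
    """A name in a program, kept distinct from ordinary string data."""


# Bindings from symbols to functions (a stand-in for a global environment).
ENV = {Symbol("+"): operator.add, Symbol("*"): operator.mul}


def evaluate(form):
    if isinstance(form, Symbol):   # a symbol evaluates to its binding
        return ENV[form]
    if isinstance(form, list):     # a list is a function application
        fn, *args = [evaluate(part) for part in form]
        return fn(*args)
    return form                    # numbers evaluate to themselves


# (+ 1 (* 2 3)) written as nested lists of symbols and numbers
program = [Symbol("+"), 1, [Symbol("*"), 2, 3]]
print(evaluate(program))  # 7

# Because the program is ordinary data, it can be rewritten before evaluation.
program[0] = Symbol("*")
print(evaluate(program))  # 6
```

Replacing the head symbol of the list changes what the program computes, which is the kind of manipulation Lisp macros perform on quoted forms.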
Ruby
In Ruby, symbols are immutable objects that represent identifiers, typically written as a name prefixed with a colon, such as :name. They are interned in a global symbol table maintained by the interpreter, ensuring that all instances of the same symbol (whether created as literals, from strings, or internally by the language) refer to the exact same object in memory. This interning mechanism promotes efficiency in storage and comparison, as symbols can be compared by object identity rather than content. Symbols are inherently frozen, preventing modification after creation, which aligns with their role as stable keys or labels.[4]
Symbols are created using literal syntax, like :sym, or by converting a string; the Symbol class has no public new method, so literals are the idiomatic approach for known identifiers. To convert a string to a symbol, the to_sym (or intern) method is used, for example:
"foo".to_sym # => :foo
Symbols are commonly used as hash keys:
hash = { :key => "value" }
hash[:key] # => "value"
Symbol.all_symbols returns every symbol currently in the symbol table:
Symbol.all_symbols.size # => number of symbols (e.g., thousands in a typical session)
Because method names are symbols, send can invoke a method chosen at run time:
obj.send(:to_s) # Equivalent to obj.to_s
Elixir and Erlang
In Erlang, atoms serve as pre-interned, unique constants identified by their name, functioning as lightweight identifiers for values such as states or tags.[35] They are created literally without a prefix if starting with a lowercase letter (e.g., ok), or enclosed in single quotes for other cases (e.g., 'true' or 'phone number').[35] The system enforces a default limit of 1,048,576 atoms per virtual machine instance to prevent memory exhaustion, a constraint adjustable via the +t command-line option but typically kept conservative for stability in long-running systems. This limit indirectly supports distribution safety, as excessive dynamic atom creation across nodes could propagate resource strain in clustered environments.
Atoms in Erlang are integral to concurrency, particularly in process communication via messages and pattern matching. For instance, equality checks like ok == ok evaluate to true due to their interned nature, enabling efficient comparisons without string operations.[35] In receive blocks for handling inter-process messages, atoms pattern-match against incoming terms, such as:
receive
    {Pid, ping} ->
        Pid ! pong
end
Here the ping and pong atoms tag the message for selective reception and response, facilitating reliable signaling in distributed processes. Similarly, in case expressions, atoms enable branching on message contents or return values, like case Result of ok -> success; error -> failure end. Atoms also name modules and registered processes, ensuring unique references within the system.
Elixir builds on Erlang's atoms by prefixing them with a colon (e.g., :foo), treating them as symbolic constants for enumeration or error handling (e.g., {:ok, value} or {:error, reason}).[36] It extends creation with quoted forms supporting string interpolation for dynamic atoms, such as :"bar#{1}" yielding :bar1, though this is discouraged in production because atoms are not garbage collected.[36] Elixir also provides the ~w sigil with an a modifier to generate lists of atoms from whitespace-separated words (e.g., ~w(ok error)a produces [:ok, :error]); the uppercase ~W variant does the same without interpolation or escape processing.[37]
In Elixir, atoms excel in pattern matching for control flow, mirroring Erlang but with more expressive syntax. For example, a case statement might destructure a result tuple:
case compute() do
  {:ok, data} -> process(data)
  {:error, msg} -> log(msg)
end
Atoms are likewise used to tag messages that are sent with send/2 and matched in receive blocks, akin to Erlang, promoting fault-tolerant concurrency.[36]
Within the BEAM virtual machine shared by Erlang and Elixir, atoms facilitate efficient term serialization for distribution across nodes using the External Term Format (ETF). During message passing, an atom is encoded by its index in the sender's atom table; if absent on the receiver, the atom string is transmitted to create it locally, ensuring consistency without full string copies.[38] This mechanism underscores atoms' role in scalable, fault-tolerant systems but heightens the risk from dynamic creation, as atoms built from unchecked input could exhaust the atom table (capped at around 1 million by default) and crash the node.[39] Security guidance therefore warns against calling functions like String.to_atom/1 on untrusted input, recommending predefined mappings or String.to_existing_atom/1 to mitigate denial-of-service vulnerabilities in distributed setups.[39]
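The difference between unchecked atom creation and lookup of existing atoms can be sketched with a bounded table. The following Python model is illustrative only: AtomTable, AtomTableFull, to_atom, and to_existing_atom are hypothetical names chosen to mirror the text, and the real BEAM terminates the virtual machine when the table is full rather than raising a catchable exception.

```python
class AtomTableFull(Exception):
    """Raised when the hypothetical table has no room for a new atom."""


class AtomTable:
    def __init__(self, limit=1_048_576):
        self.limit = limit
        self._index = {}   # name -> integer index; entries are never removed

    def to_atom(self, name):
        """Intern name unconditionally, like an unchecked conversion."""
        if name not in self._index:
            if len(self._index) >= self.limit:
                raise AtomTableFull(name)
            self._index[name] = len(self._index)
        return self._index[name]

    def to_existing_atom(self, name):
        """Look up name without ever growing the table."""
        try:
            return self._index[name]
        except KeyError:
            raise ValueError(f"not an already existing atom: {name!r}") from None


table = AtomTable(limit=3)            # tiny limit so the failure is easy to see
for known in ("ok", "error"):
    table.to_atom(known)

print(table.to_existing_atom("ok"))   # 0: a known name is a safe lookup
try:
    table.to_existing_atom("user-supplied")
except ValueError as exc:
    print("rejected:", exc)           # unknown input never grows the table
```

Routing untrusted input through the lookup-only path keeps the table size bounded by the names the program itself defines, which is the reasoning behind the recommendation above.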
Julia
In Julia, symbols are interned strings that serve as lightweight identifiers, primarily used in metaprogramming to represent variable names, function calls, and other syntactic elements without the overhead of full strings. They are created using the literal syntax :sym or the Symbol constructor, such as Symbol("var"), and are automatically interned in a global table, so that each distinct symbol is a single object and comparison reduces to pointer equality. Macro hygiene, which prevents unintended name capture, is a separate mechanism that relies on the macro expander and on fresh names generated with gensym.[40][41]
Symbols play a central role in expression manipulation and abstract syntax tree (AST) construction, where they act as heads of Expr objects or as literal values. For instance, the quoted expression :(x + 1) parses to an Expr with head :call, arguments :+, :x, and 1, allowing programmatic modification before evaluation. They are also employed in building nested ASTs for dynamic code generation and serve as immutable keys in dictionaries, leveraging their hashing efficiency. Common operations include symbol interpolation, as in Symbol("a", 1) yielding :a1, and dynamic evaluation like eval(Meta.parse(":foo")), which returns the symbol :foo itself. These features enable concise manipulation of code as data, facilitating tasks in numerical and scientific computing such as generating optimized expressions for simulations or data analysis.[40]
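The same code-as-data workflow (parse an expression, inspect its head and arguments, modify it, then evaluate) can be illustrated by analogy with Python's standard ast module. This is only an analogy under stated assumptions: Python represents identifiers in the tree as ordinary strings rather than symbols, and the variable names below are chosen for illustration.

```python
import ast

tree = ast.parse("x + 1", mode="eval").body   # the expression node

print(type(tree).__name__)        # BinOp
print(type(tree.op).__name__)     # Add
print(tree.left.id)               # x
print(tree.right.value)           # 1

# Rewrite the tree before evaluating it: replace the constant 1 with 10.
tree.right = ast.Constant(value=10)
expr = ast.fix_missing_locations(ast.Expression(body=tree))
result = eval(compile(expr, "<ast>", "eval"), {"x": 5})
print(result)                     # 15
```

In Julia the corresponding tree stores :+ and :x as symbols, so comparing node names is an identity check rather than a string comparison.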
A distinctive aspect of Julia's symbols is their seamless integration with multiple dispatch, where metaprogramming constructs can generate method definitions using symbols as function or argument names, allowing runtime specialization based on types while symbols provide hygienic identifiers. In domain-specific languages for mathematics, such as those in the Symbolics.jl ecosystem, symbols represent variables in symbolic expressions, supporting operations like differentiation and substitution within frameworks for modeling differential equations or optimization problems. Similarly, in plotting libraries like Plots.jl, symbols are used as keys for attributes—e.g., plot(x, y; seriestype=:scatter, xscale=:log10)—enabling extensible, declarative syntax for visualizations in scientific workflows.[40][42][43][44]
Prolog
In Prolog, atoms serve as fundamental symbols representing constants in declarative logic programming, essential for knowledge representation and automated reasoning. An atom is a sequence of characters that forms a single, indivisible unit, typically starting with a lowercase letter followed by letters, digits, or underscores (e.g., atom, likes), or enclosed in single quotes to include spaces, leading uppercase letters, or special characters (e.g., 'Apple', 'Big Kahuna Burger').[45] The language specification imposes no interning limit on atoms, allowing flexible declaration in facts and rules to model domain knowledge.[45] For instance, the fact likes(alice, apple). declares a relationship using atoms as the functor likes and the arguments alice and apple, forming a simple term for the knowledge base.[45]
Atoms play a central role in Prolog's unification mechanism, which underpins pattern matching and logical inference. As constants within terms, atoms unify only with identical atoms, enabling precise matching during query resolution (e.g., mia = mia succeeds, while mia = vincent fails).[26] In compound terms, atoms function as functors or arguments; for example, the rule happy(X) :- listens2Music(X), playsAirGuitar(X). uses the atom happy as a predicate functor, while the variable X stands for whatever atomic constant it is instantiated to. Queries exemplify this distinction: ?- likes(X, apple). unifies X with alice if the fact likes(alice, apple). exists, binding the variable while treating atoms as fixed patterns, and ?- likes(jody, Y). instantiates Y to surfing given a matching fact such as likes(jody, surfing).[26] This handling of variables and atoms supports non-deterministic search, where unification drives backtracking to find all solutions.
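The unification rules just described (atoms match only identical atoms, variables bind to whatever they meet, compound terms unify argument by argument) can be sketched in Python. This is a simplified model rather than any Prolog system's implementation: atoms are represented as Python strings, compound terms as tuples whose first element is the functor, and Var, walk, and unify are hypothetical names.

```python
class Var:
    """A logic variable, e.g. X in likes(X, apple)."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


def walk(term, subst):
    """Follow variable bindings until a non-variable or unbound variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return an extended substitution, or None if a and b do not unify.

    Atoms are strings; compound terms are tuples (functor, arg1, ...).
    The occurs check is omitted, as most Prolog systems do by default.
    """
    a, b = walk(a, subst), walk(b, subst)
    if a is b or (isinstance(a, str) and a == b):
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


X = Var("X")
fact = ("likes", "alice", "apple")
print(unify("mia", "mia", {}))                 # {}: identical atoms unify
print(unify("mia", "vincent", {}))             # None: distinct atoms do not
print(unify(("likes", X, "apple"), fact, {}))  # {X: 'alice'}
```

A full Prolog engine adds clause selection and backtracking on top of this step; the sketch covers only the matching of a single query against a single fact.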
Unlike Lisp's global registry of symbols with attached properties for metaprogramming, Prolog provides no such accessible structure; atoms are treated as opaque constants emphasizing term isomorphism—structural equivalence of terms—for SLD resolution in theorem proving.[46] This design prioritizes efficient pattern matching in logic resolution over symbol manipulation, aligning with Prolog's roots in automated deduction.[46]
Smalltalk
In Smalltalk, symbols primarily function as method selectors, which are unique, interned string objects prefixed with a hash mark (e.g., #methodName) and used for efficient lookup within a class's method dictionary during message dispatch. When an object receives a message, the runtime system searches the receiver's class (and then its superclasses) for a method matching the symbol selector; if found, the corresponding compiled method executes. This design ensures that selectors are immutable and globally unique, promoting fast hashing and comparison for dynamic method invocation.[47][48]
Symbols are implemented via a global symbol table that interns strings, guaranteeing a single instance per unique selector to optimize memory and performance in the object-oriented environment. Creation happens automatically as literals in code (e.g., #greet) during compilation, or dynamically by converting a string, for example with 'greet' asSymbol, which checks the table and returns or creates the interned instance. The table is maintained by the virtual machine or runtime, with garbage collection periodically rebuilding it to remove unused symbols while preserving system consistency.[49]
Common examples include dynamic message sending, such as anObject perform: #greet: with: 'world', which invokes the method named by the symbol on the object with the provided argument (the trailing colon in #greet: marks a selector that takes one argument). For inspection, classes provide allSelectors, returning a set of Symbol instances listing all implemented selectors across the class hierarchy, useful for browsing or analysis. Symbols also integrate with pragmas for method metadata, as in <return: #true>, where the symbol annotates behavior for tools or runtime checks, and appear in blocks for reflective patterns like method wrappers.[50]
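Selector-based dispatch of this kind can be modelled as a dictionary lookup that walks the superclass chain. The Python sketch below is a simplified illustration, not Smalltalk's actual machinery: SketchClass, perform, and all_selectors are hypothetical names, and sys.intern stands in for the symbol table.

```python
import sys


class SketchClass:
    """A hypothetical class object: a method dictionary plus a superclass link."""
    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.methods = {}   # interned selector -> callable

    def define(self, selector, function):
        self.methods[sys.intern(selector)] = function

    def lookup(self, selector):
        """Search this class, then its superclasses, for selector."""
        cls = self
        while cls is not None:
            if selector in cls.methods:
                return cls.methods[selector]
            cls = cls.superclass
        return None

    def all_selectors(self):
        """Collect selectors from the whole superclass chain."""
        found, cls = set(), self
        while cls is not None:
            found.update(cls.methods)
            cls = cls.superclass
        return found


def perform(cls, receiver, selector, *args):
    """Dispatch selector to receiver, in the spirit of perform:with:."""
    method = cls.lookup(sys.intern(selector))
    if method is None:
        raise AttributeError(f"doesNotUnderstand: #{selector}")
    return method(receiver, *args)


base = SketchClass("Base")
base.define("printString", lambda self: f"a {self}")
greeter = SketchClass("Greeter", superclass=base)
greeter.define("greet:", lambda self, who: f"hello, {who}")

print(perform(greeter, "obj", "greet:", "world"))   # hello, world
print(perform(greeter, "obj", "printString"))       # a obj (inherited)
print(sorted(greeter.all_selectors()))              # ['greet:', 'printString']
```

Because selectors are interned, the dictionary keys for a given name are always the same object, which is what makes the lookup cheap in a real implementation.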
A distinctive aspect of Smalltalk's everything-is-an-object model is how symbols enable runtime reflection, allowing developers to add or modify methods dynamically. For instance, aClass compile: 'greet ^self printString' classified: 'accessing' compiles the given source and updates the method dictionary on the fly, using the selector #greet derived implicitly from the source. This supports advanced metaprogramming by treating selectors as manipulable objects within the live system.[48]
PostScript
In PostScript, a page description language, names function as atomic symbols that serve as identifiers for various elements in the program's execution. A name is uniquely defined by its sequence of characters. Written with a leading slash (/) in source code, such as /font, it is a literal name that is pushed onto the operand stack, typically for use as a dictionary key; written without the slash, it is an executable name that the interpreter looks up and executes.[16] Names enable efficient storage and retrieval of page description components, including fonts, colors, and transformations, within the stack-oriented environment of the interpreter.[16]
The PostScript interpreter maintains an internal name table in which all encountered names are interned upon parsing, allowing rapid lookup and comparison during rendering without redundant string processing. Interning ensures that identical names share the same object reference, optimizing memory usage and execution speed on resource-constrained printing devices. Executable names, once defined as operators or procedures, trigger the associated actions, such as graphics operations, when the interpreter encounters them.[16]
For example, a name can be used as a dictionary key with the put operator: with an empty dictionary created via 1 dict on the stack, the sequence /key (value) put stores the string "value" under the key /key. Names also identify fonts: /Helvetica 12 selectfont looks up the font named Helvetica, scales it to 12 units, and makes it the current font for subsequent text rendering.
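The distinction between literal and executable names, and the dictionary lookup an executable name triggers, can be sketched with a toy stack interpreter. The Python model below is a deliberately reduced illustration and not a PostScript implementation: it supports only numbers, names, and two operators, it represents names as interned Python strings, and run and UndefinedName are hypothetical names.

```python
import sys


class UndefinedName(Exception):
    """Stands in for PostScript's undefined error in this sketch."""


def run(tokens):
    """Execute a list of tokens against an operand stack and two dictionaries."""
    stack = []
    userdict = {}   # keys are interned name strings

    def op_def():
        value = stack.pop()
        key = stack.pop()
        userdict[key] = value

    def op_add():
        b = stack.pop()
        a = stack.pop()
        stack.append(a + b)

    systemdict = {sys.intern("def"): op_def, sys.intern("add"): op_add}

    for tok in tokens:
        if isinstance(tok, (int, float)):
            stack.append(tok)                   # numbers push themselves
        elif tok.startswith("/"):
            stack.append(sys.intern(tok[1:]))   # literal name: push, no lookup
        else:
            name = sys.intern(tok)              # executable name: look up, act
            if name in userdict:
                stack.append(userdict[name])
            elif name in systemdict:
                systemdict[name]()
            else:
                raise UndefinedName(name)
    return stack


print(run(["/x", 40, "def", "x", 2, "add"]))    # [42]
```

Here /x is pushed as a key for def, while the later bare x is looked up and replaced by its value; a bare name found in neither dictionary raises the sketch's stand-in for the undefined error.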
If an unknown name is encountered as an executable operator, the interpreter raises an undefined error, halting execution and reporting the offending name to prevent invalid operations during page output.[16] PostScript, developed at Adobe Systems from 1982 and released in 1984 for high-quality printing and rasterization, leverages these name objects to keep programs compact when they are transmitted to output devices: shared interning minimizes data volume, and names themselves cannot be modified at run time, which maintains consistency in device-independent rendering.[51][52] This approach influenced the adoption of similar symbol-like mechanisms in subsequent graphics and scripting languages.
Other Languages
In Scala 2, symbols were provided through the scala.Symbol class, which created interned, unique objects from equal strings, enabling efficient comparison via reference equality and supporting metaprogramming tasks like reflection. In Scala 3 (released in 2021), symbol literals are no longer supported, and the scala.Symbol class remains only for compatibility and is slated for deprecation; strings or other types are now used for similar purposes.[53] JavaScript introduced Symbols in ECMAScript 6 as a primitive type for creating unique, immutable identifiers, primarily used as object keys that avoid property name collisions and are skipped by for...in and Object.keys; a fresh unique symbol is created by calling the Symbol() function, while Symbol.for() returns a shared symbol interned in a global registry.[3]
Python lacks native symbols but approximates them using Enum members for named constants or frozenset for hashable, immutable collections that mimic unique identifiers in certain contexts like keys. R, by contrast, does provide a symbol type (also called a name), created with as.name or quote and used to represent identifiers in unevaluated code; for data structuring it more commonly relies on named lists, whose element names function as symbolic labels for accessing elements.[54]
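The Enum approximation mentioned above can be shown directly. The Status class below is a hypothetical example; the use of sys.intern, which interns ordinary strings, is an addition for comparison and is not something the text above attributes to Python's symbol substitutes.

```python
import sys
from enum import Enum


class Status(Enum):
    OK = "ok"
    ERROR = "error"


# Enum members are singletons: lookup by value or by name yields the same object.
print(Status("ok") is Status.OK)        # True
print(Status["ERROR"] is Status.ERROR)  # True

# Members are hashable, so they work as dictionary keys much as symbols do.
handlers = {Status.OK: "proceed", Status.ERROR: "retry"}
print(handlers[Status("error")])        # retry

# sys.intern gives string-level interning: equal interned strings are identical.
a = sys.intern("".join(["sym", "bol"]))
b = sys.intern("symbol")
print(a is b)                           # True
```

Unlike symbols in Ruby or Lisp, Enum members must be declared ahead of time in a class, so they suit closed sets of names rather than identifiers created on demand.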
In emerging systems languages, Rust uses static string slices (&'static str) for compile-time constant identifiers or macro hygiene to generate unique names, offering symbol-like behavior in metaprogramming without a primitive type.[55] Go, conversely, has no native symbols and relies on constant strings (const declarations) for similar purposes, emphasizing simplicity in its type system.[56]
Mainstream languages like C++ and Java omit dedicated symbol types, depending on std::string or java.lang.String for identifier roles, since general-purpose string handling with hashing and equality checks suffices for most use cases. Java does intern string literals and exposes String.intern(), but this is an optimization of ordinary strings rather than a separate type. Future trends in WebAssembly interoperability, particularly through the Component Model, may enable symbol-like type sharing across languages by defining precise interfaces for exported functions and data, enhancing cross-language composition.[57]