MultiValue database
A MultiValue database is a type of NoSQL and multidimensional database. It is typically considered synonymous with PICK, a database originally developed as the Pick operating system.
MultiValue databases include commercial products from Rocket Software, Revelation, InterSystems, Northgate Information Solutions, ONgroup,[1] and other companies. These databases differ from a relational database in that they have features that support and encourage the use of attributes which can take a list of values, rather than all attributes being single-valued. They are often categorized with MUMPS within the category of post-relational databases, although the data model actually pre-dates the relational model. Unlike SQL-DBMS tools, most MultiValue databases can be accessed either with or without SQL.
History
Don Nelson designed the MultiValue data model in the early to mid-1960s.[2] Dick Pick, a developer at TRW, worked on the first implementation of this model for the US Army in 1965. Pick considered the software to be in the public domain because it was written for the military. This was only the first dispute regarding MultiValue databases to be addressed by the courts.[3]
Ken Simms wrote DataBASIC, sometimes known as S-BASIC, in the mid-1970s. It was based on Dartmouth BASIC, but had enhanced features for data management. Simms played a lot of Star Trek (a text-based early computer game originally written in Dartmouth BASIC) while developing the language, to ensure that DataBASIC functioned to his satisfaction.[4]
Three of the implementations of MultiValue - PICK version R77, Microdata Reality[5] 3.x, and Prime Information 1.0 - were very similar. In spite of attempts to standardize, particularly by International Spectrum and the Spectrum Manufacturers Association, who designed a logo for all to use,[6] there are no standards across MultiValue implementations. Subsequently, these flavors diverged, although with some cross-over. These streams of MultiValue database development could be classified as one stemming from PICK R83, one from Microdata Reality, and one from Prime Information.[7] Because of the differences, some implementations have provisions for supporting several flavors of the languages. An attempt to document the similarities and differences can be found at the Post-Relational Database Reference (PRDB).[8]
One reasonable hypothesis for why this data model has lasted 50 years,[9] with new database implementations of the model appearing even in the 21st century, is that it provides inexpensive database solutions.
Data model example
In a MultiValue database system:
- a database or schema is called an "account"
- a table or collection is called a "file"
- a column or field is called a field or an "attribute", which is composed of "multi-value attributes" and "sub-value attributes" to store multiple values in the same attribute.
- a row or document is called a "record" or "item"
Data is stored using two separate files: a "file" to store raw data and a "dictionary" to store the format for displaying the raw data.
For example, assume there's a file (table) called "PERSON". In this file, there is an attribute called "eMailAddress". The eMailAddress field can store a variable number of email address values in a single record. The list [joe@example.com, jdb@example.net, joe_bacde@example.org] can be stored and accessed via a single query when accessing the associated record.
Achieving the same (one-to-many) relationship within a traditional relational database system would include creating an additional table to store the variable number of email addresses associated with a single "PERSON" record. However, modern relational database systems support this multi-value data model too. For example, in PostgreSQL, a column can be an array of any base type.
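As a rough illustration of the contrast described above, the following Python sketch models the PERSON record twice: once with a multi-valued eMailAddress attribute, and once in a normalized two-table form. The record ID "1001", the "name" field, and both table layouts are hypothetical names chosen for illustration; the sketch says nothing about how any particular MultiValue product stores data internally.

```python
# Hypothetical PERSON item: a single record whose eMailAddress
# attribute holds a variable-length list of values.
person_mv = {
    "id": "1001",
    "name": "Joe",
    "eMailAddress": [
        "joe@example.com",
        "jdb@example.net",
        "joe_bacde@example.org",
    ],
}

# Normalized relational equivalent: the addresses live in a second
# table and are tied back to the person by a foreign key.
person_table = [{"id": "1001", "name": "Joe"}]
email_table = [
    {"person_id": "1001", "email": "joe@example.com"},
    {"person_id": "1001", "email": "jdb@example.net"},
    {"person_id": "1001", "email": "joe_bacde@example.org"},
]


def emails_multivalue(record):
    """One record read returns every address."""
    return list(record["eMailAddress"])


def emails_relational(person_id, emails):
    """The relational form needs a lookup (a join) on the second table."""
    return [row["email"] for row in emails if row["person_id"] == person_id]
```

Both functions return the same three addresses; the difference is that the multi-valued form needs only the one record, while the normalized form consults a second table.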
MultiValue Basic Language
Multivalue Basic (now commonly styled as mvBasic) is a family of programming languages more or less common (and portable) to all the multivalue databases derived from the original Pick Operating System. The variations between implementations are known as flavours.
The language originates from Dartmouth Basic and the earliest implementation of PickBASIC (now D3 FlashBasic). Over time various customisations and extensions have been added to take advantage of capabilities added to the different flavours while staying mainly in sync.
mvBasic statements and functions are designed to access and take advantage of the multivalue database model while providing the usual capabilities of most modern languages, such as cryptography and communications. mvBasic is typeless and lends itself to structured programming techniques.
Example code is available but limited. Whilst there are commercial applications and tools available, the multivalue database community has not embraced the open source library/package model to the degree seen with other languages.
The typical mvBasic compiler compiles program source to a P-code executable object and runs in an interpreter, with D3 FlashBasic[10] and jBASE[11] being notable exceptions.
MultiValue Query Language
Known as ENGLISH, ACCESS, AQL, UniQuery, Retrieve, CMQL, and by many other names over the years, corresponding to the different MultiValue implementations, the MultiValue query language differs from SQL in several respects. Each query is issued against a single dictionary within the schema, which could be understood as a virtual file or a portal to the database through which to view the data.
- LIST PEOPLE LAST_NAME FIRST_NAME EMAIL_ADDRESSES WITH LAST_NAME LIKE "Van..."
The above statement would list all e-mail addresses for each person whose last name starts with "Van". A single entry would be output for each person, with multiple lines showing the multiple e-mail addresses (without repeating other data about the person).
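The output shape described above can be imitated in a few lines of Python. This is only a sketch of the report layout, not of any real query processor: the sample people, the fixed column width, and the function name list_report are all assumptions made for illustration, and the prefix match stands in for the LIKE "Van..." clause.

```python
def list_report(records, columns, multi_column, width=16):
    """Build report lines; extra values of multi_column go on extra lines."""
    lines = []
    for rec in records:
        values = rec[multi_column] or [""]
        for i, value in enumerate(values):
            cells = []
            for col in columns:
                if col == multi_column:
                    cells.append(value)
                else:
                    # Single-valued data is printed once, on the first line only.
                    cells.append(rec[col] if i == 0 else "")
            lines.append("".join(c.ljust(width) for c in cells).rstrip())
    return lines


# Hypothetical sample data.
people = [
    {"LAST_NAME": "Van Buren", "FIRST_NAME": "Joe",
     "EMAIL_ADDRESSES": ["joe@example.com", "jdb@example.net"]},
    {"LAST_NAME": "Smith", "FIRST_NAME": "Ann",
     "EMAIL_ADDRESSES": ["ann@example.org"]},
]

# Stand-in for: WITH LAST_NAME LIKE "Van..."
selected = [p for p in people if p["LAST_NAME"].startswith("Van")]
report = list_report(
    selected, ["LAST_NAME", "FIRST_NAME", "EMAIL_ADDRESSES"], "EMAIL_ADDRESSES"
)
```

Printing the lines of report shows one entry for the matching person, with the second e-mail address on its own line and the name columns left blank there.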
References
- [1] "ONgroup". www.ongroup.com.
- [2] Nelson, Don (1965). "General Information Retrieval Language and System (GIRLS)" (PDF).
- [3] "Microdata Alumni". www.microdata-alumni.org.
- [4] Sisk, Jonathan (1987). PICK BASIC: A Programmer's Guide. Tab Books.
- [5] "Home". www.northgate-is.com.
- [6] "MultiValue Symbol".
- [7] Wolthuis, Dawn (2002). "MultiValue Family Tree" (PDF).
- [8] "Post-Relational Database Reference".
- [9] Nelson, Don (1964). "Generalized Information Retrieval Language and System (GIRLS)" (PDF).
- [10] "Introduction to D3".
- [11] "jBase Compilation".
External links
- DB-Engines Ranking of Multivalue DBMS by popularity, updated monthly
MultiValue database
Overview
Definition and Core Principles
A MultiValue database is a type of NoSQL database that supports multidimensional data storage, enabling the handling of multi-valued attributes within records.[4] It extends the relational model into a post-relational or non-first normal form (NF2) structure, where fields can contain multiple values and nested elements without adhering to strict atomicity requirements.[6] Originally synonymous with the PICK system, this approach treats data as inherently multidimensional, accommodating complex relationships in a single record rather than across separate tables.[1]
Core principles of MultiValue databases emphasize direct data access without the need for joins, leveraging hashed file organization to store and retrieve records efficiently. Records are identified by unique primary keys and organized in hash files, allowing rapid, key-based access to entire datasets without relational linking operations.[1] This hashed structure uses variable-length character strings delimited by special markers (field marks for attributes, value marks for multiple entries in a field, and subvalue marks for further nesting), facilitating the representation of hierarchical data within individual records.[6] Support for nested data structures is integral, permitting implicit relationships that mirror natural data hierarchies, such as orders containing multiple line items with sub-details, all stored contiguously.[1]
MultiValue databases prioritize data flexibility by allowing attributes to hold multiple values natively, bypassing the normalization processes typical in relational systems. This design reduces the proliferation of tables and eliminates redundancy from splitting multi-valued data into separate entities, resulting in simpler schemas with fewer components to manage.[6] Consequently, queries operate on denormalized structures, enhancing performance for applications involving complex, repeating data patterns while maintaining compatibility with normalized views when needed.[1]
Key Features and Terminology
MultiValue databases are distinguished by several core features that enable flexible data handling and efficient operations. One prominent feature is the use of typeless variables in their associated programming environments, where variables do not require explicit type declarations and can dynamically interpret data as strings, numbers, or other forms based on context, simplifying development for complex data manipulations.[7] Another key aspect is dictionary-driven data views, where a separate dictionary file accompanies each data file to define metadata such as field formats, conversions, and display rules, allowing dynamic presentation of raw data without altering the underlying storage.[8] For querying, MultiValue systems employ select lists (temporary or persistent lists of record identifiers) that function similarly to inverted indexes, facilitating rapid retrieval and processing of subsets of data across files without scanning entire datasets.[9]
The hashed file structure is fundamental to MultiValue databases, organizing data into modular groups of blocks where records are placed using a hashing algorithm on the primary key, enabling direct access to specific records without relying on traditional secondary indexes for basic retrievals.[10] This approach supports scalability by allowing files to grow dynamically through overflow mechanisms and modulo arithmetic for key-to-group mapping, optimizing storage and access in environments with variable data volumes.[10]
Essential terminology in MultiValue databases includes the following concepts:
- An account refers to the overall database instance or schema, encompassing all files and related resources accessible to a user or application.[1]
- A file is analogous to a table, serving as a container for related records with a defined structure via its dictionary.[11]
- An item (also called a record) is a single record within a file, identified by a unique key and containing one or more attributes. Items are hashed into groups, the storage buckets of the hashed file structure.[1]
- An attribute is a field within an item that can hold multiple values, representing a multi-valued dimension of the data.[1]
- Value marks are delimiter characters (typically ASCII 253, conventionally displayed as ]) used to separate multiple values within an attribute.[1]
- Sub-value marks (typically ASCII 252, conventionally displayed as \) further delimit sub-elements within individual values, enabling nested multi-valued structures.[1]
Historical Development
Origins and Early Implementations
The MultiValue database concept originated in the early to mid-1960s when Don Nelson, a systems engineer at TRW Inc., designed a multidimensional data model to address complex inventory tracking needs for the U.S. Army.[12] Nelson's approach emphasized flexible, non-relational storage that could handle variable-length fields and associative relationships, initially conceptualized as a generalized information retrieval system to manage parts for military hardware like the Cheyenne helicopter during the Vietnam War era.[13] This design was driven by the Army's requirement for an English-like query language capable of processing large-scale, multi-valued data without rigid schemas, marking a departure from traditional hierarchical or flat-file systems of the time.[14]
The first practical implementation came in 1965, led by Richard "Dick" Pick, a physicist and developer at TRW, who built upon Nelson's model to create the PICK operating system, originally dubbed the Generalized Information Retrieval Language System (GIRLS).[12] Running on an IBM System/360 mainframe, this system was deployed for the U.S. Army's inventory management of Cheyenne helicopter components, enabling efficient storage and retrieval of parts data across multiple users in a time-sharing environment.[14] Pick's innovation integrated the database directly into the operating system kernel, using hash-based file structures to support rapid access to multi-valued attributes, which proved effective for the Army's demanding logistics. The system was later renamed General Information Management (GIM) at the military's request due to the original acronym's informality.[13] Although the Cheyenne program was canceled in the late 1960s following a prototype crash, the core technology entered the public domain, paving the way for broader adoption.[15]
In the 1970s, early enhancements focused on programming interfaces, with Ken Simms developing DataBASIC (also known as S-BASIC) as a key addition to the ecosystem.[16] Working at the University of California, Irvine, on a Xerox Sigma-7 implementation of the PICK system, Simms created this language in the mid-1970s, deriving it from Dartmouth BASIC while extending it with built-in commands for direct database manipulation, such as record locking and multi-value array handling.[15] DataBASIC compiled to p-code for portability across hardware, significantly improving developer productivity for inventory and data processing applications within the PICK environment.[16]
Initial commercial releases emerged toward the late 1970s, with Pick R77 representing a stabilized version of the system licensed through partners like Microdata Corporation.[17] Released around 1977, R77 refined the core PICK OS for business use, incorporating Simms' DataBASIC and supporting multi-user operations on minicomputers, which facilitated its first widespread deployments beyond military contexts.[15] This version emphasized reliability for inventory management in sectors like manufacturing, setting the foundation for subsequent iterations without altering the underlying multidimensional principles.[17]
Evolution and Standardization Efforts
The evolution of MultiValue databases from the 1980s onward diverged into several key streams, primarily stemming from the original Pick system. The traditional Pick R83 stream, developed by Pick Systems, built upon earlier versions like R77 and R80 to create a standardized reference implementation that emphasized portability and multi-user capabilities, serving as a benchmark for subsequent implementations.[18] Paralleling this, the Microdata Reality branch originated in 1973 as the first commercial MultiValue system, initially deployed on Microdata hardware for time-sharing applications supporting dozens to hundreds of users, and later enhanced through acquisitions and independent development.[19] The Prime Information stream emerged in the early 1980s on Prime Computer's Primos operating system, focusing on emulation and integration with hardware-specific environments, which later influenced products like UniVerse and UniData.[20] These streams reflected adaptations to diverse hardware and vendor needs, with Pick technology licensed to companies such as Prime, Ultimate, and NCR, fostering widespread but fragmented adoption.[19]
Major developments in the 1980s included the porting of MultiValue systems to Unix platforms, driven by the industry's shift toward open operating systems. As Unix gained prominence, vendors like Microdata (later MDIS under McDonnell Douglas) and others ported Reality and Pick variants from proprietary OSes like Primos to Unix, enabling broader interoperability and deployment on minicomputers and workstations.[1] This era also saw the emergence of Pick Systems Inc. (formerly Pick Computer Company) as a central vendor, which commercialized the Pick OS and database, positioning it as a competitor to Unix for business data processing.[15] These ports addressed scalability issues in legacy environments, allowing MultiValue databases to support virtual memory and time-sharing for enterprise applications without full reliance on custom hardware.[20]
Standardization efforts in the 1980s and 1990s aimed to unify these divergent implementations but ultimately resulted in persistent "flavors" due to vendor-specific extensions. The Spectrum Manufacturers Association, formed in 1985, sought to align systems like Prime Information with the R83 reference by referencing the Pick Pocket Guide, promoting consistency in core features such as the BASIC programming language and ENQUIRY query tools, which evolved from earlier systems like GIRLS for user-friendly data access predating SQL.[15] Further attempts, including explorations of ISO standards for database languages, faltered amid competing priorities, as vendors prioritized proprietary enhancements over universal compliance, leading to interoperability challenges across R83, Reality, and Prime derivatives.[20] Despite these initiatives, no comprehensive standard emerged, preserving a landscape of specialized implementations.
By the 1990s, MultiValue databases transitioned to PC platforms, expanding accessibility beyond mainframes and minicomputers. Pick R83 became the first full-featured MultiValue system to run natively on PCs, delivering mainframe-level DBMS capabilities for smaller-scale deployments and supporting the rise of client-server architectures.[18] This shift facilitated GUI integrations, web connectivity, and SQL bridges, aligning MultiValue with emerging open systems. In the early 2000s, initial explorations into cloud-like environments involved virtualization techniques, allowing MultiValue engines to operate as applications on host OSes like Windows and Linux, paving the way for hosted and distributed deployments without native hardware dependencies.[15]
Data Model
Structure and Components
MultiValue databases organize data within a hierarchical architecture centered on accounts, which serve as logical containers for user-specific environments and data sets. Each account encompasses multiple files, where a file represents a hashed collection of records, also known as items, each identified by a unique record ID such as a key or identifier. This structure allows for efficient isolation of data per user or application context, with the master dictionary (often abbreviated as MD or VOC) residing at the account level to provide pointers to all files and commands accessible within that account.[11][21]
At the file level, two primary components form the core: the data file, which stores the physical records containing the actual application data, and the associated dictionary file, which holds metadata defining the structure, views, and calculations for interpreting the data. The dictionary file, itself a hashed file, includes entries that describe attribute positions, conversion codes, and display formats, enabling flexible data presentation without altering the underlying records. For instance, a file dictionary might specify how attributes are numbered and formatted for queries or reports. The master dictionary complements this by maintaining an index of file locations across the account, facilitating navigation and access control.[11][8][22]
Records within the data file are stored as dynamic arrays, represented as variable-length strings delimited by special characters to separate components. Attribute marks, typically ASCII character 254 (denoted as ^ or \xFE), function as field separators to delineate individual attributes (such as attribute 1 for the primary key, attribute 2 for a name field, and so on), allowing records to expand horizontally without fixed schemas. This delimited format supports rapid parsing and manipulation in memory during processing. Value marks (ASCII 253, denoted as ]) and subvalue marks (ASCII 252, denoted as \) further enable nested structures within attributes, though their primary role here is to maintain the array-like organization of the record.[1][23]
The underlying file system in MultiValue databases relies on dynamic hashing for storage and retrieval, where record IDs are hashed to compute direct block addresses on disk, minimizing seek times and enabling near-constant access performance even for large datasets. This hashed organization, often implemented as a modular file structure with overflow handling, ensures scalability and supports the variable-length nature of records without fragmentation issues common in fixed-schema systems. Multiple data files can share a single dictionary via multifile configurations, promoting reuse and consistency across related datasets.[24][10][25]
Handling Multi-Valued Data
MultiValue databases manage multi-valued data by permitting individual attributes within a record to store multiple discrete values, separated by a dedicated delimiter called the value mark, which corresponds to ASCII character 253 (CHAR(253)). This design allows fields to hold arrays or lists of related information natively, such as several phone numbers or addresses associated with a single entity, without the constraints of strict normalization found in relational models.[1][7] To accommodate even greater complexity, these systems support sub-valued data through a nested delimiter known as the subvalue mark, ASCII character 252 (CHAR(252)), which divides components within each multi-value entry. This enables hierarchical organization, where each value can itself contain multiple sub-elements, such as pairing a contact type with its corresponding detail. For example, in a PERSON file record, the fifth attribute might contain email data structured as personal\[email protected]]work\[email protected], with the value mark (]) separating distinct email entries and the subvalue mark (\) distinguishing the type from the address in each.[1][26]
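The three-level delimiter scheme can be sketched in Python as follows. This is a simplified model under stated assumptions: the sample record, its e-mail addresses, and the helper names parse_item and extract are hypothetical, and the extract helper returns a single leaf string, whereas real mvBasic extraction can also return a whole attribute or value with its marks intact.

```python
AM = chr(254)   # attribute mark
VM = chr(253)   # value mark
SVM = chr(252)  # subvalue mark


def parse_item(raw):
    """Split a delimited record string into attributes -> values -> subvalues."""
    return [
        [value.split(SVM) for value in attribute.split(VM)]
        for attribute in raw.split(AM)
    ]


def extract(item, attr, value=1, subvalue=1):
    """1-based lookup of one leaf; a missing position yields an empty string."""
    if min(attr, value, subvalue) < 1:
        raise ValueError("positions are 1-based")
    try:
        return item[attr - 1][value - 1][subvalue - 1]
    except IndexError:
        return ""


# Hypothetical PERSON record: attribute 5 holds type/address pairs.
raw = AM.join([
    "Joe",
    "Bacde",
    "",
    "",
    f"personal{SVM}joe@example.com{VM}work{SVM}jdb@example.net",
])
item = parse_item(raw)
work_address = extract(item, 5, 2, 2)
```

Here work_address is the second subvalue of the second value of attribute 5. Returning an empty string for a missing position is a deliberate choice in this sketch, made so that lookups past the end of a record do not raise.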
Such handling promotes denormalized storage of interconnected data, minimizing the reliance on cross-referencing multiple records or files for common scenarios like an order containing several items. In this case, product codes, quantities, and associated details (e.g., serial numbers as subvalues) can all reside in aligned multi-valued attributes within one record, streamlining access to the full dataset.[1][27]
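The order example can be pictured as parallel lists whose positions line up, with serial numbers nested one level deeper. The Python sketch below assumes hypothetical attribute names (PRODUCT, QTY, SERIALS) and sample values; it shows only how aligned multi-values read back as line items and is not a vendor API.

```python
# Hypothetical order record: position n in each list describes line item n.
order = {
    "id": "ORD-1",
    "PRODUCT": ["A100", "B200"],
    "QTY": ["2", "1"],
    "SERIALS": [["SN1", "SN2"], ["SN9"]],  # subvalues per line item
}


def line_items(record):
    """Zip the aligned multi-valued attributes into one dict per line item."""
    items = []
    for product, qty, serials in zip(
        record["PRODUCT"], record["QTY"], record["SERIALS"]
    ):
        items.append(
            {"product": product, "qty": int(qty), "serials": list(serials)}
        )
    return items


items = line_items(order)
total_units = sum(entry["qty"] for entry in items)
```

Everything needed to rebuild the line items comes from the one record, which is the point the paragraph above makes about avoiding cross-references to other files.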
Development and Query Tools
MultiValue Basic Language
The MultiValue Basic Language, originally developed as PickBASIC in the mid-1970s, serves as the primary procedural programming language for building applications in MultiValue database systems. It is a typeless dialect derived from standard BASIC syntax, adapted to handle the unique multi-valued data structures of the Pick system while supporting business-oriented data processing tasks. The language grew out of the Pick operating system, created by Dick Pick to manage complex inventory and record-keeping needs, which evolved from early implementations like the Generalized Information Retrieval Language System (GIRLS) in 1965 into a full-fledged programming environment by the 1970s.[12][28]
Key features of MultiValue Basic emphasize structured programming capabilities, including subroutines for modular code organization, loops (such as FOR...NEXT and LOOP...REPEAT) for iteration, and conditional statements (IF...THEN...ELSE) for decision-making, which build on BASIC's simplicity while adding robustness for enterprise applications. The language excels in dynamic array manipulation, where variables can function as delimited strings treated as three-dimensional structures, using attribute marks (CHAR(254)), value marks (CHAR(253)), and subvalue marks (CHAR(252)) to natively process multi-valued fields without explicit schema definitions.[29] For portability across hardware and implementations, source code compiles to P-code, an intermediate bytecode interpreted at runtime, ensuring consistent execution in diverse MultiValue environments.[28][30]
Core syntax elements focus on seamless database integration and data operations. The READ statement retrieves individual records from hashed files, assigning them to variables or arrays, as in READ RECORD FROM FILE.VAR, ID ELSE status = 0. Array operations leverage MAT commands for efficient bulk handling, such as MATREAD ARRAY FROM FILE.VAR, ID ELSE PRINT "Record not found", which populates a dimensioned array with the attributes of an entire record, or MATWRITE ARRAY TO FILE.VAR, ID for storage. Output is managed via the PRINT statement (often abbreviated as PRNT in documentation), supporting formatted display with conversion codes, for example: PRINT "Customer ID: ": ID: ", Name: " : CUST.NAME<1>. These constructs enable direct file access, parsing of delimited data, and report generation without intermediate layers.[7]
Implementations of MultiValue Basic vary to accommodate different vendors and performance needs. UniVerse BASIC, developed by Rocket Software, enhances the original PickBASIC with extended intrinsic functions for string processing and error handling, while maintaining backward compatibility for legacy code. In contrast, D3 BASIC from the same vendor introduces FlashBASIC, a variant that compiles to native machine code for improved speed over traditional P-code interpretation. Both dialects include modern extensions, such as support for external procedure calls via the CALL statement or OS.SERVICE, allowing integration with ODBC drivers to query relational databases from within MultiValue applications. These variations ensure adaptability while preserving the language's core focus on MultiValue data manipulation.[31][32][33]
MultiValue Query Language
The MultiValue query language, known variously as ENGLISH, ACCESS, or Select/Basic across implementations, is a declarative tool designed for ad-hoc querying and reporting in a natural-language-like syntax that leverages dictionary files to simplify data retrieval from MultiValue databases.[34] It enables users to specify desired output attributes, filters, and formats without procedural programming, relying on dictionary definitions to interpret and compute field representations. This approach supports intuitive queries by treating dictionary items as blueprints for virtual attributes, allowing non-technical users to generate reports as if conversing with the database.[34]
At its core, the language uses dictionary files (special records that define how data attributes are accessed, converted, and presented) to create virtual attributes and handle complex data manipulations. Each dictionary item, often an "A-item," specifies an attribute's location within data records (via attribute mark counts), along with tags for labeling, conversions for formatting (such as date transformations or decimal masking), and correlatives for computed fields like sums or multiplications. For selections, queries employ clauses that reference these dictionary elements to filter records; for instance, a query like LIST PERSON WITH SURNAME "Van*" would scan the dictionary's SURNAME definition to match records starting with "Van," displaying only relevant items.[34] Multi-valued data is managed through value marks (separating multiple instances of an attribute) and sub-value marks (for nested values), with dictionary correlatives enabling aggregations or extractions from these structures.[34]
Key commands include LIST for displaying filtered records, SORT for ordering output, and SELECT for generating lists of matching item-ids that can be reused in subsequent operations. The WITH clause applies simple equality or range filters to single-valued or every instance of multi-valued fields (e.g., LIST ACCOUNT WITH BALANCE > 1000), while correlative expressions in WHEN clauses handle conditional logic on multi-values (e.g., SELECT ORDERS WITH QTY > 10 WHEN STATUS = "ACTIVE" to filter orders where quantity exceeds 10 only if the associated status is active).[34] Formatting is dictionary-driven, automatically applying justifications, widths, and conversions to produce readable columnar reports. For example, executing LIST CUSTOMERS NAME BALANCE WITH STATUS = "ACTIVE" might yield:
NAME BALANCE
JOHN 1500.00
JANE 2500.00
As a further example, the query LIST ACCOUNT NAME CURR-BALNC WITH EVERY TRNS-DATE BEFORE "3/18/70" would list the accounts whose transaction dates all fall before the specified cutoff, extracting and correlating relevant multi-valued entries into a cohesive report.[34] These mechanisms emphasize efficiency in handling the non-relational, multi-dimensional nature of MultiValue data without requiring joins or explicit schema navigation.[34]
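The dictionary-driven behaviour described in this section can be approximated with a small Python sketch. It rests on several assumptions that are not taken from the text above: the dictionary layout, the sample ACCOUNT items, and the helper names are hypothetical; balances are assumed to be stored as integer hundredths; dates are assumed to use the classic Pick internal format (a day count with 31 December 1967 as day 0); and two-digit years are read as 19xx.

```python
from datetime import date, timedelta

PICK_DAY_ZERO = date(1967, 12, 31)  # assumed internal-date origin


def to_external_date(internal):
    """Convert a stored day count to MM/DD/YY for display."""
    return (PICK_DAY_ZERO + timedelta(days=int(internal))).strftime("%m/%d/%y")


def to_internal_date(external):
    """Convert M/D/YY (assumed 19YY) to a day count for comparison."""
    month, day, year = (int(part) for part in external.split("/"))
    return (date(1900 + year, month, day) - PICK_DAY_ZERO).days


# Hypothetical dictionary: attribute position plus an optional conversion.
DICTIONARY = {
    "NAME": {"attr": 1, "conv": None},
    "CURR-BALNC": {"attr": 2, "conv": lambda raw: f"{int(raw) / 100:.2f}"},
    "TRNS-DATE": {"attr": 3, "conv": to_external_date},
}

# Hypothetical data file: item-id -> list of attributes, each a list of values.
ACCOUNT = {
    "A1": [["JOHN"], ["150000"], ["700", "800"]],
    "A2": [["JANE"], ["250000"], ["800", "900"]],
}


def values(item, name):
    return item[DICTIONARY[name]["attr"] - 1]


def with_every_before(item, name, cutoff):
    """True only if the attribute has values and all precede the cutoff."""
    limit = to_internal_date(cutoff)
    stored = values(item, name)
    return bool(stored) and all(int(v) < limit for v in stored)


def list_rows(data_file, names, predicate):
    """Select items, then apply dictionary conversions to the output columns."""
    rows = []
    for item_id in sorted(data_file):
        item = data_file[item_id]
        if predicate(item):
            row = []
            for name in names:
                conv = DICTIONARY[name]["conv"]
                row.append([conv(v) if conv else v for v in values(item, name)])
            rows.append(row)
    return rows


rows = list_rows(
    ACCOUNT,
    ["NAME", "CURR-BALNC"],
    lambda item: with_every_before(item, "TRNS-DATE", "3/18/70"),
)
```

In this sketch only the first account is selected, because one of the second account's transaction dates falls after the cutoff. The stored values are never modified; the conversions are applied only when the output row is built, which mirrors the separation between data file and dictionary.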
