OrientDB
| Developer | OrientDB Ltd |
|---|---|
| Initial release | 2010 |
| Stable release | 3.2.44 / September 4, 2025[1] |
| Repository | |
| Written in | Java |
| Platform | Java SE |
| Type | Document-oriented database, Graph database, Multi-model database |
| License | Apache 2 License |
| Website | orientdb |
OrientDB is an open-source NoSQL database management system written in Java. It is a multi-model database that supports graph, document and object models;[2] relationships are managed as in graph databases, with direct connections between records. It supports schema-less, schema-full and schema-mixed modes. It has a strong security profiling system based on users and roles, and it supports querying with Gremlin as well as SQL extended for graph traversal. OrientDB uses several indexing mechanisms based on B-trees and extendible hashing; the latter is known as a "hash index". Each record has a surrogate key that indicates the position of the record on disk. Links between records (edges) are stored either as the target record's position held directly inside the referrer, or as a B-tree of record positions (so-called record IDs, or RIDs) that serves as a container of RIDs. This allows fast traversal (with O(1) complexity) of one-to-many relationships and fast addition and removal of links. As of January 2024, OrientDB was the 6th most popular graph database according to the DB-Engines graph database ranking.[3]
The development of OrientDB relies on an open-source community. The project uses GitHub[4] to manage the sources, contributors and versioning.
Engine
OrientDB is built with a multi-model graph/document engine. While OrientDB includes a SQL layer, its support for edges means that relationships can be traversed directly rather than through a JOIN statement.[5] OrientDB handles every record or document as an object, and links between objects are implemented as direct pointers to the linked records' positions on disk. This leads to quicker retrieval of related data than joins in an RDBMS.[6]
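The difference between following a stored link and performing a join can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class `LinkTraversalSketch`, its `STORE` map and the `Rec` type are hypothetical names invented for illustration, and a hash map stands in for on-disk record positions. It is not OrientDB's storage code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model: each record stores the IDs of the records it links to. */
final class LinkTraversalSketch {
    // Hypothetical stand-in for on-disk storage: record ID -> record.
    static final Map<String, Rec> STORE = new HashMap<>();

    static final class Rec {
        final String rid;
        final String name;
        final List<String> out = new ArrayList<>(); // direct links to other records

        Rec(String rid, String name) {
            this.rid = rid;
            this.name = name;
        }
    }

    static Rec put(String rid, String name) {
        Rec r = new Rec(rid, name);
        STORE.put(rid, r);
        return r;
    }

    /** Follows each stored link with one lookup per hop; the target set is never scanned. */
    static List<String> neighbourNames(String rid) {
        List<String> names = new ArrayList<>();
        for (String target : STORE.get(rid).out) {
            names.add(STORE.get(target).name);
        }
        return names;
    }

    public static void main(String[] args) {
        Rec alice = put("#10:0", "Alice");
        put("#10:1", "Bob");
        put("#10:2", "Carol");
        alice.out.add("#10:1");
        alice.out.add("#10:2");
        System.out.println(neighbourNames("#10:0"));
    }
}
```

A join, by contrast, would have to search the target collection (or an index over it) for rows whose key matches, so its cost depends on the size of that collection rather than only on the number of links followed.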
Editions & licenses
OrientDB Community Edition is free for any use under the Apache 2 license.[citation needed] There is no commercial version of OrientDB.
Applications
- Banking
- Big Data
- Fraud prevention[7]
- Loan management software (Floify)[8][self-published source]
- Master data management[9][10]
- Non-coding RNA human interaction database[11]
- Recommendation engines[12][self-published source]
- Social networking
- Traffic management systems[13]
History
OrientDB was originally authored by Luca Garulli in 2010. He wrote it as a Java rewrite of the fast persistence layer of the Orient ODBMS database, which he had originally developed in C++ in 1999. Between 2012 and 2014 the storage engine was redeveloped by Andrii Lomakin and renamed "plocal", which stands for "paginated local". The name reflects that the new storage engine splits data files into pages and treats each page as a single atomic unit of change. Since 2012, the project has been sponsored by OrientDB LTD (formerly Orient Technologies LTD), a for-profit company with Garulli as its CEO and founder. In 2013 Lomakin joined the company as R&D lead engineer and co-owner.[14][15]
The word "multi-model" was first associated with databases on May 30, 2012, in Cologne, Germany, during Luca Garulli's keynote "NoSQL Adoption – What’s the Next Step?".[16][17] Garulli envisioned the evolution of first-generation NoSQL products into new products with more features, able to serve multiple use cases. OrientDB was the first product to embrace documents, graphs, key-value, geospatial and reactive models in the same product, at the core level. This means that the multiple models were integrated into the core without using layers. For this reason, OrientDB is described as a "native" multi-model database.
OrientDB has been covered by media outlets and is the winner of the 2015 InfoWorld Bossie award.[18]
On September 15, 2017, OrientDB LTD was acquired by CallidusCloud.[19]
On January 30, 2018, it was announced that SAP had acquired CallidusCloud for $2.4 billion,[20] and OrientDB consequently came under SAP's support.
On September 1, 2021,[21][self-published source] the original founder, Luca Garulli, left SAP and created a new database project, ArcadeDB, with a similar data presentation model, after SAP decided to stop providing commercial support for OrientDB.
On December 30, 2024,[22][self-published source] Andrii Lomakin, the author of the OrientDB storage engine, created YouTrackDB, a fork of OrientDB oriented toward graph data manipulation and support for object-oriented concepts for enterprise application developers.
References
- "OrientDB 3.2 Release Notes". GitHub. Retrieved 29 October 2024.
- "Multi-Model Database - OrientDB Manual". Archived from the original on 2015-05-03. Retrieved 2015-05-31.
- "DB-Engines Ranking - popularity ranking of graph DBMS".
- "orientechnologies/orientdb". GitHub. 30 May 2020.
- Ltd., Bloor Research International (6 August 2014). "Diaku: more than governance - Bloor".
- "Hidden Gems of Web / Mobile Development from Open-Source". Archived from the original on 2016-10-13.
- "Harness graphs & documents for Real-time Fraud Prevention". Archived from the original on 2016-08-19. Retrieved 2016-07-15.
- Sims, Dave (2015-03-05). "Why I Use OrientDB on Production Application". DZone Database.
- Nuix. "Nuix 7 Conquers Customer Challenges for Today and Builds Hyper-Scale Capacity for the Future" (Press release).
- "Diaku Axon - Data Governance powered by OrientDB". Archived from the original on 2017-07-28. Retrieved 2016-07-15.
- Bonnici, V; Russo, F; Bombieri, N; Pulvirenti, A; Giugno, R (2014). "Comprehensive reconstruction and visualization of non-coding regulatory networks in human". Front Bioeng Biotechnol. 2: 69. doi:10.3389/fbioe.2014.00069. PMC 4261811. PMID 25540777.
- "MovieLens recommendation engine with OrientDB - Pizza Connections". Archived from the original on 2017-07-26. Retrieved 2016-07-15.
- "Traffic Management Systems with OrientDB". Archived from the original on 2017-07-15. Retrieved 2016-07-15.
- "Expert Interview with Luca Garulli Of OrientDB On Multi-Model Database Management For Big Data". 18 May 2015. Archived from the original on 22 May 2015. Retrieved 15 July 2016.
- admin. "Intervista a Luca Garulli – JavaStaff.com".
- "Multi-Model storage 1/2 one product". Slideshare. 2012-06-01.
- "Nosql Matters Conference 2012 | NoSQL Matters CGN 2012" (PDF). 2012.nosql-matters.org. Archived from the original (PDF) on 2018-04-13. Retrieved 2017-01-12.
- staff, InfoWorld (16 September 2015). "Bossie Awards 2015: The best open source application development tools". InfoWorld.
- "CallidusCloud Acquires Leading Multi-Model Database Technology" (Press release). 2017-09-19. Retrieved 2017-10-11.
- "SAP snags CallidusCloud for $2.4 billion". TechCrunch. January 30, 2018. Retrieved January 30, 2018.
- "Welcome to ArcadeDB". September 1, 2021. Retrieved September 1, 2021.
- "Long road ahead". December 30, 2024. Retrieved December 30, 2024.
Overview
Definition and Core Capabilities
OrientDB is an open-source NoSQL database management system (DBMS) written in Java, designed to support multiple data models within a single, unified engine.[2][1] This multi-model architecture allows it to function as a versatile operational database, combining the strengths of various NoSQL paradigms without requiring separate systems for different data types.[10]
At its core, OrientDB excels in handling graph traversals through physical links between records, which eliminate the performance overhead of traditional SQL joins by enabling direct, constant-time (O(1)) relationships.[2] It also supports document storage for flexible, schema-less data management, key-value operations via efficient indexing for rapid lookups, and object-oriented persistence that maps database records directly to programming language objects.[10][11] These capabilities allow for millisecond-scale traversals across complex trees and graphs, optimizing resource use regardless of dataset size.[2]
The system's design emphasizes versatility in managing diverse data structures, from interconnected networks to hierarchical documents, while providing scalability to handle large-scale datasets through distributed configurations.[2] This makes OrientDB particularly suitable for modern applications demanding high-performance data processing, such as real-time analytics and dynamic content management.[12]
Key Features and Advantages
OrientDB provides ACID-compliant transactions across all supported data models, ensuring atomicity, consistency, isolation, and durability for operations in graph, document, key-value, and object-oriented contexts.[13] This compliance is maintained through an internal transaction tracking mechanism that supports both optimistic and pessimistic locking strategies, allowing reliable handling of complex, multi-record updates without data corruption risks.[13]
A standout capability is its high-performance graph traversal, enabled by physical navigation via direct record identifiers (RIDs) that link related records on disk, avoiding the overhead of logical joins or index lookups in traditional databases.[14] A 2020 benchmark study on a 22.3 GB Twitter followers dataset using a three-node cluster showed that OrientDB completed depth-5 graph traversals in 1,721 seconds, compared to 15,079 seconds for Neo4j, demonstrating superior performance in extended traversals due to its native pointer-based approach.[15]
OrientDB also offers schema flexibility through schema-full, schema-free, or hybrid modes, permitting strict field enforcement, completely dynamic structures, or mixed constraints within the same database to accommodate evolving data requirements.[16]
Built-in support enhances usability with geospatial indexing powered by Lucene for efficient spatial queries following Open Geospatial Consortium standards, full-text search via Lucene for advanced text indexing and retrieval, and reactive queries through live query mechanisms that push real-time updates to applications without polling.[17][18][19]
These features deliver key advantages, including reduced development time from multi-model support that eliminates the need for separate databases for different data types, cost-efficiency as an open-source solution under the Apache 2.0 license, and straightforward horizontal scaling via a zero-configuration multi-master architecture that distributes load across servers without manual sharding.[12][20]
Architecture
Core Engine
OrientDB's core engine is implemented in Java, offering a flexible and extensible foundation for handling diverse data models within a single system.[4] This engine employs a pluggable storage architecture that accommodates various operational modes, including the PLocal paginated local storage for durable disk-based persistence, in-memory storage for low-latency access, and remote storage to facilitate distributed operations.[21][22] The PLocal engine, in particular, uses a page-based model with write-ahead logging (WAL) for atomic operations and crash recovery, replacing earlier memory-mapped approaches with custom caching for improved durability and performance.[21]
At its core, the engine organizes all data as records, where entities like documents and vertices are persisted as binary records identifiable by a unique Record ID (RID), formatted as `#clusterId:position` for precise, direct retrieval without scanning.[23] Records support in-place updates and can split across pages if they exceed size limits, managed by configurable growth factors to optimize storage efficiency.[21] This record-centric design ensures consistent handling across storage modes, with RIDs remaining stable even in distributed environments through cluster locality assignment.[23]
Transaction management in the core engine combines optimistic and pessimistic locking to address concurrency. Optimistic transactions apply Multi-Version Concurrency Control (MVCC), permitting multiple concurrent operations on records and resolving conflicts via version checks at commit, which enhances throughput in low-contention scenarios.[13] Pessimistic locking, available since version 3.1, allows explicit acquisition of locks on specific records or indices to block concurrent writes, suitable for high-contention use cases requiring guaranteed isolation.[24] These mechanisms integrate with the storage layer's WAL for durability, ensuring atomicity and recovery without per-commit synchronization overhead.[21]
For efficient data access, the engine automatically creates indexes on schema-defined properties, primarily using the SB-Tree algorithm (a B-tree variant optimized for insertions, deletions, and range queries) or hash indexes for rapid equality lookups with minimal disk footprint.[25][26] SB-Tree indexes maintain sorted order and support null values, while hash indexes prioritize speed for exact matches but lack range support, both operating transactionally to align with the engine's concurrency model.[25][26] This indexing approach minimizes manual tuning while providing scalable lookup performance in single-node operations.
Distribution and Scalability
OrientDB features a zero-config multi-master architecture that supports automatic replication and load balancing across multiple servers, enabling all nodes to handle both reads and writes without manual setup. This master-less design leverages Hazelcast for node discovery, synchronization, and cluster coordination, allowing seamless addition of servers to distribute workload horizontally.[20][27]
Horizontal scaling in OrientDB is achieved through sharding via configurable partitions, referred to as clusters, where each class can span multiple clusters owned by specific servers. Applications manage shard selection, and since version 2.2, operations are balanced using round-robin distribution to optimize performance across nodes. This partitioned approach enables the system to manage large-scale datasets by incrementally adding servers and reassigning cluster ownership as needed.[27]
Replication modes include synchronous and asynchronous options, configurable at the database level via the `executionMode` parameter in the distributed configuration file. In synchronous mode (the default), clients await confirmation from a quorum of nodes (e.g., majority, defined as N/2+1) to ensure consistency before responding. Asynchronous mode offers lower latency by executing operations locally and replicating in the background, with callbacks like `onAsyncReplicationOk()` available since version 2.1.6 for error handling. Conflict resolution employs a chain of strategies: majority vote among replicas, content comparison for equality, and highest version number, falling back to manual intervention if unresolved; custom resolvers are supported in the Enterprise Edition.[28][27]
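The conflict-resolution chain described above (majority vote, then content comparison, then highest version, then manual intervention) might be sketched as follows. This is an illustrative interpretation only: the class `ConflictChainSketch`, the `Copy` type and the exact tie-breaking rules are assumptions made for the example, not OrientDB's resolver API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class ConflictChainSketch {
    /** One server's copy of a record (hypothetical shape). */
    static final class Copy {
        final String content;
        final int version;

        Copy(String content, int version) {
            this.content = content;
            this.version = version;
        }

        String key() {
            return version + "|" + content;
        }
    }

    /** Strategy 1: an identical copy held by N/2+1 or more servers wins. */
    static Optional<Copy> majority(List<Copy> copies) {
        Map<String, Integer> counts = new HashMap<>();
        for (Copy c : copies) {
            int n = counts.merge(c.key(), 1, Integer::sum);
            if (n >= copies.size() / 2 + 1) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }

    /** Strategy 2: if every copy has the same content, versions aside, there is no real conflict. */
    static Optional<Copy> sameContent(List<Copy> copies) {
        for (Copy c : copies) {
            if (!c.content.equals(copies.get(0).content)) {
                return Optional.empty();
            }
        }
        return Optional.of(copies.get(0));
    }

    /** Strategy 3: the highest version wins unless differing contents share that version. */
    static Optional<Copy> highestVersion(List<Copy> copies) {
        Copy best = null;
        boolean tie = false;
        for (Copy c : copies) {
            if (best == null || c.version > best.version) {
                best = c;
                tie = false;
            } else if (c.version == best.version && !c.content.equals(best.content)) {
                tie = true;
            }
        }
        return tie ? Optional.empty() : Optional.ofNullable(best);
    }

    /** Runs the chain in order; an empty result means manual intervention is needed. */
    static Optional<Copy> resolve(List<Copy> copies) {
        if (copies.isEmpty()) {
            return Optional.empty();
        }
        Optional<Copy> winner = majority(copies);
        if (!winner.isPresent()) {
            winner = sameContent(copies);
        }
        if (!winner.isPresent()) {
            winner = highestVersion(copies);
        }
        return winner;
    }
}
```

Ordering the strategies from cheapest and least ambiguous to most permissive means a later rule is consulted only when the earlier ones cannot decide.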
High availability is ensured through automatic failover and quorum-based decision-making in clusters, eliminating single points of failure in the multi-master setup. During a node failure, the system evaluates quorum thresholds for ongoing transactions—if met, commits propagate to recovering nodes via Hazelcast synchronization; otherwise, rollbacks occur to maintain consistency. Replica servers, introduced in version 2.1, enhance read scalability and availability as read-only nodes without influencing write quorums, supporting configurations like 3 masters plus numerous replicas requiring only 2 master confirmations for writes.[27]
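The quorum arithmetic in this section can be made concrete with a few lines of code. The sketch below assumes a majority quorum of N/2+1 (integer division) counted over master nodes only, as the text describes; the class and method names are hypothetical and are not part of OrientDB's API.

```java
final class WriteQuorumSketch {
    /** Majority quorum over the masters: N/2 + 1, using integer division. */
    static int majority(int masters) {
        if (masters < 1) {
            throw new IllegalArgumentException("need at least one master");
        }
        return masters / 2 + 1;
    }

    /** Only master acknowledgements are counted; read-only replicas do not vote. */
    static boolean canCommit(int masters, int masterAcks) {
        return masterAcks >= majority(masters);
    }

    public static void main(String[] args) {
        // Three masters plus any number of replicas: two master confirmations suffice.
        System.out.println(majority(3));
        System.out.println(canCommit(3, 1));
    }
}
```

Because replicas are excluded from the count, adding read-only nodes raises read capacity without raising the number of confirmations a write must collect.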
Data Models and Querying
Supported Data Models
OrientDB is a multi-model database management system that natively supports several data models within a unified storage engine, allowing seamless integration and traversal across them without data duplication.[29][1]
The graph model in OrientDB follows a property graph structure, where data is organized into vertices representing entities and edges defining directed or undirected relationships between them, each capable of holding properties as key-value pairs. Vertices and edges are stored as records with physical links, enabling efficient traversals that follow these connections in constant time, independent of database size.[30][31][32]
The document model treats data as JSON-like documents, which are schema-optional and can include embedded or nested sub-documents for hierarchical structures, supporting both schema-less flexibility and optional schema constraints for validation. This model integrates with the graph model by allowing documents to serve as vertices or contain links to other records.[33][32]
OrientDB's key-value model provides simple, high-speed storage and retrieval using record IDs (RIDs) as unique identifiers, augmented by indexes such as hash or SB-tree for fast lookups without traversing relationships. Keys map directly to values stored as records, making it suitable for caching or basic associative access within the multi-model framework.[34][10][35]
The object-oriented model enables direct persistence of Java objects (POJOs) through an Object API, supporting inheritance hierarchies, polymorphism, and encapsulation by mapping classes to database entities and handling relationships via links or embeds. This abstraction layer binds database records to object instances, facilitating object-oriented programming paradigms atop the underlying storage.[36][4][10]
Additionally, OrientDB incorporates reactive, full-text, and geospatial models via built-in extensions that leverage the core engine. The reactive model supports event-driven architectures by allowing automatic propagation of changes across related records. Full-text capabilities enable indexing and searching of textual content within documents or properties. Geospatial support handles location-based data using spatial indexes for queries on points, lines, and polygons. These models extend the primary structures while maintaining compatibility for mixed-model operations.[1][29]
Query Language and APIs
OrientDB employs a SQL dialect that extends ANSI SQL to accommodate its multi-model architecture, supporting operations on documents, key-value stores, and graphs within a unified syntax. This query language enables declarative querying of heterogeneous data structures, including embedded documents and graph traversals, without requiring separate query engines for each model.[37]
The core of OrientDB's query language is its SQL implementation, which adheres to ANSI SQL standards for basic operations like SELECT, INSERT, UPDATE, and DELETE, while introducing extensions for NoSQL paradigms. For document-oriented queries, it supports field traversal using dot notation (e.g., `SELECT name FROM Person WHERE address.city = 'New York'`) and collection handling with functions like `EXPAND` for flattening lists or sets. Graph-specific commands enhance traversal and pattern matching: the `TRAVERSE` statement navigates edges recursively (e.g., `TRAVERSE out() FROM #12:15` to follow outgoing links from a record), while `MATCH` enables declarative pattern queries akin to SPARQL (e.g., `MATCH {class: Person, as: p} -Has-> {class: Address} RETURN p.name` to find persons linked to addresses). These extensions allow seamless querying across models, such as combining document fields with graph relationships in a single statement.[37]
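The `#12:15` in the TRAVERSE example is a record ID in the `#clusterId:position` form described under Core Engine. A minimal sketch of parsing that textual form is shown below; the class `RidSketch` is hypothetical, and the choice of `int` for the cluster and `long` for the position is an assumption made for the example rather than a statement about OrientDB's own classes.

```java
/** Hypothetical value type for the textual form "#clusterId:position". */
final class RidSketch {
    final int clusterId;
    final long position;

    RidSketch(int clusterId, long position) {
        this.clusterId = clusterId;
        this.position = position;
    }

    /** Parses e.g. "#12:15"; throws IllegalArgumentException on malformed input. */
    static RidSketch parse(String text) {
        if (text == null || !text.startsWith("#")) {
            throw new IllegalArgumentException("record ID must start with '#': " + text);
        }
        int colon = text.indexOf(':');
        if (colon < 2 || colon == text.length() - 1) {
            throw new IllegalArgumentException("expected #clusterId:position: " + text);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int cluster = Integer.parseInt(text.substring(1, colon));
        long pos = Long.parseLong(text.substring(colon + 1));
        return new RidSketch(cluster, pos);
    }

    @Override
    public String toString() {
        return "#" + clusterId + ":" + position;
    }

    public static void main(String[] args) {
        RidSketch rid = parse("#12:15");
        System.out.println(rid.clusterId + " / " + rid.position);
    }
}
```

Because the identifier encodes where the record lives, resolving it requires no index lookup, which is what makes statements such as `TRAVERSE out() FROM #12:15` cheap to start.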
OrientDB provides multiple APIs and drivers for programmatic interaction, prioritizing native performance and broad language support. The native Java API, integral to the database's Java-based core, offers three primary interfaces: the Multi-Model API for document and graph operations, the TinkerPop 3.x Graph API for standard graph processing, and the deprecated TinkerPop 2.6 API for legacy compatibility. As of 2025, legacy APIs such as `ODatabaseDocumentTx` have been removed in favor of the modern `ODatabaseSession` interface, with query engine optimizations reducing memory usage for complex queries. Queries are executed via the Query API within the Multi-Model interface, such as using `ODatabaseSession` to run SQL statements like `db.query("SELECT FROM Person")`, returning an `OResultSet` for processing. For example:
```java
ODatabaseSession db = ...;
// Run the query and iterate over the rows of the result set
OResultSet rs = db.query("SELECT FROM Person");
while (rs.hasNext()) {
    OResult row = rs.next();
    // Process row
}
// Close the result set to release its resources
rs.close();
```
[](https://orientdb.dev/docs/3.2.x/java/Java-Query-API.html)
For remote access, OrientDB exposes a RESTful HTTP API over JSON, enabling queries through standard HTTP methods without language-specific bindings. Read-only SELECT queries use GET requests to `/query/<database>/sql/<query-text>`, such as `GET http://localhost:2480/query/demo/sql/select from Profile`, returning paginated JSON results with optional limits and fetch plans for linked records. Non-idempotent commands like UPDATE employ POST to `/command/<database>/sql/<command-text>`, supporting parameterized payloads for security and efficiency. This API facilitates integration with web applications and microservices.[](https://orientdb.dev/docs/2.2.x/OrientDB-REST.html)
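Since the query text travels inside the URL path, it has to be percent-encoded before the request is sent. The sketch below builds a URL of the `/query/<database>/sql/<query-text>` shape quoted above; the class and method names are hypothetical, only the JDK's `URLEncoder` is used (the `Charset` overload requires Java 10 or later), and no request is actually issued.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class RestQueryUrlSketch {
    /** Percent-encodes one path segment; URLEncoder emits '+' for spaces, so rewrite it to %20. */
    static String encodeSegment(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds <baseUrl>/query/<database>/sql/<query-text> for a read-only query. */
    static String queryUrl(String baseUrl, String database, String sql) {
        return baseUrl + "/query/" + encodeSegment(database) + "/sql/" + encodeSegment(sql);
    }

    public static void main(String[] args) {
        System.out.println(queryUrl("http://localhost:2480", "demo", "select from Profile"));
    }
}
```

A literal `+` in the input is already encoded as `%2B` by `URLEncoder`, so the subsequent replacement only touches the plus signs that stood for spaces.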
Language-specific drivers extend accessibility beyond Java. The official binary drivers include OrientJS for Node.js, supporting asynchronous query execution; PhpOrient for PHP, providing object-oriented wrappers for SQL commands; and the .NET driver for C#, enabling binary protocol communication for high-throughput scenarios. Python users rely on the community-maintained PyOrient driver, which handles binary connections for queries and transactions. These drivers abstract the binary protocol for direct socket interaction, outperforming HTTP in latency-sensitive use cases, while all support SQL execution akin to the native API.[](https://orientdb.dev/docs/3.2.x/apis-and-drivers/index.html)
Live queries introduce real-time capabilities, allowing applications to subscribe to database changes matching a predefined SQL filter. Introduced in version 2.1, this feature uses LIVE SELECT statements (e.g., `LIVE SELECT FROM Game WHERE game_id = "201606-001"`) registered via APIs like `db.liveQuery()` in OrientJS, triggering event handlers for inserts, updates, or deletes. For instance, in Node.js:
```javascript
db.liveQuery('LIVE SELECT FROM Game WHERE game_id = "201606-001"')
  .on('live-update', function(data) {
    console.log('Score updated:', data.content.score);
  });
```
This push-based mechanism eliminates polling, enabling reactive applications such as live dashboards or collaborative tools, with token-based authentication required since version 2.2.[38]
Security for queries and APIs is enforced through a role-based access control (RBAC) model, where users are assigned roles defining permissions on resources like `database.query` and `database.function`. Roles use bitmask values (e.g., 15 for full CRUD access) to granularly control operations: the default `admin` role grants unrestricted querying, `reader` permits only SELECT on `database.query`, and `writer` allows reads and writes. Since version 3.1, security policies extend this with conditional rules (e.g., `READ = TRUE WHERE owner = currentUser`), applied at query execution to prevent unauthorized data exposure. Functions, executable via SQL or APIs, inherit these controls, ensuring role-specific invocation. Administrators manage roles via SQL on the `OUser` and `ORole` classes, such as `CREATE ROLE queryOnly ALLOW database.query:1`.[39]
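A permission bitmask of this kind can be checked with a single bitwise AND. The sketch below is illustrative only: the text above states just that 15 means full CRUD access, so the individual bit assignments (create = 1, read = 2, update = 4, delete = 8) are an assumption chosen to sum to 15, and `CrudMaskSketch` is a hypothetical class rather than part of OrientDB.

```java
final class CrudMaskSketch {
    // Assumed bit assignments; only their sum (15) is taken from the text.
    static final int CREATE = 1;
    static final int READ = 2;
    static final int UPDATE = 4;
    static final int DELETE = 8;
    static final int ALL = CREATE | READ | UPDATE | DELETE;

    /** True when every bit of the requested operation is set in the role's mask. */
    static boolean allows(int roleMask, int operation) {
        return (roleMask & operation) == operation;
    }

    public static void main(String[] args) {
        int readerMask = READ;
        System.out.println(allows(readerMask, READ));
        System.out.println(allows(readerMask, UPDATE));
    }
}
```

Storing permissions as bits lets one integer per role and resource describe any combination of operations, and lets a check run without consulting a list of grants.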