Riak
from Wikipedia
Riak
Developer: Community, Basho Technologies
Initial release: August 17, 2009
Stable release: 3.2.0 / January 1, 2023[1]
Written in: Erlang
Operating system: Linux, BSD, macOS, Solaris, Raspbian
Platform: IA-32, x86-64, AArch32
Type: NoSQL database, data store, cloud storage
License: Apache License 2.0
Website: riak.com

Riak (pronounced "ree-ack"[2]) is a distributed NoSQL key-value data store that offers high availability, fault tolerance, operational simplicity, and scalability.[3] In August 2017 Riak became an entirely open-source project, incorporating many of the features previously licensed as part of its Enterprise Edition.[4] Riak implements the principles from Amazon's Dynamo paper[5] with heavy influence from the CAP theorem. Written in Erlang, Riak provides fault-tolerant data replication and automatic data distribution across the cluster for performance and resilience.[6]

Riak has a pluggable backend for its core storage, with the default storage backend being Bitcask.[7] LevelDB is also supported, with other options (such as the pure-Erlang Leveled) available depending on the version.

Riak was originally developed by engineers employed by Basho Technologies[8] and maintained by them until 2017, when the rights were sold to bet365[9][10] after Basho went into receivership.[11]

Main features

Fault-tolerant availability
Riak replicates key/value data across a cluster of nodes, with a default n_val (replication factor) of three. When nodes are unavailable because of a network partition or hardware failure, data can still be written to a neighboring node beyond the initial three and read back, thanks to Riak's "masterless" peer-to-peer architecture.
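The fallback behaviour can be sketched as a walk around a ring of partitions: the first n_val partitions after a key's position hold the primary replicas, and a replica whose node is down is stood in for by the next live node further round the ring. This is a minimal sketch under stated assumptions: the ring is a plain list of hypothetical node names assigned round-robin, "down" is a given set, and `preference_list` is an illustrative name, not Riak's API (Riak derives preference lists from its gossiped ring state).

```python
from typing import List, Set, Tuple

def preference_list(ring: List[str], start: int, n_val: int,
                    down: Set[str]) -> List[Tuple[int, str, bool]]:
    """Return up to n_val (partition, node, is_fallback) entries for a key
    whose first partition is `start`. Hypothetical helper for illustration.

    ring[i] is the owner of partition i. Primaries are the n_val partitions
    starting at `start`; a primary whose owner is down is replaced by the
    next live node further round the ring that is not already chosen (a
    "sloppy quorum"). Fallbacks are flagged so their data can later be
    handed back to the primary ("hinted handoff").
    """
    size = len(ring)
    chosen: List[Tuple[int, str, bool]] = []
    used: Set[str] = set()
    # Primary replicas: the first n_val partitions clockwise from `start`.
    for i in range(n_val):
        p = (start + i) % size
        if ring[p] not in down:
            chosen.append((p, ring[p], False))
            used.add(ring[p])
    # Fallbacks: keep walking past the primaries until n_val live,
    # distinct nodes have been found or the ring is exhausted.
    for i in range(n_val, size):
        if len(chosen) == n_val:
            break
        p = (start + i) % size
        if ring[p] not in down and ring[p] not in used:
            chosen.append((p, ring[p], True))
            used.add(ring[p])
    return chosen

ring = ["a", "b", "c", "d"] * 2   # 8 partitions owned round-robin by 4 nodes
print(preference_list(ring, 0, 3, down=set()))
# [(0, 'a', False), (1, 'b', False), (2, 'c', False)]
print(preference_list(ring, 0, 3, down={"b"}))
# [(0, 'a', False), (2, 'c', False), (3, 'd', True)]
```

With node b down, the write for partition 1 lands on node d as a fallback; once b is reachable again, the flagged copy is what would be handed back.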
Queries
Riak provides a RESTful API over HTTP, as well as a Protocol Buffers interface, for basic PUT, GET, POST, and DELETE operations. More complex queries are also possible, including secondary indexes, full-text search (via Apache Solr), and MapReduce. MapReduce has native support for both JavaScript (using the SpiderMonkey runtime) and Erlang.
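As an illustration of the HTTP interface, the sketch below builds (but does not send) PUT, GET and DELETE requests with Python's standard `urllib`. It assumes a node whose HTTP listener is on 127.0.0.1:8098 (Riak's usual default) and uses the bucket-type URL form `/types/<type>/buckets/<bucket>/keys/<key>`; the helper function names are hypothetical, and both the address and the URL layout should be checked against your Riak version and configuration.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a Riak node with its HTTP listener on the usual default port.
BASE_URL = "http://127.0.0.1:8098"

def object_url(bucket_type: str, bucket: str, key: str) -> str:
    """URL of one object, with every path segment percent-encoded."""
    return (f"{BASE_URL}/types/{quote(bucket_type, safe='')}"
            f"/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}")

def put_request(bucket_type: str, bucket: str, key: str, body: bytes,
                content_type: str = "application/json") -> Request:
    """PUT stores (or replaces) the value; the Content-Type is stored with it."""
    return Request(object_url(bucket_type, bucket, key), data=body,
                   method="PUT", headers={"Content-Type": content_type})

def get_request(bucket_type: str, bucket: str, key: str) -> Request:
    return Request(object_url(bucket_type, bucket, key), method="GET")

def delete_request(bucket_type: str, bucket: str, key: str) -> Request:
    return Request(object_url(bucket_type, bucket, key), method="DELETE")

req = put_request("default", "users", "alice smith", b'{"age": 30}')
print(req.get_method(), req.full_url)
# PUT http://127.0.0.1:8098/types/default/buckets/users/keys/alice%20smith
```

Passing one of these requests to `urllib.request.urlopen` would send it; that step is left out because it needs a running node.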
Predictable latency
Riak distributes data across nodes by hashing and can provide a predictable latency profile, even when multiple nodes fail.
Storage options
Keys/values can be stored in memory, disk, or both.
Multi-datacenter replication
Multi-Datacenter replication (MDC) provides uni-directional and bi-directional replication of data between Riak clusters, whether locally for resilience or globally for faster regional access. Uni-directional replication is useful for read-only sinks such as backups and disaster recovery sites. Bi-directional replication allows multiple Riak clusters to hold eventually consistent data across vast distances. Complex replication topologies such as chains, hub-and-spoke and mesh networks are possible thanks to the Cascades feature, which allows data to be replicated between clusters that are not directly connected. There are two primary modes of operation: fullsync and realtime. Fullsync mode ensures that all data on the source cluster is replicated to the sink cluster; only metadata and changes are transferred, making it fast and efficient. Realtime mode sends updates made on a source cluster to the sink cluster as they happen. The two modes are designed to work together for best performance. All multi-datacenter replication occurs over multiple concurrent TCP connections to maximize performance and network utilization.
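The idea that fullsync ships only differences can be sketched as a comparison of per-object hashes between a source and a sink. This is a simplified illustration, not Riak's fullsync protocol (whose real strategies compare key lists or anti-entropy hash trees between clusters); the dictionaries standing in for clusters and the function names are hypothetical, and deletions are not propagated here.

```python
import hashlib
from typing import Dict, List

def digest(value: bytes) -> str:
    """Hash of an object's value; stands in for the metadata clusters compare."""
    return hashlib.sha256(value).hexdigest()

def keys_to_sync(source: Dict[str, bytes], sink: Dict[str, bytes]) -> List[str]:
    """Keys missing from, or different on, the sink (hypothetical helper).
    Only hashes are compared, so unchanged objects are never shipped."""
    sink_hashes = {k: digest(v) for k, v in sink.items()}
    return sorted(k for k, v in source.items() if sink_hashes.get(k) != digest(v))

def fullsync(source: Dict[str, bytes], sink: Dict[str, bytes]) -> List[str]:
    """Copy only the differing objects from source to sink; return their keys."""
    changed = keys_to_sync(source, sink)
    for key in changed:
        sink[key] = source[key]
    return changed

source = {"a": b"1", "b": b"2", "c": b"3"}
sink = {"a": b"1", "b": b"old"}
print(fullsync(source, sink))   # ['b', 'c']
```

Key "a" is identical on both sides, so it is never transferred; a realtime queue would then carry subsequent writes so the next fullsync has little left to do.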
Tunable consistency
Riak offers a choice between eventual and strong consistency for each bucket.

Main products


All versions of Riak are now entirely open-source and free, and include the extra features that Basho charged license fees for.

Basho operated a freemium model, wherein they provided free versions of Riak in the form of Riak Core, Riak KV, Riak CS and Riak TS but made their money from licensing more advanced features and SLA-based support. The extra features from the Enterprise Editions have since been integrated into the open source versions, as of Riak KV 2.2.6[12] and Riak CS 2.1.2.[13]

Riak Core and Riak Core Lite


Riak Core


riak_core[14] is the distributed systems framework that underpins Riak, forming the foundation for all Riak versions. It is being maintained as part of Riak.

Riak Core Lite


riak_core_lite[15] is intended for general use as a base for creating distributed systems.

Riak KV (Key-Value)


Riak KV is a distributed NoSQL database designed to deliver maximum data availability by distributing data across multiple servers, meaning that if one client can reach one server, it should be able to read and write data.[16] KV went through a few names in its lifetime, starting as Riak then Riak DS (for Data Store) and finally Riak KV (for Key-Value).

When Basho Technologies went into receivership in 2017,[11] KV development was picked up by the open source community and has continued into 2021, with 2.2.6, released in 2018, being the first community release of KV. This release integrated some features that were originally restricted to Basho's Enterprise versions of Riak.[12]

Version 2.9.0, the first major release by the open source community, was released in November 2019,[12] followed by version 3.0.1 on August 20, 2020.[12] Development has continued since then, including version 3.0.7[17] and the later releases listed below.

Removed features


The current version of Riak no longer supports some features of the Enterprise edition of Riak, including:

  • SNMP/JMX support

Separated features in Riak KV 3.0+


The following features of Riak KV 2.x have been removed by default from the Riak build. Specific builds including these features are available.

  • Yokozuna

Riak CS (Cloud Storage)


Originally known as Riak MOSS (Multi-tenant Object Storage System)[18] but renamed Riak CS (Cloud Storage) on release, Riak CS was first publicly released in January 2012.

Riak CS (Cloud Storage) is object storage software built on top of Riak KV, Riak's distributed database. Riak CS is designed to provide simple, highly-available, distributed cloud storage at any scale, and can be used to build cloud architectures or as storage infrastructure for heavy-duty applications and services.[19]

Riak CS also includes an application called Stanchion[20] which is used to manage the serialization of requests. This enables Riak CS to manage globally unique entities like users and bucket names. Serialization in this context means that the entire cluster agrees upon a single value for any globally unique entity at any given time; when that value is changed, the new value must be recognized throughout the entire cluster.

Riak CS was briefly rebranded as Riak S2 to make it more obviously compatible with Amazon S3 but the name did not catch on and it reverted to Riak CS.

In 2021 development for Riak CS was resumed with contributions from TI Tokyo.

Riak TS (Time Series)


Riak TS is an extension to Riak KV optimized for time series data, in that:

  • it supports structured data, with table definition (with a CREATE TABLE call) required before data can be written;
  • data slices from contiguous regions in its primary index (“quanta”) are stored on the same partition;
  • CRUD operations are optimized for speed, at the expense of consistency.

A limited subset of SQL commands was implemented in Riak TS. There is no provision for consistency guarantees between tables (no foreign indexes). In SELECT statements, the WHERE clause is supported but HAVING is not. ORDER BY was planned for a version that was never released.
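The quantum idea can be sketched as flooring each timestamp to a fixed-width time slice and hashing that slice, together with the other partition-key fields, to choose a partition, so that all rows of one series within one slice land together. Riak TS declares the quantum in a table's primary key (for example `QUANTUM(time, 15, 'm')`); the 15-minute width, the SHA-1 hashing, the 64-partition count, the `region`/`state` columns and the function names below are illustrative assumptions, not Riak TS's internal encoding.

```python
import hashlib
from datetime import datetime, timezone

QUANTUM_MS = 15 * 60 * 1000   # assumed 15-minute quantum, in milliseconds
RING_SIZE = 64                # assumed number of partitions

def quantum_start(ts_ms: int, quantum_ms: int = QUANTUM_MS) -> int:
    """Floor a millisecond timestamp to the start of its quantum."""
    return ts_ms - ts_ms % quantum_ms

def partition_for(region: str, state: str, ts_ms: int) -> int:
    """Hash (region, state, quantum) to a partition index (hypothetical helper),
    so every row of one series in the same quantum lands on the same partition."""
    key = f"{region}|{state}|{quantum_start(ts_ms)}".encode()
    return int.from_bytes(hashlib.sha1(key).digest(), "big") % RING_SIZE

noon = int(datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc).timestamp() * 1000)
same = partition_for("South Atlantic", "FL", noon)
later = partition_for("South Atlantic", "FL", noon + 14 * 60 * 1000)
print(same == later)  # True: both readings fall in the 12:00 to 12:15 quantum
```

Co-locating a quantum on one partition is what lets a time-range query touch only the partitions for the quanta it spans, rather than the whole cluster.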

Riak TS existed as a collection of branches (in separate components of Riak KV such as riak_kv, riak_pb, etc.), not as a product with a repository of its own. It was developed by a dedicated team consisting of Gordon Guthrie (leader), Andy Till and Andrei Zavada, with occasional contributions from other developers.

Riak TS was conceived, along with the Riak Data Platform project, as an attempt to diversify Basho's product line, an undertaking that many insiders regard as misguided and as a contributor to Basho's eventual demise.

Licensing and support


Riak was originally licensed using a freemium model: open source versions of Riak KV, Riak CS and Riak TS were available, but end users could pay for additional features and support. However, since Basho entered receivership[11] and bet365[10] (purchaser of all the IP) made all Riak products fully open source, all the premium features are now available in the open source versions. Since Basho's demise, community[21] ad hoc and paid support options have arisen.

Language support


Riak has official drivers for Ruby, Java, Erlang and Python. There are also numerous community-supported drivers for other programming languages.[22]

Community development


After bet365[10] purchased the Riak IP,[9] the Riak products were made fully open source, and work to integrate premium features into the open source versions was completed with the 2.2.6 release.

History


Riak was originally written by Andy Gross and Justin Sheehy at Basho Technologies[2] to power a web-based sales force automation application built by former engineers and executives from Akamai. There was more interest in the datastore technology than in the applications built on it, so the company decided to build a business around Riak itself, gaining adoption throughout the Fortune 100 and becoming a foundation for many of the world's fastest-growing web-based, mobile and social networking applications, as well as for cloud service providers. Major releases are listed below.

Riak KV


Riak 1.0 was released on September 10, 2011.

Riak KV Releases
Version | Date released | Changes
1.0 | September 10, 2011 | Initial 1.0 release.[23]
1.1 | February 21, 2012 | Added Riaknostic, enhanced error logging and reporting, improved resiliency for large clusters, and a new graphical operations and monitoring interface called Riak Control.
1.4 | July 10, 2013 | Added counters, secondary indexing improvements, reduced object storage overhead, handoff progress reporting, and enhancements to MDC replication.
2.0 | September 2, 2014 | Added new data types including sets, maps, registers, and flags, simplifying application development. Strong consistency by bucket, full-text integration with Apache Solr, security, and reduced replicas for secondary sites.
2.1 | April 16, 2015 | Added an optimization for many write-heavy workloads: “write once” buckets, whose entries are intended to be written exactly once and never updated or overwritten.
2.2 | November 17, 2016 | Added support for Debian 8 and Ubuntu 16.04; Solr integration improvements.[24]
2.2.6 | May 21, 2018 | The first community release. Added Multi-Datacentre Replication, which was not previously part of open-source Riak, added a grow-only set data type, improved data distribution over nodes, and cleaned up production test issues.[25]
2.9.0 | November 20, 2019 | Added early support for TicTac Active Anti-Entropy and a new Riak-specific backend called Leveled.[12]
2.9.1 | February 17, 2020 | Implements next-gen replication;[26] various changes to tombstones and bucket listing.[12]
2.9.7 | August 16, 2020 | Improved Active Anti-Entropy and Riak's overall stability.
2.9.8 | December 7, 2020 | Improved leveled functions.[27]
2.9.9 | August 6, 2021 | Leveled stability improvements.[28]
3.0.1 | August 20, 2020 | Adds support for OTP 20, 21 and 22 but is not backwards compatible with previous OTP versions.[12]
3.0.2 | January 5, 2021 | Implements backend changes from 2.9.8; adds a range_check in the TicTac AAE-based full-sync replication.[12]
3.0.6 | May 8, 2021 | Adds location awareness to cluster management,[29] along with bug fixes from 3.0.3 and 3.0.4.[12]
3.0.7 | July 21, 2021 | Reverts the Riak Erlang runtime system to interactive mode, rather than the embedded mode it had previously been changed to.[17]
3.0.8 | October 12, 2021 | Supports flushing of disk writes; implements read repair for key ranges to accelerate recovery after a known node outage.[30]
3.0.9 | November 12, 2021 | Improves latency and expands statistics of secondary-index queries, including result counts and overall query time.[31]
3.0.10 | March 30, 2022 | Improves memory management in the leveled backend, adds peer discovery without configuration changes, and allows configuration of Erlang VM memory and scheduler settings via riak.conf.[32]
3.0.11 | October 11, 2022 | Fixes a bottleneck in secondary-index queries with the leveled backend at more than 1,000 queries per second.[33]
3.0.12 | December 20, 2022 | Improves memory management in the leveled backend, updates leveldb snappy compression for wider platform support, and introduces the reip_manual console command.[34]
3.2.0 | January 1, 2023 | Supports Erlang/OTP 22, 24 and 25, moves to Erlang/OTP's new logging API, and updates packaging to include Alpine Linux and FreeBSD.[35]
3.0.13 | February 4, 2023 | Improves reliability of handoffs; adds administrative helper functions to riak_client.[36]
3.0.14 | February 13, 2023 | Fixes an issue with handling back-pressure correctly in the leveled backend; adds support for handing off reap requests via the handoff_deletes option.[37]
3.0.15 | February 15, 2023 | Corrects an issue introduced with the auto_check feature for TicTac AAE full-sync, which was added in 3.0.10.[38]

Riak CS


Riak CS was made open source on March 20, 2013.[39]

Riak CS Releases
Version | Date released | Changes
0.0.3 | January 26, 2012 | The first public release of Riak CS, known as Riak MOSS at the time.[40]
0.1.0 | February 25, 2012 | Bucket-level access control, user record changes; Stanchion is now required.[41]
1.0.0 | April 2, 2012 | Fixes some process/socket leaks, adds a fix to prevent deadlock conditions, and adds a new subsystem for user access and storage usage calculations.[42]
1.0.1 | April 18, 2012 | Fixes a bug that caused requests to hang if a node in the cluster was unavailable.[43]
1.1.0 | August 20, 2012 | Updates user creation, configuration options for anonymous users, more user account controls for admins, garbage collection for deleted objects, improved performance.[44]
1.2.0 | October 23, 2012 | Early support for multi-datacenter replication, support for riak_test integration, bug fixes.[45]
1.2.1 | January 23, 2016 | Adds a reduce phase for listing bucket contents to provide backpressure when executing the MapReduce job, uses prereduce during storage calculations, and fixes an incorrect 404 error when listing the contents of a nonexistent bucket.[46]
1.2.2 | November 8, 2012 | Full support for MDC replication, fixed process leaks.[47]
1.3.0 | March 20, 2013 | Support for multipart file uploads, bucket policies for restricted principals/conditions, and the Range header; more administrative command controls; support for FreeBSD, SmartOS and Solaris packaging.[48]
1.3.1 | April 4, 2013 | Bug fixes.[49]
1.4.0 | August 12, 2013 | Early support for Swift API/Keystone authentication, improved performance, bug fixes.[50]
1.5.0 | July 31, 2014 | Adds a multibag technical preview and a new debug command, consolidates commands under the new `riak-cs admin` command, improves garbage collection, updates lager, and adds a new Multiple Objects API and warning logs for manifests, siblings, etc.[51]
1.5.1 | September 10, 2014 | Adds bucket restrictions and a sleep interval for manifest updates, updates riak-cs-debug, changes bucket resolution.[52]
1.5.2 | October 9, 2014 | Improved logging for failures with Riak, changes to log output for access stats, adds a script for repairing invalid garbage collection manifests.[53]
1.5.3 | December 12, 2014 | Adds the read_before_last_manifest_write option and configurable timeouts for CS interactions with Riak.[54]
1.5.4 | March 13, 2015 | Disables backpressure sleep; fixes an incorrect path rewrite in the S3 API.[55]
2.0.0 | March 27, 2015 | Updates Riak CS to work with Riak 2.0.5, renames gc_max_workers to gc.max_workers and changes its default setting, adds early support for AWS v4 authentication, adds cuttlefish, storage optimisations.[56]
2.1.0 | October 14, 2015 | Final Basho release. Backwards compatible with KV 2.0.5 and 2.1.1; adds a large number of new metrics for health monitoring along with storage usage metrics; replaces commands with riak-cs-admin equivalents; garbage collection improvements.[57]
2.1.1 | October 14, 2015 | Compatible with KV 2.1.3, 2.1.4, 2.2.x and 2.9.x.
2.1.2 | April 9, 2019 | First post-Basho release.[58]

Riak TS


Riak TS was originally released in October 2015.[59]

Riak TS Releases
Version | Date released | Changes
1.2.0 | February 23, 2016 | Implements riak_shell to allow SQL commands and logging in a single shell in Riak TS. Bug fixes; Multi-Datacenter replication and Riak Search not supported.[60]
1.3.0 | May 4, 2016 | Open-sources Riak TS, adds an HTTP API, additional SQL commands, and support for MDC replication for enterprise users.[61]
1.3.1 | July 5, 2016 | Addresses a data-loss bug in 1.3.0.[62]
1.4.0 | August 24, 2016 | Adds new SQL features, rolling upgrade/downgrade support, and global data expiry (per cluster).[63]
1.5.0 | December 20, 2016 | Expands the SQL implementation and improves data storage and overall performance.[64]
1.5.1 | January 24, 2017 | Bug fixes from 1.5.0.[65]
1.5.2 | February 21, 2017 | Bug fixes from 1.5.1.[66]

from Grokipedia
Riak is an open-source, distributed key-value database that emphasizes high availability, fault tolerance, operational simplicity, and horizontal scalability on commodity hardware. It employs a masterless architecture inspired by Amazon's Dynamo system, automatically distributing data across clusters to ensure resilience against node failures and network partitions without a single point of failure. Written in Erlang, Riak supports flexible data models, including key-value pairs and time-series data for IoT applications, making it suitable for use cases such as session management, real-time analytics, and large-scale web applications. Developed by Basho Technologies starting in 2008, Riak emerged as one of the early distributed NoSQL solutions aimed at simplifying enterprise data management in high-velocity environments. Basho, founded to commercialize the technology, released Riak KV as open-source software under the Apache 2.0 license while offering enterprise editions with advanced features such as multi-datacenter replication and security integrations. In 2015, Basho introduced Riak TS, a specialized variant optimized for fast ingestion and querying of timestamped data, which was later open-sourced. Following Basho's financial difficulties and its receivership in 2017, the intellectual property and development rights were acquired by bet365, a major user of the database, ensuring continued maintenance and support. Key strengths of Riak include its eventual consistency model, which prioritizes availability over strict consistency and resolves conflicts using techniques such as vector clocks and CRDTs (Conflict-Free Replicated Data Types), allowing seamless operation in distributed setups. It integrates with tools such as Apache Solr for search, Redis for caching, and Apache Spark for analytics, supporting hybrid cloud deployments and applications handling petabytes of data.
Riak has seen notable adoption in industries requiring extreme uptime, such as gaming, finance, and telecommunications, though it competes with alternatives such as Cassandra and Redis in the evolving NoSQL landscape.

Overview

Definition and Purpose

Riak is an open-source, distributed key-value database designed for high availability, fault tolerance, and operational simplicity in large-scale data storage and retrieval. It emphasizes resilience in distributed environments, allowing applications to keep functioning during hardware failures or network partitions by automatically distributing data across a cluster of servers. This makes it particularly suited to use cases such as tracking user sessions, storing sensor data from devices, and replicating data globally without downtime. Implemented primarily in Erlang, Riak leverages the language's strengths in concurrency, distribution, and fault tolerance to manage complex interactions across nodes efficiently. Unlike traditional relational databases, which rely on rigid schemas and ACID transactions for strong consistency, Riak is schema-free and prioritizes eventual consistency, enabling horizontal scaling and flexibility for unstructured data. This design, inspired by the principles of Amazon's Dynamo paper, supports tunable consistency levels to balance availability and accuracy according to application needs. As of 2025, Riak remains an actively maintained open-source project, with Riak KV 3.2.5 released in March 2025, alongside commercial editions offering enhanced support and features.

Design Inspirations

Riak's architecture draws significant inspiration from Amazon's Dynamo system, described in the 2007 paper by DeCandia et al., which proposed a decentralized key-value store designed for high availability in large-scale distributed environments. This influence is evident in Riak's adoption of a masterless peer-to-peer topology, in which no single node acts as a master, eliminating central coordination points and giving all nodes symmetric responsibilities, with data partitioned via consistent hashing and membership tracked by gossip-based protocols. Dynamo's emphasis on "always writeable" stores during failures directly shaped Riak's fault-tolerant mechanisms, such as replication across multiple nodes and hinted handoff, which keep operations running even under network partitions or node outages. Central to Riak's design is the CAP theorem, first conjectured by Eric Brewer in 2000 and later formalized, which states that a distributed system can guarantee at most two of three properties: consistency, availability, and partition tolerance. Riak prioritizes availability and partition tolerance (AP), forgoing strict consistency to stay responsive in the face of network failures, a choice aligned with web-scale demands where downtime is unacceptable. This AP orientation makes Riak an eventually consistent system by default, with optional strong consistency modes for specific use cases that may temporarily sacrifice availability. Riak's eventual consistency model extends Dynamo's versioning approach, using vector clocks to detect and resolve conflicts while exposing tunable parameters such as the read quorum (R), write quorum (W), and replication factor (N) to balance consistency, availability, and latency. For instance, choosing R and W so that R + W > N makes every read quorum overlap every write quorum, improving consistency at the cost of reduced availability and higher latency; developers can adjust these trade-offs per request or per bucket without altering the overall architecture. This flexibility addresses the limitations of rigid consistency models in distributed settings, where immediate global agreement is impractical.
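Conflict detection with vector clocks can be sketched as follows: a clock maps each actor (for example a vnode or client ID) to a counter; one clock descends from another if it is at least as large for every actor; and two clocks where neither descends from the other mark concurrent updates, which Riak surfaces as siblings. The function names and the actor IDs "x" and "y" are hypothetical, and Riak 2.0 and later can use dotted version vectors instead of plain vector clocks.

```python
from typing import Dict

VClock = Dict[str, int]  # actor ID -> number of updates seen from that actor

def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of `clock` recording one more update by `actor`."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new

def descends(a: VClock, b: VClock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def compare(a: VClock, b: VClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent (siblings)"

def merge(a: VClock, b: VClock) -> VClock:
    """Pointwise maximum: the clock written back after siblings are resolved."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

base = increment({}, "x")     # {'x': 1}
v1 = increment(base, "x")     # actor x updates again: {'x': 2}
v2 = increment(base, "y")     # actor y updates the same base: {'x': 1, 'y': 1}
print(compare(v1, base))      # a supersedes b
print(compare(v1, v2))        # concurrent (siblings)
```

A client that resolves the two siblings writes its chosen value back with `merge(v1, v2)`, which descends from both and therefore replaces them.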
Riak's implementation also leverages Erlang/OTP's concurrency primitives, which follow the actor model: processes are lightweight, isolated units communicating via asynchronous messages, facilitating scalable request handling across nodes. Features such as hot code swapping, supported through OTP's supervision trees and behaviours such as gen_server, allow runtime upgrades without service interruption, enhancing operational resilience in production environments. These design choices were motivated by the shortcomings of traditional relational database management systems (RDBMS) in web-scale applications, particularly their reliance on master-slave architectures that introduce single points of failure and hinder horizontal scaling. By contrast, Riak's decentralized design avoids such bottlenecks, supporting linear scalability and continuous availability for growing data volumes and high-traffic scenarios.

Architecture

Core Framework

Riak Core is an open-source Erlang library designed as a foundational framework for building distributed, fault-tolerant applications inspired by the Dynamo architecture. It provides reusable components for managing cluster coordination, data partitioning, and failure handling without relying on centralized master nodes, enabling developers to create scalable systems in Erlang/OTP environments. Written as a single OTP application, Riak Core abstracts the complexities of distributed systems, allowing applications to focus on domain-specific logic while inheriting robust clustering capabilities. A core component of Riak Core is its consistent-hashing ring, which partitions the key space across nodes using a fixed number of virtual partitions (the ring size, typically 64 or 128) to ensure even distribution and efficient rebalancing. Keys are hashed with SHA-1 to points on the ring, and the ring size together with the replication factor (N-value) determines which N partitions hold a key's replicas for fault tolerance; for instance, with a ring size of 64 and N=3, each key is stored on three partitions owned by different nodes. Virtual nodes (vnodes) further enhance load balancing by representing these partitions as lightweight Erlang processes, with each physical node hosting multiple vnodes (e.g., 32 vnodes per node in a two-node cluster with a 64-partition ring), allowing fine-grained distribution and automatic handoff during node failures or additions. Cluster membership and state propagation in Riak Core rely on a gossip protocol, in which nodes periodically exchange ring state so that the cluster converges on a shared view of its topology without a central coordinator. This decentralized approach ensures self-healing and resilience, as changes such as node joins or departures propagate organically across the cluster. For embedding in non-Riak applications, Riak Core Lite offers a lightweight variant of the framework, stripping away key-value-store specifics to support custom data models with minimal overhead.
It retains essential features such as the hashing ring, vnodes, and gossip-based membership, but requires fewer implementation callbacks, making it suitable for building specialized distributed services such as messaging systems. The benefits of Riak Core include linear scalability through horizontal node addition and decentralized coordination that avoids single points of failure, facilitating high availability in production environments. Since 2017, following the acquisition of Basho's assets by bet365 and subsequent community efforts, Riak has been maintained as OpenRiak by the Erlang Foundation, with 3.2.4 being the latest release as of February 2025. The core architecture remains consistent, though some features have been deprecated.
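The ring arithmetic described above can be sketched in a few lines: hash a bucket/key pair with SHA-1 into a 2^160 key space, map the hash to one of ring_size equal partitions, and take the next N partitions clockwise as the preference list. Two simplifying assumptions: the key is hashed as a plain "bucket/key" string (riak_core hashes an Erlang term encoding), and partitions are assigned to nodes round-robin rather than by Riak's claim algorithm; the class and method names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List

RING_TOP = 2 ** 160  # SHA-1 hashes are 160-bit integers

class Ring:
    """Hypothetical consistent-hashing ring with one vnode per partition."""

    def __init__(self, nodes: List[str], ring_size: int = 64):
        # Ring sizes must be powers of two, so partitions divide 2**160 evenly.
        assert ring_size > 0 and ring_size & (ring_size - 1) == 0
        self.ring_size = ring_size
        self.width = RING_TOP // ring_size  # hash range covered by one partition
        # Simplified claim: partition i is owned by nodes[i % len(nodes)].
        self.owners = [nodes[i % len(nodes)] for i in range(ring_size)]

    def partition(self, bucket: str, key: str) -> int:
        """Index of the partition whose arc contains SHA-1(bucket/key)."""
        digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
        return int.from_bytes(digest, "big") // self.width

    def preflist(self, bucket: str, key: str, n_val: int = 3) -> List[str]:
        """Owners of the n_val consecutive partitions starting at the key's."""
        start = self.partition(bucket, key)
        return [self.owners[(start + i) % self.ring_size] for i in range(n_val)]

    def vnodes_per_node(self) -> Counter:
        return Counter(self.owners)

ring = Ring(["node1", "node2"], ring_size=64)
print(ring.vnodes_per_node())   # Counter({'node1': 32, 'node2': 32})
print(ring.preflist("users", "alice"))
```

The first line reproduces the 32-vnodes-per-node split for a two-node, 64-partition ring; note that with only two nodes an n_val of 3 necessarily places two replicas on the same node.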

Data Model and Operations

Riak employs a key-value model in which objects are stored as binary blobs or structured formats such as JSON, organized within buckets that serve as namespaces for keys. Each object is identified by a unique key within its bucket, and buckets can be further grouped under bucket types to apply specific properties such as replication factors or consistency settings. This structure supports flexible storage of arbitrary data without a rigid schema, allowing applications to denormalize data for efficient retrieval in distributed environments. Basic operations in Riak follow a simple CRUD pattern via the HTTP or Protocol Buffers APIs. The PUT operation stores or updates an object, accepting optional metadata such as content types or custom headers (e.g., X-Riak-Meta-* headers for application-specific attributes), and can include a vector clock as causal context to prevent overwrites. The GET operation retrieves objects, performing quorum reads that wait for responses from a configurable number of replicas and returning the most recent value, or siblings if conflicts exist. The DELETE operation marks objects for removal by writing tombstones (special empty objects with an X-Riak-Deleted header), enabling eventual deletion across replicas while distinguishing deleted items from non-existent ones; tombstones are reaped after a configurable interval to reclaim space. Secondary indexes, known as 2i, extend the key-value model by allowing objects to be tagged with secondary index entries at write time, facilitating queries on non-primary attributes such as user IDs or timestamps. These indexes are stored alongside the object data on virtual nodes and support exact-match or range queries, returning lists of matching keys for further retrieval; however, queries that match very large numbers of keys can degrade performance. A 2i query is sent to a covering set of partitions determined by the n_val (replication factor, default 3), and the results from those partitions are merged.
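Secondary-index tagging and querying can be sketched with an in-memory index. Riak names indexes with a `_bin` (string) or `_int` (integer) suffix, which the sketch enforces; the class, its methods and the index names `age_int` and `city_bin` are hypothetical, and a real query is coordinated across a covering set of vnodes rather than answered from one dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple, Union

IndexValue = Union[str, int]

class SecondaryIndex:
    """Toy 2i store: index name -> set of (value, key) pairs."""

    def __init__(self) -> None:
        self._entries: Dict[str, Set[Tuple[IndexValue, str]]] = defaultdict(set)
        self._by_key: Dict[str, List[Tuple[str, IndexValue]]] = {}

    def put(self, key: str, indexes: Dict[str, IndexValue]) -> None:
        """Write a key's index entries, replacing any it had before."""
        for name, value in self._by_key.pop(key, []):
            self._entries[name].discard((value, key))
        written = []
        for name, value in indexes.items():
            if name.endswith("_int"):
                value = int(value)
            elif name.endswith("_bin"):
                value = str(value)
            else:
                raise ValueError(f"index {name!r} must end in _int or _bin")
            self._entries[name].add((value, key))
            written.append((name, value))
        self._by_key[key] = written

    def range(self, name: str, low: IndexValue, high: IndexValue) -> List[str]:
        """Keys whose value for `name` lies in [low, high], ordered by value."""
        hits = sorted(e for e in self._entries.get(name, ()) if low <= e[0] <= high)
        return [key for _, key in hits]

    def exact(self, name: str, value: IndexValue) -> List[str]:
        return self.range(name, value, value)

idx = SecondaryIndex()
idx.put("u1", {"age_int": 30, "city_bin": "Oslo"})
idx.put("u2", {"age_int": 42, "city_bin": "Oslo"})
idx.put("u3", {"age_int": 25, "city_bin": "Rome"})
print(idx.exact("city_bin", "Oslo"))   # ['u1', 'u2']
print(idx.range("age_int", 26, 45))    # ['u1', 'u2']
```

As in the description above, a query returns only keys; fetching the objects themselves is a separate GET per key.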
Riak previously integrated full-text search through Riak Search, a Solr-based module that indexed object values for distributed querying and scoring. However, Riak Search was deprecated and removed in OpenRiak releases around 2025 because of scaling and maintenance challenges. In Riak's eventually consistent model, concurrent writes to the same key can produce sibling values: multiple conflicting versions tracked via vector clocks or dotted version vectors. These siblings are returned on reads when allow_mult is enabled (the default for new bucket types in Riak 2.0 and later), requiring client-side resolution with application-specific logic, such as timestamp-based selection or merging; alternatively, custom resolvers or Riak Data Types (CRDTs such as counters and sets) can automate conflict handling for common scenarios. Performance and consistency are tuned via read (R) and write (W) quorums, alongside the replication factor (n_val, default 3): an operation succeeds only after acknowledgments from the specified number of replicas. Defaults use "quorum" (floor(N/2) + 1), but values can be set numerically or symbolically (e.g., "one", "all"); for read-your-writes consistency, the quorums should satisfy R + W > N, which guarantees that read and write sets overlap on recent updates.
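The quorum arithmetic can be checked directly. The sketch below resolves the symbolic values "one", "quorum" and "all" to replica counts and compares the R + W > N rule against a brute-force check that every possible read set meets every possible write set. The helper names are hypothetical; this illustrates the arithmetic rather than Riak's own option parsing.

```python
from itertools import combinations

def resolve(value, n_val: int) -> int:
    """Turn a symbolic ("one", "quorum", "all") or numeric R/W value into a count."""
    symbolic = {"one": 1, "quorum": n_val // 2 + 1, "all": n_val}
    if isinstance(value, str):
        return symbolic[value]
    if not 1 <= value <= n_val:
        raise ValueError("R/W must be between 1 and n_val")
    return value

def overlapping(r, w, n_val: int = 3) -> bool:
    """The R + W > N rule for read/write quorum overlap."""
    return resolve(r, n_val) + resolve(w, n_val) > n_val

def always_intersect(r: int, w: int, n_val: int) -> bool:
    """Brute force: does every read set of size r meet every write set of size w?"""
    replicas = range(n_val)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))

print(resolve("quorum", 3), resolve("quorum", 5))                  # 2 3
print(overlapping("quorum", "quorum"), overlapping("one", "one"))  # True False
# The arithmetic rule and the exhaustive check agree for every R, W with N = 3.
assert all(overlapping(r, w) == always_intersect(r, w, 3)
           for r in range(1, 4) for w in range(1, 4))
```

With the default N = 3, "quorum" reads and writes (2 + 2 > 3) always share at least one replica, while R = W = 1 can read a replica the write never reached.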

Replication and Consistency

Riak employs a replication factor, denoted $N$, which specifies the number of nodes across which each object is replicated to ensure availability and durability; the default value is 3, meaning three copies of each object are stored on different nodes in the cluster. This replication is complemented by hinted handoff, a mechanism that allows a neighboring node to temporarily accept writes intended for a failed node, maintaining write availability during short-term outages until the primary node recovers and the data is transferred back. To balance consistency and performance, Riak offers tunable quorum parameters: $R$, the minimum number of replicas that must respond to a read request; $W$, the minimum number that must acknowledge a write; and $DW$, the minimum number that must persist the write to durable storage, with defaults set to "quorum" (defined as $\lfloor N/2 \rfloor + 1$). In the eventual consistency model, read-your-writes consistency can be achieved when these parameters satisfy $R + W > N$ (which holds, for example, whenever both $R$ and $W$ exceed $N/2$), ensuring that read and write quorums overlap so that a read does not return stale data immediately after a successful write, following the quorum-intersection principle of the underlying Dynamo model. For example, with the default $N = 3$, setting $R = 2$ and $W = 2$ meets this threshold, since $2 + 2 > 3$. Riak also supports a separate experimental strong consistency mode for specific buckets since version 2.0, using coordinated replication to provide linearizable guarantees, though it is not the default and requires cluster configuration. Riak maintains data consistency across replicas through active anti-entropy processes, which use Merkle trees to efficiently detect differences between nodes by comparing hierarchical hash structures, enabling targeted repairs without full data transfers.
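Anti-entropy's use of hash trees can be sketched as follows: each replica builds a tree over fixed key-space segments, and two replicas compare roots first and descend only into subtrees whose hashes differ, so the exchange touches just the diverging segments. The 8-segment tree, the SHA-256 hashing and the function names are illustrative assumptions; Riak's AAE implementations (the original hash-tree AAE and TicTac AAE) differ in detail.

```python
import hashlib
from typing import Dict, List

SEGMENTS = 8  # assumed number of leaf segments; must be a power of two

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def segment_of(key: str) -> int:
    return int.from_bytes(h(key.encode())[:4], "big") % SEGMENTS

def build_tree(store: Dict[str, bytes]) -> List[List[bytes]]:
    """Hash-tree levels, leaves first; leaf i hashes every (key, value) in segment i."""
    segments: List[List[bytes]] = [[] for _ in range(SEGMENTS)]
    for key in sorted(store):
        segments[segment_of(key)].append(key.encode() + b"\x00" + h(store[key]))
    level = [h(b"".join(items)) for items in segments]
    levels = [level]
    while len(level) > 1:  # pair up hashes until only the root remains
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def differing_segments(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Compare two replicas' trees from the root down, descending only into
    subtrees whose hashes differ; return the leaf segments needing repair."""
    top = len(a) - 1
    suspects = [0] if a[top][0] != b[top][0] else []
    for depth in range(top - 1, -1, -1):
        suspects = [child for parent in suspects
                    for child in (2 * parent, 2 * parent + 1)
                    if a[depth][child] != b[depth][child]]
    return suspects

replica_a = {"k1": b"v1", "k2": b"v2", "k3": b"v3"}
replica_b = dict(replica_a, k2=b"stale")
print(differing_segments(build_tree(replica_a), build_tree(replica_b)))
```

Only the keys in the returned segment need to be compared one by one and repaired; identical replicas stop at the root comparison.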
In multi-datacenter setups, replication supports both fullsync mode, which performs a complete synchronization between clusters, and realtime mode, which propagates ongoing changes via queues to minimize latency across sites. Later versions of Riak introduced next-generation replication, which leverages the Leveled storage backend to handle version metadata more efficiently than traditional vector-clock-based methods, reducing synchronization overhead. In terms of the CAP theorem, Riak prioritizes availability (the AP model), allowing operations to proceed on reachable nodes during partitions via mechanisms such as sloppy quorums and hinted handoff, followed by automatic recovery through anti-entropy and handoff once connectivity is restored.

Products

Riak KV

Riak KV serves as the flagship product of the Riak ecosystem, built atop the Riak Core framework to provide a distributed key-value store optimized for high-availability, general-purpose storage of unstructured data. It distributes objects across a cluster of nodes using consistent hashing, ensuring that data remains accessible even if multiple nodes fail, thereby prioritizing availability over strict consistency in line with eventual consistency models. This design makes Riak KV suitable for applications requiring scalable, fault-tolerant storage, such as session management, real-time analytics, and content caching, where read and write operations can proceed as long as at least one replica is reachable. The evolution of Riak KV has seen significant updates starting with version 3.0 in 2020, which introduced compatibility with Erlang/OTP 20 and later versions, enabling better performance on modern systems but requiring careful upgrades because some dependencies are not backward-compatible. In versions 3.0 and beyond, features such as Active Anti-Entropy (AAE) for background data repair are configurable and can be disabled to save resources, relying on read repair for consistency instead. Similarly, Riak Search provides integrated full-text search capabilities using Apache Solr, with indexing and querying handled through embedded Solr instances on each node. Post-Basho releases, such as 2.2.6, incorporated previously enterprise-only features like multi-datacenter replication into the open-source core. As of 2025, the project is maintained via the OpenRiak community fork, focusing on stability and compatibility with modern Erlang/OTP versions. Riak KV supports multiple pluggable storage backends to accommodate diverse workloads, with Bitcask serving as the default; its log-structured design excels in high-write-throughput scenarios by appending data sequentially and managing compaction in the background.
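Bitcask's log-structured approach can be illustrated with a toy Python model, assuming an in-memory buffer in place of on-disk data files. The class TinyBitcask is hypothetical and omits much of the real engine (multiple data files, tombstones for deletes, hint files, checksums); it keeps only the two core ideas: every write is appended to the end of a log, and an in-memory key directory ("keydir") records where the latest value of each key lives. Compaction, which Bitcask calls merging, rewrites only the live values.

```python
import io
import struct
from typing import Dict, Optional, Tuple


class TinyBitcask:
    """Append-only log plus in-memory keydir, in the spirit of Bitcask (illustrative only)."""

    HEADER = struct.Struct(">II")  # key length, value length (assumed record layout)

    def __init__(self) -> None:
        self.log = io.BytesIO()
        self.keydir: Dict[bytes, Tuple[int, int]] = {}  # key -> (value offset, value length)

    def put(self, key: bytes, value: bytes) -> None:
        self.log.seek(0, io.SEEK_END)  # writes always go to the end of the log
        offset = self.log.tell()
        self.log.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self.keydir[key] = (offset + self.HEADER.size + len(key), len(value))

    def get(self, key: bytes) -> Optional[bytes]:
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.log.seek(offset)  # a single seek and read per lookup
        return self.log.read(length)

    def merge(self) -> None:
        """Compaction: copy only the latest value of each key into a fresh log."""
        live = {key: self.get(key) for key in self.keydir}
        self.log = io.BytesIO()
        self.keydir = {}
        for key, value in live.items():
            self.put(key, value)

    def log_size(self) -> int:
        return len(self.log.getvalue())


db = TinyBitcask()
db.put(b"user:1", b"alice")
db.put(b"user:1", b"bob")  # the old value stays in the log until a merge
print(db.get(b"user:1"))  # b'bob'
db.merge()
```

The design choice this highlights is the trade-off behind Bitcask's write throughput: writes never rewrite data in place, but the keydir must hold every key in memory.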
LevelDB provides an alternative for datasets with large numbers of keys or range-scan needs, using a more traditional LSM-tree architecture for better compression and query efficiency at the cost of slightly higher write latency. Additionally, an in-memory backend is available for low-latency caching of hot data, and the multi-backend option allows different engines to be assigned per bucket type within the same cluster to optimize for mixed access patterns. Security in Riak KV is enabled through a modular system requiring SSL/TLS activation for encrypted inter-node and client communication, with authentication supporting HTTP basic auth, certificate-based methods, or pluggable external providers like PAM. Authorization operates via a role-based module that defines users, groups, and granular permissions for operations like read, write, and administrative access, managed through Riak's security administration commands to enforce least-privilege access across the cluster. Performance features in Riak KV include atomic counters via its Data Types feature, introduced in Riak 2.0, which allow increment and decrement operations on counter values associated with keys without requiring client-side conflict resolution, so that concurrent updates on different replicas converge safely. For batch operations, Riak KV uses MapReduce pipelines to distribute processing across nodes, enabling efficient aggregation and transformation of large datasets by chaining map and reduce phases in a fault-tolerant manner. These capabilities, combined with configurable async threading for I/O-bound tasks, help achieve high throughput, with reported benchmarks showing sustained writes exceeding 100,000 operations per second on commodity hardware clusters. The latest release, Riak KV 3.2.5 on March 25, 2025, emphasizes stability enhancements, including fixes for replication and handoff issues during node additions or failures to prevent data divergence, alongside general improvements to cluster reliability under high load.
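Counters that need no client-side conflict resolution are built on CRDTs (convergent replicated data types). The Python sketch below shows the standard state-based PN-counter construction as an illustration of the general technique; it is not Riak's own implementation, and the class and actor names are hypothetical. Each replica tallies increments and decrements per actor, and merging takes the per-actor maximum, so replicas converge to the same value regardless of the order in which states are exchanged.

```python
from collections import defaultdict
from typing import Dict


class PNCounter:
    """State-based PN-counter: per-actor increment (P) and decrement (N) tallies."""

    def __init__(self) -> None:
        self.p: Dict[str, int] = defaultdict(int)
        self.n: Dict[str, int] = defaultdict(int)

    def increment(self, actor: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("use decrement for negative changes")
        self.p[actor] += amount

    def decrement(self, actor: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.n[actor] += amount

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other: "PNCounter") -> "PNCounter":
        """Join two replica states: per-actor maximum of each tally (commutative, idempotent)."""
        merged = PNCounter()
        for actor in set(self.p) | set(other.p):
            merged.p[actor] = max(self.p[actor], other.p[actor])
        for actor in set(self.n) | set(other.n):
            merged.n[actor] = max(self.n[actor], other.n[actor])
        return merged


# Two replicas updated concurrently, e.g. on opposite sides of a partition.
replica_a = PNCounter()
replica_b = PNCounter()
replica_a.increment("a", 5)
replica_b.increment("b", 2)
replica_b.decrement("b", 1)
merged = replica_a.merge(replica_b)
print(merged.value())  # 6
```

Because merge is commutative, associative, and idempotent, replicas can exchange state repeatedly and in any order (for example during read repair or anti-entropy) without double-counting.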
Source code and packages are available via the OpenRiak GitHub repository, continuing the project's open-source maintenance under community stewardship.
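The map and reduce phases used by Riak KV's batch processing can be sketched in Python as a toy word count over objects spread across hypothetical nodes. This illustrates the general MapReduce pattern only; Riak's actual jobs are written in Erlang or JavaScript and submitted through its own interfaces, none of which are modeled here. The map step runs next to the data and produces partial results; the reduce step combines them with an associative, commutative operation, so it also gives correct results when applied to already-reduced partial output.

```python
from collections import Counter
from functools import reduce
from typing import Dict, Iterable, List


def map_phase(objects: Iterable[str]) -> List[Counter]:
    """Runs where the data lives: turn each stored object into a partial word count."""
    return [Counter(text.split()) for text in objects]


def reduce_phase(partials: Iterable[Counter]) -> Counter:
    """Combine partial results; Counter addition is associative and commutative."""
    return reduce(lambda acc, part: acc + part, partials, Counter())


def run_job(objects_by_node: Dict[str, List[str]]) -> Counter:
    # Map on every node, then gather all partial results for the reduce step.
    partials = [part for objects in objects_by_node.values() for part in map_phase(objects)]
    return reduce_phase(partials)


cluster = {"node1": ["red green", "green"], "node2": ["green blue"]}
print(run_job(cluster))
```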

Riak TS

Riak TS is a distributed database specifically designed for managing time-series data, enabling efficient storage and retrieval of timestamped records at high velocity. It supports fast ingestion of large volumes of temporal data, such as sensor readings from IoT devices, by organizing information into structured tables that prioritize time-based access patterns. Unlike general-purpose key-value stores, Riak TS co-locates related data points within defined time intervals to optimize query performance and reduce latency for analytical workloads. As of 2025, the project is maintained via the OpenRiak community fork, focusing on stability and compatibility with modern Erlang/OTP versions. The data model in Riak TS revolves around predefined tables with a schema that includes exactly one column of type TIMESTAMP, representing Unix time in UTC milliseconds since January 1, 1970. Additional columns consist of series keys, typically categorical fields such as a device ID used to group related records, and value columns, which hold numeric measurements such as speed. Keys are composite: partition keys combine series values with a time quantum to distribute data across the cluster, while range keys specify the exact position of each record within that quantum, ensuring ordered storage without relying on external indexing for time-based operations. This structure facilitates horizontal scaling while maintaining locality for common time-range queries. Queries in Riak TS use an SQL-like interface that supports SELECT statements with mandatory WHERE clauses filtering on series keys and time ranges, enabling range scans over specific intervals. Aggregations such as COUNT, SUM, AVG, MIN, MAX, and STDDEV can be applied to summarize data efficiently. For example, a query might aggregate values per device over a one-hour window, using the table's partition key to prune irrelevant partitions and return results in under a second even on petabyte-scale datasets.
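The interaction between time quanta, partition keys, and query pruning can be sketched in a few lines of Python. In Riak TS the quantum is declared inside the table's primary key (for example QUANTUM(time, 15, 'm') for 15-minute quanta); the functions below are hypothetical helpers that mimic the idea, not Riak code. A timestamp is floored to the start of its quantum, that start value joins the series keys to form the partition key, and a time-range query only needs the partitions whose quanta overlap the requested interval.

```python
from typing import List, Tuple

MINUTE_MS = 60 * 1000
QUANTUM_MS = 15 * MINUTE_MS  # assumed 15-minute quantum for illustration


def quantum_start(ts_ms: int, quantum_ms: int = QUANTUM_MS) -> int:
    """Floor a UTC millisecond timestamp to the start of its quantum."""
    return ts_ms - ts_ms % quantum_ms


def partition_key(series: Tuple[str, ...], ts_ms: int, quantum_ms: int = QUANTUM_MS) -> Tuple:
    """Series values plus the quantum start: records sharing this key are stored together."""
    return (*series, quantum_start(ts_ms, quantum_ms))


def quanta_for_range(start_ms: int, end_ms: int, quantum_ms: int = QUANTUM_MS) -> List[int]:
    """Quantum starts a query over [start_ms, end_ms) must read; every other quantum is pruned."""
    if end_ms <= start_ms:
        return []
    return list(range(quantum_start(start_ms, quantum_ms), end_ms, quantum_ms))


print(partition_key(("device-42",), 1_000_000))  # ('device-42', 900000)
print(len(quanta_for_range(0, 60 * MINUTE_MS)))  # 4: a one-hour window spans four quanta
```

Choosing the quantum is a trade-off: smaller quanta spread writes more evenly across nodes, while larger quanta let a typical query be answered from fewer partitions.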
Key optimizations in Riak TS include automated partitioning by configurable time quanta, such as 15 minutes, hours, or days, allowing data to be segmented and co-located on the same physical nodes for rapid ingestion and retrieval in IoT scenarios. This time-based sharding minimizes cross-node traffic during queries, and built-in compression reduces storage overhead for repetitive high-velocity streams like telemetry data. The system is built atop the Riak Core framework, sharing the operational simplicity and resilience of Riak KV, which enables hybrid deployments where time-series tables coexist with key-value buckets in the same cluster for unified management. Distinct features of Riak TS include configurable expiry policies that automatically purge data older than a specified retention period, helping manage storage costs for transient data without manual intervention. Secondary indexes on non-time fields, inherited from Riak KV, allow querying by attributes like geographic region alongside time filters, though the native partitioning often makes them unnecessary in purely temporal workloads. Development of Riak TS culminated in major releases around 2016, with version 1.3 open-sourced that year; since Basho's closure in 2017, it has been maintained through community efforts and forks, including stability enhancements by enterprise users as of 2025.

Riak CS

Riak CS is a distributed object storage system designed for storing unstructured data, including files, backups, videos, images, and other large blobs. Built on top of Riak KV, it breaks objects into smaller blocks that are distributed, replicated, and made highly available across clusters, facilitating use in public, private, or hybrid cloud environments. Its S3-compatible interface allows developers to treat Riak CS as a drop-in replacement for Amazon S3 in many applications, supporting operations on buckets and objects without requiring custom client code. As of 2025, the project is maintained via the OpenRiak community fork, focusing on stability and compatibility with modern Erlang/OTP versions. The architecture of Riak CS uses Riak KV as the underlying storage engine, where object data is stored as key-value pairs, with blocks streamed and replicated for durability. Metadata operations, such as creating users, buckets, and access controls, are managed by Stanchion, a separate Erlang-based service that serializes requests to ensure global uniqueness and consistency for these entities across the cluster. Riak CS supports objects of multiple gigabytes through multipart upload, enabling efficient handling of large files by dividing them into parts that can be uploaded in parallel. Core features emphasize multi-tenancy through admin-created user accounts with access keys for authentication and authorization, allowing isolated namespaces per tenant. Object versioning preserves historical versions of objects, while lifecycle policies automate transitions, such as expiring old objects or archiving them to lower-cost storage. Access control lists (ACLs) provide fine-grained permissions at the bucket and object levels, mirroring S3 semantics. The system fully emulates the Amazon S3 REST API, including GET, PUT, DELETE, and listing operations, ensuring broad compatibility with S3 tools and libraries. Riak CS achieves scalability by distributing data and requests across masterless clusters with no single point of failure, and supports multi-datacenter replication for geographic distribution.
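The idea of storing a large object as many small key-value entries can be illustrated with a Python sketch. The manifest layout, the block key scheme (bucket/key/sequence), and the 1 MiB default block size are illustrative assumptions, not Riak CS's actual formats. The object is cut into fixed-size blocks, each block would be written to the key-value store under its own key, and a manifest records the block order and an MD5 checksum per block so the object can be reassembled and verified on read.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 1024 * 1024  # assumed block size for illustration


def split_into_blocks(bucket: str, key: str, data: bytes,
                      block_size: int = BLOCK_SIZE) -> Tuple[dict, Dict[str, bytes]]:
    """Cut an object into blocks and build a manifest describing them (hypothetical format)."""
    manifest: dict = {"bucket": bucket, "key": key, "size": len(data), "blocks": []}
    blocks: Dict[str, bytes] = {}
    for seq, offset in enumerate(range(0, len(data), block_size)):
        chunk = data[offset:offset + block_size]
        block_key = f"{bucket}/{key}/{seq}"  # each block becomes its own key-value entry
        blocks[block_key] = chunk
        manifest["blocks"].append({"key": block_key, "md5": hashlib.md5(chunk).hexdigest()})
    return manifest, blocks


def reassemble(manifest: dict, blocks: Dict[str, bytes]) -> bytes:
    """Fetch blocks in manifest order, verifying each checksum before joining them."""
    parts: List[bytes] = []
    for entry in manifest["blocks"]:
        chunk = blocks[entry["key"]]
        if hashlib.md5(chunk).hexdigest() != entry["md5"]:
            raise ValueError(f"corrupt block {entry['key']}")
        parts.append(chunk)
    return b"".join(parts)


data = bytes(range(256)) * 10  # 2,560 bytes of sample content
manifest, blocks = split_into_blocks("photos", "cat.jpg", data, block_size=1000)
print([len(blocks[b["key"]]) for b in manifest["blocks"]])  # [1000, 1000, 560]
```

Keeping blocks small means each one is an ordinary Riak KV value that can be replicated and repaired independently, which is also why very small objects pay a relative overhead for the extra manifest bookkeeping.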
An erasure coding option, configurable via the Riak KV backend, enhances storage efficiency by reducing redundancy compared to traditional replication, particularly for cost-sensitive deployments. Following Basho Technologies' bankruptcy in 2017, Riak CS transitioned to community maintenance under an Apache 2.0 license, with integrations for ecosystems like Hadoop (via S3 connectors) and limited support for Swift APIs. While optimized for large objects, Riak CS incurs higher latency for small object accesses due to the overhead of S3 API emulation and block management, making direct Riak KV preferable for such workloads.

Development

History

Riak was developed by Basho Technologies, a company founded in January 2008 by Earl Galleher and Antony Falco to create distributed database solutions inspired by emerging technologies like Amazon's Dynamo. The project was one of the early open-source distributed key-value stores, with its initial public release in 2009 under the Apache 2.0 license, emphasizing high availability and fault tolerance. Key milestones during Basho's stewardship included the launch of Riak 1.0 in September 2011, which introduced multi-datacenter replication as an enterprise feature to enable data syncing across global clusters. In 2015, Basho expanded the product line with Riak TS, a specialized time-series database optimized for IoT and sensor data, unveiled in October and generally available by December. The company grew significantly, securing multiple funding rounds totaling over $60 million, including a $25 million Series G round in January 2015 led by Georgetown Partners, which supported enterprise adoption by major brands and one-third of the Fortune 50. Basho entered receivership in April 2017 amid financial challenges, leading to the sale of its assets, including Riak's intellectual property, to bet365 in August 2017; bet365, a long-time user, aimed to ensure continuity for the technology. After the acquisition, Riak remained under Apache 2.0, with community-driven maintenance sustaining development despite the loss of commercial backing from Basho. Community efforts included forks and contributions starting around 2018 to address ongoing needs in stability and compatibility. Recent advancements reflect sustained community involvement, with Riak KV 3.0 released in August 2020 to restructure features for better support of Erlang/OTP versions 20 through 22, though not fully backward-compatible with prior releases. This was followed by Riak KV 3.2.0 in January 2023, focusing on stability through OTP uplifts to versions 24 and 25, along with logging API updates and Alpine Linux packaging support.
The latest update, Riak KV 3.2.5 in March 2025, addressed replication fixes in the next-generation full-sync mechanism to enhance inter-cluster reliability. Commercial support persists through entities like TI Tokyo, which provides enterprise-grade maintenance for Riak KV, TS, and CS as of 2025, alongside the active riak.com site for downloads and documentation.

Licensing and Support

Riak was originally developed under a dual licensing model by Basho Technologies, featuring an open-source core under the Apache 2.0 license alongside proprietary enterprise features such as multi-cluster replication, advanced security features, and SNMP monitoring. In August 2017, following Basho's financial challenges, bet365 acquired the Riak intellectual property and transitioned the project to a fully open-source model, incorporating many enterprise features into the open-source codebase while retaining the Apache 2.0 license, which permits free use, modification, and distribution with minimal restrictions. Commercial support for Riak remains available through multiple channels to accommodate production deployments. Enterprise subscriptions via riak.com provide access to enhanced features in Riak KV, Riak TS, and Riak S2, along with dedicated engineering assistance and service-level agreements (SLAs) for operational reliability. Additional full-product support, including for Riak CS, is offered by TI Tokyo in partnership with Erlang Solutions, featuring 24/7 options tailored for enterprise needs. bet365 maintains internal support for its own Riak infrastructure while contributing to the open-source project's ongoing development and stability. Support options are tiered to suit different user requirements. For the open-source edition, community-driven resources include GitHub repositories for issue tracking and code contributions, IRC channels, Slack workspaces, and discussion forums where users can seek real-time assistance and share experiences. Paid support tiers, such as those from riak.com and TI Tokyo, offer structured SLAs with guaranteed response times, proactive monitoring, and expert consulting for mission-critical environments. Riak's end-of-life policies ensure long-term maintainability for stable releases.
According to the OpenRiak community roadmap, Riak KV 3.2 is designated as the recommended production version and will receive maintenance support until the end of 2025, after which users are encouraged to migrate to subsequent releases like 3.4 or 3.6. Riak's open-source license and architecture help users avoid vendor lock-in: it runs on commodity hardware and integrates with major cloud providers through standard protocols and connectors.

Community Involvement

The Riak open-source community is primarily coordinated through the OpenRiak GitHub organization, which maintains a fork of the original Basho Riak repository and hosts related projects such as documentation and client libraries. This organization collaborates with the Erlang Ecosystem Foundation to ensure ongoing development and stability. Additionally, riak.info serves as a central hub for community news, release announcements, and event updates. Community contributions focus on essential maintenance tasks, including bug fixes and security patches, which have supported multiple releases such as Riak KV 3.2.5 in March 2025. Discussions around the project roadmap, particularly for versions 3.4 and 3.6 planned after 2025, occur in dedicated forums, emphasizing improvements in stability under complex failure scenarios. Resources for engagement include the Riak Users mailing list for technical discussions and troubleshooting, as well as the #riak IRC channel for real-time support. Annual community roadmaps guide priorities, with Riak 3.2 designated as the recommended production release until the end of 2025. Key contributors are post-Basho volunteers who have sustained the project, including through integrations with modern infrastructure like Kubernetes via community projects such as kubriak-kv. These efforts highlight the community's role in adapting Riak to containerized environments. Following Basho's closure in 2017, community activity declined as fewer commercial resources were available, but recent releases and ongoing discussions indicate a resurgence driven by interest in IoT and related applications. Official documentation is hosted at docs.riak.com, while wikis and repositories provide guidance for custom builds and extensions.

Integration and Adoption

Language Support

Riak provides official open-source client libraries for several programming languages, enabling developers to interact with the database through its core interfaces. These include native support in Erlang, as Riak itself is implemented in Erlang/OTP, along with dedicated libraries for Java, Python, Ruby, Go, Node.js, C# (.NET), and PHP. These clients communicate either through the efficient riak_pb Protocol Buffers interface for binary communication or through the HTTP API, which follows RESTful conventions for straightforward web access. Community-driven libraries extend Riak's accessibility to additional languages, including Scala, by implementing the standard protocols. This protocol-level support allows custom clients to be written in other languages without requiring official maintenance. For instance, the Scala library builds on an existing client to provide idiomatic abstractions. Riak's design supports integration with broader data ecosystems, such as Apache Spark for distributed analytics and Kafka for real-time streaming ingestion. The official Spark-Riak Connector enables reading and writing data using Spark's RDD and DataFrame APIs in Java, Scala, or Python, facilitating large-scale queries on Riak-stored data. Similarly, Kafka integration often occurs through Spark Streaming, where data from Kafka topics is ingested into Riak tables for processing. To ensure reliable performance in distributed environments, best practices for Riak clients emphasize connection pooling to reuse connections across operations and reduce overhead, as well as built-in retry logic for handling transient network or node failures. Official clients like the Python library automatically manage pools and retries for operations that can safely be reattempted on alternate nodes. Community efforts under the OpenRiak project, a fork maintained by the Erlang Ecosystem Foundation since 2024, continue to enhance Riak's compatibility with modern environments, including updates for OTP 26 and 28.
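The connection pooling and retry practices described above can be illustrated with a small, client-agnostic Python sketch. RetryingPool, TransientError, and the connect callback are hypothetical names and do not reflect the API of the official Riak Python client. The sketch rotates across nodes, reuses idle connections per node, discards a connection after a transient failure rather than returning it to the pool, and retries on the next node with exponential backoff. As the text notes, only operations that are safe to repeat (such as reads) should be retried this way.

```python
import itertools
import time
from typing import Any, Callable, Dict, List, Sequence


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts or a node going down."""


class RetryingPool:
    """Round-robin over nodes, reuse idle connections, retry transient failures elsewhere."""

    def __init__(self, nodes: Sequence[str], connect: Callable[[str], Any],
                 max_attempts: int = 3, backoff_s: float = 0.05) -> None:
        if not nodes or max_attempts < 1:
            raise ValueError("need at least one node and one attempt")
        self._nodes = itertools.cycle(nodes)
        self._connect = connect
        self._idle: Dict[str, List[Any]] = {}
        self.max_attempts = max_attempts
        self.backoff_s = backoff_s
        self.connections_opened = 0

    def _acquire(self, node: str) -> Any:
        idle = self._idle.setdefault(node, [])
        if idle:
            return idle.pop()  # reuse a pooled connection
        self.connections_opened += 1
        return self._connect(node)

    def execute(self, operation: Callable[[Any], Any]) -> Any:
        """Run operation(conn); only pass operations that are safe to repeat."""
        last_error: Exception = TransientError("no attempt made")
        for attempt in range(self.max_attempts):
            node = next(self._nodes)
            conn = self._acquire(node)
            try:
                result = operation(conn)
            except TransientError as exc:
                last_error = exc  # do not return a possibly broken connection to the pool
                if attempt + 1 < self.max_attempts:
                    time.sleep(self.backoff_s * 2 ** attempt)  # exponential backoff
                continue
            self._idle[node].append(conn)  # healthy: keep it for the next call
            return result
        raise last_error


def read_key(conn: Any) -> str:
    # Stand-in for a real client call; node "a" is simulated as down.
    if conn["node"] == "a":
        raise TransientError("node a timed out")
    return f"value from {conn['node']}"


pool = RetryingPool(["a", "b", "c"], connect=lambda node: {"node": node}, backoff_s=0)
print(pool.execute(read_key))  # value from b
```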

Notable Users and Use Cases

Riak has been adopted by several prominent organizations across various industries, leveraging its distributed architecture for high-availability data management. In the healthcare sector, the UK's National Health Service (NHS) deployed Riak in 2014 as the backbone for its Spine2 system, a national platform handling records for approximately 90 million individuals and supporting up to 200,000 users, replacing a previous Oracle-based setup to improve performance and agility. However, as of 2025, the NHS is developing a replacement platform under the Spine Futures initiative. In retail, Riak KV Enterprise has been integrated into e-commerce platforms to manage product catalogs and caching, enabling faster re-platforming and handling high-traffic demands during peak shopping periods. In the gaming and betting industries, bet365, one of Riak's largest users, employed it for real-time betting data storage and session management, processing daily data loads to ensure seamless user experiences during high-stakes events. Following bet365's 2017 acquisition of Basho Technologies' assets, the company continued to support and enhance the technology for its operations, committing to open-source the enterprise version while maintaining its use in production systems as of 2025. Other adopters have used Riak for storing payment transactions and game data in mobile gaming ecosystems, relying on multi-datacenter replication for global scalability. Key use cases for Riak span high-traffic web applications, where it serves as a robust session store and caching layer to handle millions of concurrent users with sub-millisecond latency. In IoT scenarios, Riak TS facilitates time-series data ingestion and querying, enabling real-time analytics on sensor streams at scales of hundreds of thousands of writes per second. Riak CS addresses needs for backups and large-scale file distribution, providing S3-compatible interfaces for hybrid cloud environments.
Success stories highlight Riak's scalability in demanding workloads; for instance, adopters have reported reliable performance under peak loads of millions of operations per day, as well as improved response times through distributed caching. In finance, it supports transaction logging with durability options to mitigate data-loss risk; in media, it powers content delivery networks for video and asset storage; and in gaming, it manages leaderboards and player events with low-latency replication across regions. After the 2017 acquisition by bet365, some organizations explored migrations to alternatives such as DynamoDB in response to Basho's difficulties, but Riak has maintained adoption in legacy systems, particularly where high availability in multi-site deployments remains critical, as evidenced by ongoing use at bet365 as of 2025. An open-source fork, OpenRiak, initiated in 2024, ensures continued community-driven development.
