Software-defined storage
Software-defined storage (SDS) is a marketing term for computer data storage software that provides policy-based provisioning and management of data storage independent of the underlying hardware. Software-defined storage typically includes a form of storage virtualization to separate the storage hardware from the software that manages it.[1] The software enabling a software-defined storage environment may also provide policy management for features such as data deduplication, replication, thin provisioning, snapshots, copy-on-write clones, tiering and backup.
Software-defined storage (SDS) hardware may or may not also have abstraction, pooling, or automation software of its own. When implemented as software only, in conjunction with commodity servers with internal disks, it may suggest software such as a virtual or global file system or distributed block storage. If it is software layered over sophisticated large storage arrays, it suggests software such as storage virtualization or storage resource management, categories of products that address separate and different problems. If the policy and management functions also include a form of artificial intelligence to automate protection and recovery, it can be considered intelligent abstraction.[2] Software-defined storage may be implemented via appliances over a traditional storage area network (SAN), as network-attached storage (NAS), or using object-based storage. In March 2014 the Storage Networking Industry Association (SNIA) published a technical whitepaper on software-defined storage.[3]
Software-defined storage industry
VMware used the marketing term "software-defined data center" (SDDC) for a broader concept wherein all the virtual storage, server, networking and security resources required by an application can be defined by software and provisioned automatically.[4][5] Other smaller companies then adopted the term "software-defined storage", such as Cleversafe (acquired by IBM) and OpenIO.
Based on concepts similar to software-defined networking (SDN),[6] interest in SDS rose after VMware acquired Nicira for over a billion dollars in 2012.
Data storage vendors used various definitions for software-defined storage depending on their product lines. The Storage Networking Industry Association (SNIA), a standards group, attempted a multi-vendor, negotiated definition with examples.[7]
The software-defined storage industry is projected to reach $86 billion by 2023.[8]
Building on these concepts, esurfing cloud launched a software-defined storage product called HBlock. HBlock is a lightweight storage cluster controller that operates in user mode. It can be installed on any Linux operating system as a regular application without root access, and deployed alongside other applications on the server. HBlock aggregates unused disk space across servers to create high-performance, highly available virtual disks. These virtual disks can be mounted on local or remote servers using the standard iSCSI protocol, putting existing storage resources to use without impacting existing operations or requiring additional hardware purchases.[9]
Characteristics
Characteristics of software-defined storage may include:[10]
- Abstraction of logical storage services and capabilities from the underlying physical storage systems, and in some cases pooling across multiple different implementations. Since data movement is relatively expensive and slow compared to computation and services, pooling approaches sometimes leave data in place and create a mapping layer to it that spans arrays. Examples include:
- Storage virtualization, the generalized category of approaches and historic products. External-controller based arrays include storage virtualization to manage usage and access across the drives within their own pools. Other products exist independently to manage across arrays and/or server DAS storage.
- Virtual volumes (VVols), a proposal from VMware for a more transparent mapping between large volumes and the VM disk images within them, to allow better performance and data management optimizations. This does not reflect a new capability for virtual infrastructure administrators (who can already use, for example, NFS) but it does offer arrays using iSCSI or Fibre Channel a path to higher admin leverage for cross-array management apps written to the virtual infrastructure.
- Parallel NFS (pNFS), a specific approach which evolved within the NFS community and has since seen many implementations.
- OpenStack storage APIs such as Swift (object storage) and Cinder (block storage), together with Ceph, which have been applied to open-source projects as well as to vendor products.
- A number of object storage platforms are also examples of software-defined storage implementations.
- Distributed storage solutions providing clustered file systems or distributed block storage are further examples of software-defined storage.
- Automation, with policy-driven storage provisioning and service-level agreements replacing technology details. This requires management interfaces that span traditional storage-array products, in effect separating the "control plane" from the "data plane" in the spirit of OpenFlow. Prior industry standardization efforts included the Storage Management Initiative – Specification (SMI-S), which began in 2000.
- Commodity hardware with storage logic abstracted into a software layer. This is also described as a clustered file system for converged storage.
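The policy-driven provisioning idea above can be sketched as a control-plane function that matches a service-level request against the capabilities of available pools. This is a minimal sketch; pool names and attribute values are illustrative, not drawn from any product:

```python
# Sketch of policy-driven provisioning: a control plane matches a
# service-level request against the capabilities of available pools.
# Pool names and attribute values are illustrative, not from any product.

POOLS = [
    {"name": "flash-pool", "max_iops": 100_000, "replicas": 3, "cost_per_gb": 0.30},
    {"name": "hybrid-pool", "max_iops": 20_000, "replicas": 2, "cost_per_gb": 0.10},
    {"name": "archive-pool", "max_iops": 1_000, "replicas": 2, "cost_per_gb": 0.02},
]

def provision(required_iops, required_replicas):
    """Return the cheapest pool satisfying the service-level request."""
    candidates = [
        p for p in POOLS
        if p["max_iops"] >= required_iops and p["replicas"] >= required_replicas
    ]
    if not candidates:
        raise ValueError("no pool satisfies the requested service level")
    return min(candidates, key=lambda p: p["cost_per_gb"])

# A database tier asking for 15,000 IOPS lands on the hybrid pool, while a
# workload demanding triple replication at high IOPS is steered to flash.
db_pool = provision(15_000, 2)
critical_pool = provision(50_000, 3)
```

The point of the sketch is the separation of concerns: the request names only service levels, and the software layer, not the administrator, decides which hardware satisfies them.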
Storage hypervisor
In computing, a storage hypervisor is a software program which can run on a physical server hardware platform, on a virtual machine, inside a hypervisor OS or in the storage network. It may co-reside with virtual machine supervisors or have exclusive control of its platform. Like virtual server hypervisors, a storage hypervisor may run on a specific hardware platform or architecture, or be hardware independent.[11]
The storage hypervisor software virtualizes the individual storage resources it controls and creates one or more flexible pools of storage capacity. In this way it severs the direct link between physical and logical resources, in parallel to virtual server hypervisors. By moving storage management into an isolated layer it also helps to increase system uptime and high availability. "Similarly, a storage hypervisor can be used to manage virtualized storage resources to increase utilization rates of disk while maintaining high reliability."[12]
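A minimal Python sketch of this pooling behaviour, combined with the thin provisioning mentioned earlier in the article: dissimilar devices contribute capacity to one pool, and virtual volumes consume physical space only as data is written. Device names and sizes are hypothetical; real products add placement, tiering, and fault-tolerance logic on top of this idea.

```python
# Sketch of a storage hypervisor's capacity pooling with thin provisioning.
# Device names and sizes are hypothetical simplifications.

class StoragePool:
    def __init__(self, devices):
        # devices: mapping of device name -> physical capacity in GB
        self.physical_capacity = sum(devices.values())
        self.physical_used = 0
        self.volumes = {}  # volume name -> {"size": logical GB, "written": GB}

    def create_volume(self, name, logical_size_gb):
        """Thin provisioning: the logical size may exceed free physical
        space, since blocks are only consumed when actually written."""
        self.volumes[name] = {"size": logical_size_gb, "written": 0}

    def write(self, name, gb):
        vol = self.volumes[name]
        if vol["written"] + gb > vol["size"]:
            raise ValueError("write exceeds volume's logical size")
        if self.physical_used + gb > self.physical_capacity:
            raise ValueError("pool out of physical capacity")
        vol["written"] += gb
        self.physical_used += gb

# One SSD and two HDDs pooled into 4500 GB of shared capacity.
pool = StoragePool({"ssd-a": 500, "hdd-b": 2000, "hdd-c": 2000})
pool.create_volume("vm-images", 3000)
pool.create_volume("backups", 3000)  # 6000 GB logical over 4500 GB physical
pool.write("vm-images", 1200)
```

Note that the two volumes together promise more logical capacity than physically exists; the pool only fails a write when physical space is actually exhausted, which is the essence of thin provisioning.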
The storage hypervisor, a centrally managed supervisory software program, provides a comprehensive set of storage control and monitoring functions that operate as a transparent virtual layer across consolidated disk pools to improve their availability, speed and utilization.
Storage hypervisors enhance the combined value of multiple disk storage systems, including dissimilar and incompatible models, by supplementing their individual capabilities with extended provisioning, data protection, replication and performance acceleration services.
In contrast to embedded software or disk controller firmware confined to a packaged storage system or appliance, the storage hypervisor and its functionality span different models, brands and types of storage, including SSDs (solid-state drives), SAN (storage area network), DAS (direct-attached storage) and unified storage (SAN and NAS), covering a wide range of price and performance characteristics or tiers. The underlying devices need not be explicitly integrated with each other nor bundled together.
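The tiered storage optimization referred to here can be sketched as a simple policy pass that promotes frequently accessed data to fast storage and demotes cold data. The access threshold and tier names are illustrative assumptions, not any vendor's actual policy engine:

```python
# Sketch of policy-driven tiering: data objects move between a fast and a
# cheap tier according to a simple access-frequency rule. The threshold
# and the tier names are illustrative assumptions.

HOT_THRESHOLD = 10  # accesses per policy interval; an assumed tunable

def apply_tiering_policy(objects):
    """objects: list of dicts with 'name', 'accesses', 'tier'.
    Returns the list of (name, old_tier, new_tier) migrations performed."""
    migrations = []
    for obj in objects:
        target = "ssd" if obj["accesses"] >= HOT_THRESHOLD else "hdd"
        if obj["tier"] != target:
            migrations.append((obj["name"], obj["tier"], target))
            obj["tier"] = target
    return migrations

catalog = [
    {"name": "orders.db", "accesses": 120, "tier": "hdd"},    # hot: promote
    {"name": "logs-2021.tar", "accesses": 0, "tier": "ssd"},  # cold: demote
    {"name": "cache.bin", "accesses": 50, "tier": "ssd"},     # already placed
]
moves = apply_tiering_policy(catalog)
```

Because the rule operates on logical objects rather than on any particular array, the same pass can span dissimilar devices and brands, which is precisely the cross-tier leverage described above.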
A storage hypervisor enables hardware interchangeability. The storage hardware underlying a storage hypervisor matters only in a generic way with regard to performance and capacity. While underlying "features" may be passed through the hypervisor, the value of a storage hypervisor lies in its ability to present uniform virtual devices and services from dissimilar and incompatible hardware, thus making these devices interchangeable. Continuous replacement and substitution of the underlying physical storage may take place without altering or interrupting the virtual storage environment that is presented.
The storage hypervisor manages, virtualizes and controls all storage resources, allocating and providing the needed attributes (performance, availability) and services (automated provisioning, snapshots, replication), either directly or over a storage network, as required to serve the needs of each individual environment.
The term "hypervisor" within "storage hypervisor" is used because the software goes beyond a supervisor:[13] it is conceptually a level higher than a supervisor, acting as the next higher level of management and intelligence that sits above and spans device-level storage controllers, disk arrays, and virtualization middleware.
A storage hypervisor has also been defined as a higher level of storage virtualization software,[14] providing "Consolidation and cost: Storage pooling increases utilization and decreases costs. Business availability: Data mobility of virtual volumes can improve availability. Application support: Tiered storage optimization aligns storage costs with required application service levels".[15] The term has also been applied to use cases such as storage virtualization in disaster recovery[16] and, in a more limited way, to a volume migration capability across SANs.[17]
Server vs. storage hypervisor
An analogy can be drawn between the concept of a server hypervisor and that of a storage hypervisor. By virtualizing servers, server hypervisors (VMware ESX, Microsoft Hyper-V, Citrix Hypervisor, Linux KVM, Xen, z/VM) increased the utilization rates for server resources and provided management flexibility by de-coupling servers from hardware. This led to cost savings in server infrastructure, since fewer physical servers were needed to handle the same workload, and provided flexibility in administrative operations such as backup, failover and disaster recovery.
A storage hypervisor does for storage resources what the server hypervisor did for server resources. A storage hypervisor changes how the server hypervisor handles storage I/O to get more performance out of existing storage resources, and increases efficiency in storage capacity consumption, storage provisioning and snapshot/clone technology. A storage hypervisor, like a server hypervisor, increases performance and management flexibility for improved resource utilization.
References
[edit]- ^ Margaret Rouse. "Definition: software-defined storage". SearchSDN. Tech Target. Retrieved November 7, 2013.
- ^ Chris Poelker (March 12, 2014). "The foundation of clouds: Intelligent abstraction".
- ^ SNIA (March 2014). "Technical Whitepaper: Software Defined Storage".
- ^ Archana Venkatraman. "Software-defined datacentres demystified". Computer Weekly. TechTarget. Retrieved November 7, 2013. "The term software-defined datacentre (SDDC) rose to prominence this year during annual virtualisation conference VMworld 2012 [...] A software-defined datacentre is an IT facility where the elements of the infrastructure - networking, storage, CPU and security - are virtualised and delivered as a service. The provisioning and operation of the entire infrastructure is entirely automated by software."
- ^ "The Software-Defined Data Center". company web site. VMware. Retrieved November 7, 2013.
- ^ Margaret Rouse. "Definition: software-defined storage". SearchSDN. Tech Target. Retrieved November 7, 2013.
- ^ "Technology Focus Areas | SNIA".
- ^ "Thriving software-defined-storage market will ramp up to $86B by 2023: report". FierceTelecom. 20 March 2020. Retrieved 2020-03-23.
- ^ "第二期观点|天翼云存储资源盘活系统 HBlock，全面释放企业数据价值" [Viewpoint No. 2: Tianyi Cloud storage resource revitalization system HBlock, fully unleashing enterprise data value]. www.infoq.cn (in Chinese). Retrieved 2024-04-16.
- ^ Simon Robinson (March 12, 2013). "Software-defined storage: The reality beneath the hype". Computer Weekly. Retrieved November 7, 2013.
- ^ "Comparison of virtualization technologies".
- ^ Snyder, Brett; Ringenberg, Jordan; Green, Robert; Devabhaktuni, Vijay; Alam, Mansoor (June 9, 2014). "Evaluation and design of highly reliable and highly utilized cloud computing systems". Journal of Cloud Computing. 4: 12. doi:10.1186/s13677-015-0036-6. S2CID 17909593.
- ^ "Hypervisor glossary definition" (PDF). Xen v2.0 for x86 Users' Manual. Xen.org. Archived from the original (PDF) on October 5, 2011. Retrieved October 4, 2017.
- ^ "What is storage virtualization?". SearchStorage.com.
- ^ IBM SmartCloud Virtual Storage Center. IBM Redbooks. 6 March 2015. ISBN 9780738440439.
- ^ Erickson, Todd (June 23, 2011). "SearchDisasterRecovery Article". SearchDisasterRecovery.com. Archived from the original on October 4, 2017. Retrieved October 4, 2017.
- ^ Mearian, Lucas (November 23, 2010). "ComputerWorld Article". Archived from the original on October 4, 2017. Retrieved October 4, 2017.
Overview
Definition
Software-defined storage (SDS) is a storage architecture that uses software to manage and abstract data storage resources across diverse hardware platforms, decoupling storage management from the underlying physical hardware.[1][5] This approach allows storage functions such as provisioning, protection, and scaling to be handled through software rather than being tied to proprietary hardware controllers.[6] At its core, SDS operates on principles of software control over storage provisioning, scalability, and automation, frequently utilizing commodity hardware to enhance cost-efficiency and adaptability.[2] These principles enable dynamic allocation of resources based on policies, supporting elastic growth without hardware-specific constraints.[7]

SDS differs from broader concepts like software-defined infrastructure (SDI), which virtualizes and manages computing, storage, and networking resources holistically, by concentrating exclusively on the storage domain to optimize data handling independently.[8] By abstracting heterogeneous storage environments (such as combinations of solid-state drives, hard disk drives, and cloud-based tiers), SDS facilitates unified management through a centralized software layer, promoting interoperability and simplified administration.[1][9]

Historical Development
The concept of software-defined storage (SDS) originated in the early 2010s, building on the momentum of server virtualization pioneered by VMware, which demonstrated the benefits of abstracting compute resources from hardware to enable scalability in data centers.[10] This shift was driven by the growing demand for flexible, cost-effective storage to support the rapid expansion of cloud computing environments, where traditional hardware-bound storage struggled to meet dynamic scaling needs.[11] Early discussions around SDS emphasized decoupling storage software from proprietary hardware, allowing deployment on commodity servers to reduce costs and improve agility.[3]

Key milestones in SDS development occurred between 2011 and 2013, marking its transition from concept to practical implementation. In 2012, OpenStack introduced Cinder as its block storage service in the Folsom release (September 2012), providing an open-source framework for managing persistent storage volumes in cloud infrastructures and exemplifying early SDS principles through API-driven provisioning.[12] The Storage Networking Industry Association (SNIA) formalized a definition of SDS in 2013 during its Storage Developer Conference, describing it as virtualized storage platforms with service-level management interfaces that enable self-service provisioning across heterogeneous hardware.[13] These developments established SDS as a paradigm distinct from prior storage virtualization efforts.
SDS evolved through distinct phases, beginning with a primary focus on block storage in the early 2010s to address enterprise needs for high-performance, low-latency access in virtualized environments.[14] By the mid-2010s, adoption expanded to include file and object storage protocols, with solutions like Ceph integrating unified support for block, file, and object interfaces to handle unstructured data growth in distributed systems.[15] Entering the 2020s, SDS began incorporating edge computing capabilities, enabling decentralized storage management for IoT and remote workloads while maintaining central policy control.[16]

The growth of SDS was significantly propelled by the explosion of big data and widespread cloud adoption in the 2010s, as organizations required scalable storage to process vast datasets without hardware lock-in.[1] In the 2020s, advancements have centered on AI-optimized SDS architectures tailored for data lakes, incorporating features like automated tiering and intelligent data placement to support machine learning workloads on massive, unstructured repositories.[17]

Core Concepts
Abstraction and Virtualization
In software-defined storage (SDS), abstraction refers to the process by which software layers decouple storage management functions from the underlying physical hardware, presenting storage resources as a unified logical pool to applications and users. This abstraction hides hardware-specific details, such as RAID configurations, vendor-specific protocols, and physical device characteristics like IOPS, throughput, latency, and capacity, allowing administrators to manage storage without direct interaction with proprietary hardware features.[18][19][20]

Virtualization in SDS builds on this abstraction by aggregating disparate storage resources, such as hard disk drives (HDDs), solid-state drives (SSDs), and cloud-based storage, into a single, cohesive namespace that appears as a contiguous entity. Techniques like storage pooling enable the creation of this virtual layer, where capacity from heterogeneous devices is combined and dynamically allocated based on demand, while dynamic tiering automatically migrates data between storage tiers (e.g., from high-performance SSDs to cost-effective HDDs) to optimize performance and efficiency without manual intervention.[21][1][22]

Access to this abstracted and virtualized storage is facilitated through standardized protocols that provide a consistent interface, independent of the underlying hardware. Common protocols include block-level access via iSCSI for high-performance applications, file-level sharing through NFS for collaborative environments, and object-based APIs such as S3-compatible interfaces for scalable, unstructured data storage.[21][23] These mechanisms deliver significant flexibility by eliminating hardware lock-in, enabling non-disruptive data migrations across environments, and supporting seamless scaling of capacity and performance as needs evolve. For instance, organizations can add or reallocate resources without downtime, adapting to workload changes while maintaining data availability and integrity.[1][24]

Policy-Based Management
Policy-based management in software-defined storage (SDS) refers to a rule-driven automation framework that enables administrators to define and enforce policies for storage operations, including data placement, replication, and quality of service (QoS) enforcement, independent of the underlying hardware.[1] This approach provides a unified control plane for aligning storage capabilities with application requirements, allowing dynamic provisioning without manual reconfiguration.[25] By leveraging predefined rules, it automates decision-making processes that traditionally required human intervention, enhancing efficiency in heterogeneous environments.[26]

Key elements of policy-based management include policies for data mobility, such as automatic tiering of hot and cold data to optimize performance and cost; for instance, rules can migrate frequently accessed data to faster storage tiers while archiving inactive data to slower, cheaper media.[11] Security policies incorporate encryption rules to protect data at rest and in transit, ensuring compliance with standards like GDPR or HIPAA by applying uniform safeguards across storage pools.[27] Compliance-focused policies handle retention schedules, automatically enforcing data lifecycle management to meet regulatory requirements, such as immutable storage for audit trails or automated deletion after predefined periods.[28]

In practice, SDS systems serve as underlying storage backends or provisioners in orchestration platforms like Kubernetes, integrating via Container Storage Interface (CSI) drivers.
In Kubernetes, StorageClasses define provisioning parameters such as QoS levels and rules but are not the storage mechanism itself; they enable containerized applications to request storage with specific policy attributes such as IOPS limits or replication factors.[29][30] Examples include Ceph's CRUSH algorithm, which uses tunable maps and rules to govern data placement and replication strategies across cluster topologies, and VMware vSAN's Storage Policy-Based Management (SPBM), which defines capabilities like fault tolerance and object space reservation for virtual machine disks.[31][32]

The automation outcomes of policy-based management significantly reduce manual intervention by enabling self-service provisioning, where users can deploy storage resources via declarative policies without administrator approval, streamlining operations in enterprise environments.[33] This leads to faster response times for workload scaling and lower operational costs, as routine tasks like backup scheduling and access controls are handled programmatically, minimizing errors and resource underutilization.[11] In large-scale deployments, such automation supports agile IT practices, allowing organizations to adapt storage configurations dynamically to changing demands while maintaining consistency and reliability.[34]

Architecture
Key Components
Software-defined storage (SDS) systems are composed of core and supporting components that enable the abstraction of storage resources from underlying hardware, allowing for flexible, policy-driven management. At a high level, these components form a distributed architecture that separates management functions from data handling, ensuring scalability and resilience in diverse environments.[21][35]

The primary core components include the control plane, data plane, and metadata services. The control plane serves as the centralized management layer, responsible for orchestration, provisioning, policy enforcement, and resource allocation across the storage infrastructure. It provides a service management interface that automates tasks such as configuration and scaling, often through graphical user interfaces or programmatic access, to simplify administration and meet application requirements.[21][35][23]

In contrast, the data plane handles the actual input/output operations, including reading, writing, and processing data on storage nodes. It virtualizes the data path to support efficient data movement, applying services like replication, deduplication, and compression directly at the node level for performance and integrity. This separation from the control plane allows the data plane to operate independently, distributing workloads across commodity hardware to optimize throughput.[21][36][35]

Metadata services track data locations, attributes, and policies, maintaining an index of where data resides within virtual pools. These services, often integrated into the control plane, enable quick lookups and ensure data accessibility in distributed setups, supporting features like tiering and migration without disrupting operations.[21][36][35]

Supporting elements enhance integration and observability. APIs facilitate programmatic interaction, enabling automation and interoperability with ecosystems like OpenStack or VMware through standards such as RESTful interfaces and protocols including S3 for object storage.[21][23] Monitoring tools provide real-time visibility into health, performance, and usage via dashboards and analytics, allowing administrators to detect issues and optimize resources proactively.[21][36] Multi-protocol interfaces support block (e.g., iSCSI), file (e.g., NFS, SMB), and object access, ensuring compatibility with varied applications and workloads.[21][37]

Scalability is inherent in the distributed architecture, which supports horizontal scaling by adding nodes without downtime, pooling resources for virtually unlimited capacity, such as up to 8 yottabytes in some implementations. Fault tolerance is achieved through mechanisms like replication (mirroring data across nodes) and erasure coding (distributing data slices with parity for recovery, e.g., tolerating up to 5 node failures in a 12-slice setup), minimizing data loss risks.[21][23][36]

These components interact closely for cohesive operation: the control plane directs policies to the data plane via metadata updates, coordinating I/O requests and ensuring fault-tolerant data placement across nodes. This interdependency enables dynamic resource adjustment, where monitoring feedback informs control plane decisions, maintaining overall system efficiency and reliability.[21][38][37]

Storage Hypervisor
A storage hypervisor is a software layer that virtualizes and abstracts physical storage resources from disparate hardware vendors, pooling them into a unified, logical storage pool to enable efficient management and utilization in software-defined storage (SDS) environments.[39] Unlike general-purpose virtualization tools, it is specifically optimized for I/O-intensive operations by handling high-throughput data access patterns, such as those in enterprise databases and virtualized workloads, through features like intelligent caching and low-latency protocols.[40] This abstraction allows administrators to treat heterogeneous storage arrays (spanning SAN and NAS systems) as a single virtual entity, decoupling applications from underlying hardware dependencies.[41]

Key functionalities of a storage hypervisor include resource pooling across diverse storage infrastructures, thin provisioning to allocate storage on-demand without overcommitting physical capacity, and the creation of snapshots and clones for rapid data replication and recovery.[41] These capabilities support multi-tenancy in cloud environments by isolating tenant data within shared pools while ensuring performance isolation and scalability.[1] For instance, thin provisioning minimizes initial storage allocation, dynamically expanding as data grows, which optimizes utilization in dynamic SDS setups. Snapshots enable point-in-time copies for backup or testing without disrupting primary operations, enhancing data resilience in multi-tenant scenarios.[42]

At the technical level, storage hypervisors integrate data efficiency techniques such as deduplication to eliminate redundant blocks, compression to reduce data footprint, and caching to accelerate read/write operations using faster tiers like flash.[41] These processes occur at the hypervisor layer to maintain consistent performance across virtualized resources.
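The block-level deduplication mentioned here can be illustrated with a content-addressed block store, one common way to implement it: identical blocks hash to the same key and are stored once. The 4 KiB block size and the in-memory dictionary are illustrative simplifications:

```python
# Sketch of block-level deduplication via content addressing: each block is
# keyed by its hash, so identical blocks are stored only once. The 4 KiB
# block size and in-memory dict are illustrative simplifications.

import hashlib

BLOCK_SIZE = 4096

def dedup_store(data, store):
    """Split data into fixed-size blocks, store unique blocks by hash,
    and return the list of block hashes (the 'recipe' for the data)."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # stored once, however often it repeats
        recipe.append(digest)
    return recipe

def rebuild(recipe, store):
    """Reassemble the original data from its recipe of block hashes."""
    return b"".join(store[digest] for digest in recipe)

store = {}
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # three identical blocks + one
recipe = dedup_store(payload, store)
```

Here four logical blocks consume only two physical blocks of space, while the recipe preserves enough information to reconstruct the data exactly.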
Protocols like NVMe-oF (NVMe over Fabrics) further enable high-speed abstraction by extending NVMe's low-latency interface over networks, supporting disaggregated storage in SDS architectures with sub-millisecond response times.[43]

The evolution of storage hypervisors traces back to early proprietary implementations in the 2000s, such as IBM's SAN Volume Controller (SVC), which began development in 2000 based on research from IBM's Almaden lab and was commercially released in 2003 as a block storage virtualization appliance.[44] Initially focused on SAN environments, SVC evolved to incorporate advanced features like automated tiering and data reduction, achieving widespread adoption for heterogeneous storage management.[45] Open-source alternatives, such as Ceph, provide distributed storage virtualization and pooling capabilities for SDS environments.[46]

Comparisons
With Traditional Storage
Traditional storage systems are predominantly hardware-centric, relying on dedicated storage area network (SAN) arrays such as EMC Symmetrix, which integrate specialized controllers, disks, and firmware into proprietary appliances managed through vendor-specific tools.[47][48] These systems emphasize tightly coupled hardware and software, where storage functionality is embedded within the physical infrastructure, limiting interoperability and requiring specialized expertise for configuration and maintenance.[23]

In contrast, software-defined storage (SDS) adopts a software-centric, hardware-agnostic approach that decouples storage intelligence from the underlying hardware, enabling deployment on commodity servers and drives.[49] This shift reduces capital expenditures (CapEx) by leveraging inexpensive, off-the-shelf components rather than proprietary hardware, potentially lowering total cost of ownership (TCO) through avoided vendor premiums.[50] Traditional systems, however, suffer from tight hardware-software coupling, which inflates costs and enforces dependency on specific vendors for upgrades and support.[51]

Operationally, traditional storage involves manual provisioning processes, where administrators configure resources array by array, leading to inefficiencies and errors in siloed environments that hinder resource sharing across applications.[24] SDS introduces automation for provisioning and management, allowing dynamic allocation from unified pools that scale elastically without physical reconfiguration.[1] This addresses the scalability limitations of traditional setups, where capacity expansions are constrained by array-specific silos and require downtime or additional hardware purchases.[23]

The transition to SDS is driven by legacy challenges in traditional storage, including vendor lock-in that restricts multi-vendor environments and elevates TCO through proprietary maintenance contracts and inflexible scaling.[52] High TCO arises from ongoing hardware refresh cycles and specialized management overhead, prompting organizations to adopt SDS for greater agility and cost predictability.[53]

Server Hypervisors vs. Storage Hypervisors
Server hypervisors, such as VMware ESXi and Microsoft Hyper-V, primarily focus on compute virtualization by abstracting physical CPU and RAM resources to enable the creation and management of multiple virtual machines (VMs) on a single physical server.[39] These systems provide isolation between VMs and efficient resource allocation for processing tasks, but their handling of storage is limited to basic attachment of virtual disks to VMs, often relying on underlying physical storage without advanced pooling or optimization across diverse devices.[54] This approach consumes VM resources for storage operations and offers limited scalability for dynamic I/O demands, making it suitable mainly for low-scale or ephemeral storage needs.[54] In contrast, storage hypervisors are specialized software layers designed for I/O optimization and storage abstraction, treating diverse physical disks and drives—such as SSDs, HDDs, SAN, NAS, or DAS—as a unified pool of virtual resources for shared access across systems.[55] They enable features like policy-driven provisioning, snapshots, replication, and storage quality of service (QoS) to prioritize and guarantee I/O performance levels, which are typically absent or rudimentary in server hypervisors.[55] By decoupling storage management from hardware specifics, storage hypervisors facilitate efficient utilization and service-level management in software-defined storage (SDS) environments.[56] Key differences between server and storage hypervisors lie in their scope, performance characteristics, and integration patterns. 
Server hypervisors target compute resources, introducing minimal overhead for CPU and memory operations but potentially higher latency in storage I/O due to their non-specialized handling of disk access.[54] Storage hypervisors, by contrast, are engineered for storage-specific optimizations such as dynamic resource balancing and reduced contention in shared pools, often yielding lower latency and better overall throughput for data-intensive workloads.[57] In terms of integration, server hypervisors frequently operate atop storage hypervisors, leveraging the latter's abstracted storage layer to provide VMs with virtualized disks while avoiding direct hardware dependencies.[54]

These distinctions enable synergies when the two are combined, particularly in hyper-converged infrastructure (HCI) setups, where they support unified management of compute and storage resources through a single interface.[58] In HCI, the server hypervisor orchestrates VM workloads on top of the storage hypervisor's pooled resources, promoting scalability, resilience, and simplified administration without siloed hardware.[59] This integrated approach addresses traditional storage limitations by enabling software-defined flexibility across the data center stack.[56]

Industry Landscape
Market Trends
The global software-defined storage (SDS) market was valued at USD 38.43 billion in 2023 and reached USD 46.05 billion in 2024, with projections indicating growth to exceed USD 50 billion in 2025 at a compound annual growth rate (CAGR) of 27.9% through 2030, driven primarily by accelerating cloud migration and the expansion of hybrid multi-cloud environments that demand scalable, flexible storage solutions.[60] This surge is fueled by the exponential increase in data generation from digital transformation initiatives, enabling organizations to optimize resource utilization and achieve greater data reliability across distributed infrastructures.[60]

Key trends in the SDS market as of 2025 include rising integration with hyper-converged infrastructure (HCI), exemplified by Nutanix-style architectures that consolidate compute, storage, and networking for simplified management in data centers.[61] Edge SDS deployments are also gaining traction to support Internet of Things (IoT) applications, where localized storage processing reduces latency and bandwidth demands in remote or distributed environments.[62] AI and machine learning workloads are further propelling demand for intelligent caching mechanisms within SDS, which dynamically prioritize data access to enhance performance for high-velocity training and inference tasks.[63]

Influencing factors include the ongoing shift toward all-flash arrays in SDS implementations, which provide superior speed and reliability for performance-intensive applications while reducing hardware dependencies.[62] Sustainability efforts are also prominent, with a focus on energy-efficient software optimizations that minimize power consumption in data centers through intelligent workload orchestration and resource allocation.[64]

Regionally, North America holds a dominant position with approximately 37% of global revenue share in 2023, supported by the concentration of large-scale data centers and advanced cloud adoption.[60] Europe exhibits steady growth driven by regulatory emphasis on data security and cost-efficient storage in enterprise settings, while the Asia-Pacific region is experiencing rapid expansion due to widespread digital transformation and increasing SME investments in IT infrastructure.[60]

Major Vendors and Solutions
Dell Technologies offers PowerStore, a unified, software-defined storage platform that delivers scalable all-flash NVMe storage for block, file, and container workloads, with features like AI-driven optimization and a guaranteed 5:1 data reduction ratio.[65] PowerStore emphasizes flexibility through its container-based architecture, supporting non-disruptive upgrades and integration with hybrid environments.[66]

NetApp provides ONTAP as its flagship SDS operating system, which unifies data management across on-premises, hybrid, and multi-cloud setups, enabling seamless data mobility and policy-based automation. NetApp's strategy centers on hybrid cloud integration, positioning it as a leader for hybrid cloud storage use cases according to the 2025 Gartner Magic Quadrant for Enterprise Storage Platforms.[67] This approach differentiates ONTAP by supporting file, block, and object protocols while optimizing costs through efficient data tiering between flash and cloud storage.[68]

Pure Storage's Purity operating system powers its all-flash arrays, focusing on high-performance, evergreen storage with non-disruptive upgrades and simplicity in management.[69] Pure's all-flash emphasis delivers low-latency performance for demanding workloads, achieving 99.9999% availability and positioning the company furthest in vision in the 2025 Gartner Magic Quadrant for Enterprise Storage Platforms.[70] This strategy prioritizes flash-optimized efficiency, reducing operational complexity compared to hybrid systems.[71]

In the open-source domain, Red Hat Ceph provides a scalable, software-defined storage solution that supports block, file, and object interfaces, leveraging commodity hardware for distributed storage clusters. Ceph's architecture enables high availability and self-healing, making it suitable for cloud-native environments. As an open-source foundation, it fosters community-driven innovation and integration with platforms like OpenStack.

VMware vSAN integrates SDS directly into hyperconverged infrastructure (HCI), pooling local storage from industry-standard servers to create a shared datastore with policy-based management and high availability.[72] vSAN reduces total cost of ownership by over 30% through disaggregated scaling and efficient resource utilization, supporting up to 300,000 IOPS per node.[72]

IBM Spectrum Virtualize serves as an enterprise-grade SDS solution, virtualizing storage across heterogeneous hardware to provide unified block and file services with advanced data reduction and replication.[73] It excels in large-scale deployments by enabling non-disruptive migrations and integration with IBM's cloud ecosystem, enhancing storage efficiency in hybrid setups.[74]

HPE SimpliVity delivers a hyperconverged SDS platform focused on operational simplicity, combining compute, storage, and networking with built-in deduplication, compression, and policy-driven automation.[75] Its strategy emphasizes ease of management and data protection, reducing backup times and lowering TCO through integrated resiliency features.[75]

The SDS ecosystem involves strategic partnerships among vendors, such as NetApp's collaborations with AWS and Azure for seamless hybrid cloud data services, and Pure Storage's integrations with VMware for HCI environments.[67] Many solutions comply with industry standards like the SNIA SDS Technical Assessment (SDS-TA), ensuring interoperability and multi-vendor compatibility in enterprise deployments.[76]

Notable software-only SDS solutions
Software-defined storage solutions that are software-only (or primarily so) and designed to run on commodity hardware (standard x86 servers, off-the-shelf drives) provide flexibility, cost savings, and avoidance of vendor lock-in. Below are prominent examples:

Open-source and enterprise-supported
- Ceph (with Red Hat Ceph Storage) — Massively scalable distributed system offering block, file (CephFS), and object storage; runs on commodity hardware with self-healing and CRUSH-based placement. Widely used in cloud-native, AI/ML, and large-scale environments.
- GlusterFS (with Red Hat Gluster Storage) — Scalable distributed file system that aggregates storage into a global namespace; simple deployment, suited for NAS workloads, cloud storage, and media.
- MinIO — High-performance, S3-compatible object storage; optimized for AI/ML, analytics, and private/hybrid clouds; supports single-node to distributed setups on commodity hardware.
- OpenZFS (used in TrueNAS CORE) — File system/volume manager with strong data integrity; enables reliable storage on commodity hardware.
- OpenEBS — Kubernetes-native container storage.
- LINBIT SDS (LINSTOR/DRBD) — Block storage for high availability and geo-clustering.
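Ceph's CRUSH algorithm lets every client compute an object's placement deterministically from the cluster map, so no central lookup table is needed and data spreads evenly across commodity nodes. The sketch below illustrates the same idea using highest-random-weight (rendezvous) hashing; it is a simplified stand-in, not the actual CRUSH algorithm:

```python
import hashlib

def _score(obj_id: str, osd: str) -> int:
    """Deterministic pseudo-random weight for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{obj_id}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place(obj_id: str, osds: list, replicas: int = 3) -> list:
    """Pick `replicas` storage daemons (OSDs) for an object by ranking all
    OSDs on their per-object hash score. Every client computes the same
    answer from the same OSD list, so no placement lookup table exists."""
    ranked = sorted(osds, key=lambda osd: _score(obj_id, osd), reverse=True)
    return ranked[:replicas]
```

A useful property this shares with CRUSH: adding or removing one OSD only remaps the objects whose top-ranked set actually changes, keeping data movement minimal during cluster changes.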
Commercial software-only
- DataCore (SANsymphony for block, Swarm for object) — High-performance with caching, tiering; runs on standard x86 servers.
- Scality RING — Integrated file and object storage; scales to petabytes on commodity x86 hardware; supports S3, NFS, Swift.
- Hammerspace — Global Data Environment with erasure coding; unifies data across storage types on commodity hardware.
- StarWind Virtual SAN — Creates shared pools from local disks; suits SMB/edge, supports multiple hypervisors.
- NetApp ONTAP Select — Brings ONTAP features (NAS/SAN) to commodity hardware.
- StorMagic SvSAN — Edge-focused, runs on minimal servers (2 nodes).
- Others include VDURA (parallel file for AI/HPC), Versity (archival), Lightbits (NVMe/TCP block), and more.
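Several of the systems above (MinIO and Scality RING, for example) rely on erasure coding rather than full replication to survive drive or node loss on commodity hardware. The following is a minimal single-parity sketch of the idea; production systems use Reed–Solomon codes with configurable data and parity shard counts:

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length shards."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list:
    """Split data into k equal data shards plus one XOR parity shard."""
    data += b"\x00" * ((-len(data)) % k)  # pad to a multiple of k
    size = len(data) // k
    shards = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]

def recover(shards: list, lost: int) -> bytes:
    """Rebuild one missing shard by XOR-ing the k surviving shards."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    return reduce(xor_bytes, survivors)
```

With k data shards and one parity shard spread across k+1 drives, any single drive can fail and its shard is rebuilt from the survivors, at a storage overhead of 1/k instead of the 2x or 3x cost of full replication.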