The End of the Shared-Nothing Era

Author

Jeff Denworth

An era can be said to end when its basic illusions are exhausted.
Arthur Miller

Today, we announced version 3 of VAST Data Platform's architecture, the world’s first Disaggregated and Shared-Everything (DASE) storage system. This new release combines VAST’s paradigm-changing storage foundation with enterprise features such as support for Windows and Mac environments, enterprise-grade encryption and cloud replication. Together, these establish VAST Data Platform as suitable for a wide range of enterprise requirements and mark a generational shift beyond yesterday's shared-nothing architectures, such as Dell EMC Isilon.

At the turn of the century, shared-nothing storage architectures were instrumental in helping organizations scale beyond the limits of scale-up storage and manage both the explosion of data and the need for simple, scalable storage capacity. These systems were the perfect answer to the limited dual-controller, small-volume storage systems that came before them, and they helped customers achieve breakthrough levels of speed, resilience and scale.

The impact of shared-nothing storage was nothing less than tectonic, and its influence can be felt in many of the popular storage technologies deployed today, including:

web and cloud: much of the world’s web and cloud storage infrastructure, such as AWS and Dropbox
file systems: products like Dell EMC Isilon and Pure Storage’s FlashBlade
object storage: solutions such as Ceph and IBM Cleversafe
big data: architectures such as Apache Hadoop and Splunk
virtualization: products such as Nutanix and vSAN
backup: even modern backup appliances such as Rubrik and Commvault HyperScale

For 20 years, billions of dollars of infrastructure have been deployed as shared-nothing clusters. Where these products have succeeded, their success has also cast light on the problems they have not solved. With each exabyte deployed, the gap between the original objectives of shared-nothing cluster architectures and what they actually deliver grows wider. Let’s look at each at a high level:

Performance Scalability

The Premise: Shared-nothing clusters make it easy to scale performance: just add more nodes (CPUs + disks) to the cluster.

The Reality: Because each node has to coordinate data and metadata activity with the other nodes, the cross-talk required for cluster coherency and storage rebuilds limits effective performance scaling. As systems grow larger, so does the chatter within the cluster, and most commercial scale-out storage appliances reach the point of diminishing returns within a few dozen nodes, after which adding nodes no longer yields linear performance gains.
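To make the diminishing-returns argument concrete, here is a minimal sketch using Neil Gunther's Universal Scalability Law (USL), a standard model of contention and coherency costs. The USL is our choice of model, not one the post itself uses, and the `alpha` (contention) and `beta` (coherency, i.e. node-to-node cross-talk) values are purely illustrative assumptions, not measurements of any product.

```python
def usl_throughput(nodes: int, alpha: float, beta: float, per_node: float = 1.0) -> float:
    """Relative cluster throughput under the Universal Scalability Law.

    alpha models contention (serialized work); beta models coherency cost,
    i.e. pairwise cross-talk that grows with nodes * (nodes - 1).
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return per_node * nodes / (1 + alpha * (nodes - 1) + beta * nodes * (nodes - 1))


def peak_nodes(alpha: float, beta: float) -> float:
    """Node count at which USL throughput peaks: sqrt((1 - alpha) / beta)."""
    return ((1 - alpha) / beta) ** 0.5


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to illustrate the shape of the curve.
    alpha, beta = 0.02, 0.0005
    for n in (1, 8, 16, 32, 44, 64, 128):
        x = usl_throughput(n, alpha, beta)
        print(f"{n:4d} nodes -> {x:6.1f}x single-node throughput ({x / n:.0%} per-node efficiency)")
    print(f"throughput peaks near {peak_nodes(alpha, beta):.0f} nodes")
```

With these made-up parameters, aggregate throughput stops growing in the mid-40s of nodes and actually declines beyond that point. The `beta` term is the one that captures the cluster chatter described above; in this framing, eliminating east-west traffic corresponds to pushing `beta` toward zero.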

Capacity Scalability

The Premise: Shared-nothing products eliminated the islands of storage created by dual-controller architectures by aggregating storage across nodes, making it simple to add capacity by adding more nodes.

The Reality: The tight coupling of storage to a node’s CPUs, combined with shared-nothing’s focus on HDDs, has resulted in pools and tiers of infrastructure that do not scale globally. Data is confined to a tier, and volumes, directories and data management operations typically never span these pools. As a result, shared-nothing systems still suffer from data segmentation and inefficiency, even though they were developed to eliminate islands of infrastructure.

The existence of storage tiering in classic shared-nothing architectures still forces customers to wrestle with the capacity and performance sizing problems these systems were designed to solve.

Multi-Tenant Cloud Storage Infrastructure

The Premise: Shared-nothing storage systems are built on concepts used by the world’s leading cloud services and are designed to serve the requirements of many applications at the same time.

The Reality: Because shared-nothing systems spread the I/O requests made to any node across multiple nodes within a cluster, demanding applications can and do inflict significant pain on multi-tenant environments. For this reason, many organizations deploy separate shared-nothing clusters for different users or purposes, thereby recreating the islands of storage these systems were designed to eliminate.

‘Global’ Storage Logic

The Premise: “One”. One namespace, one simple management experience, one set of algorithms that scale across the cluster. Everything is simple, scalable and efficient.

The Reality: The tight coupling of CPUs to disks has limited the scale of the core storage algorithms, which in turn has kept these products from properly addressing the needs of modern applications:

No shared-nothing system was built to globally buffer writes and manage wear across thousands of low-endurance, high-capacity SSDs, forcing vendors to use expensive flash
Erasure code stripes don’t scale beyond 10-20 data blocks, resulting in overhead that is often 20-40% of total cluster capacity
No shared-nothing system was ever built with a global data reduction dictionary, because these architectures tie data reduction indexes to the DRAM of a node. Storage vendors have therefore struggled to scale data reduction, since keeping a copy of the index on every cluster node quickly becomes prohibitively expensive.
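The erasure-coding overhead claim comes down to simple arithmetic: a stripe of k data blocks protected by m parity blocks spends m/(k+m) of raw capacity on parity. The sketch below computes this for a few stripe geometries; the specific (k, m) pairs are illustrative assumptions, not the configurations of any named product.

```python
from fractions import Fraction


def parity_overhead(data_blocks: int, parity_blocks: int) -> Fraction:
    """Fraction of raw capacity consumed by parity in a k+m erasure-code stripe."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("need at least one data block and a non-negative parity count")
    return Fraction(parity_blocks, data_blocks + parity_blocks)


if __name__ == "__main__":
    # Hypothetical stripe geometries: several narrow stripes vs. one much wider stripe.
    for k, m in [(4, 2), (8, 2), (10, 4), (16, 4), (100, 4)]:
        overhead = parity_overhead(k, m)
        print(f"{k}+{m}: {float(overhead):.1%} of raw capacity is parity")
```

For a fixed number of parity blocks (that is, a fixed number of tolerated failures), overhead shrinks as the stripe widens: 10+4 spends about 28.6% of raw capacity on parity, while a hypothetical 100+4 stripe spends about 3.8%. This is the link the post draws between narrow stripes and high overhead.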

The outcome: shared-nothing flash solutions are too expensive for broad adoption and still cost 20x as much as HDD storage, forcing customers to keep trading off performance against capacity.

Simplicity

The Premise: Scaling is easy: just add nodes of CPU and disk, and both performance and capacity grow linearly, so you don’t need to choose one or the other.

The Reality: By tightly coupling CPUs, RAM and disks in a node architecture, shared-nothing clusters have created a Cambrian explosion of node types, as customers are forced to trade off IOPS, throughput and capacity needs. Most shared-nothing products offer 5-10 different node options, forcing customers to make fixed, compromised decisions about how to size performance and capacity. They must then hope they won’t need more performance for that capacity in the future, because the capacity they purchase is held hostage to a specific CPU.
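A back-of-the-envelope way to see the “capacity held hostage to a CPU” problem: with coupled nodes, the node count must satisfy whichever requirement (performance or capacity) is larger, so the other dimension gets overbought; with disaggregated compute and storage, each is sized on its own. The function names and all node ratings below are hypothetical round numbers chosen for illustration, not specifications of any vendor's hardware.

```python
import math
from dataclasses import dataclass


@dataclass
class Requirement:
    throughput_gb_s: float  # required aggregate throughput, GB/s
    capacity_tb: float      # required usable capacity, TB


def coupled_nodes(req: Requirement, node_gb_s: float, node_tb: float) -> int:
    """Shared-nothing style: every node brings a fixed throughput AND a fixed capacity."""
    return max(math.ceil(req.throughput_gb_s / node_gb_s),
               math.ceil(req.capacity_tb / node_tb))


def disaggregated(req: Requirement, cpu_gb_s: float, enclosure_tb: float) -> tuple[int, int]:
    """Disaggregated style: compute servers and storage enclosures are sized independently."""
    return (math.ceil(req.throughput_gb_s / cpu_gb_s),
            math.ceil(req.capacity_tb / enclosure_tb))


if __name__ == "__main__":
    req = Requirement(throughput_gb_s=200, capacity_tb=2000)  # hypothetical workload
    n = coupled_nodes(req, node_gb_s=5, node_tb=100)           # hypothetical node type
    print(f"coupled: {n} nodes -> {n * 5} GB/s, {n * 100} TB")
    servers, enclosures = disaggregated(req, cpu_gb_s=5, enclosure_tb=500)
    print(f"disaggregated: {servers} compute servers, {enclosures * 500} TB in {enclosures} enclosures")
```

In this toy case the coupled design needs 40 nodes to hit the throughput target and ends up with 4,000 TB, twice the required capacity, while the disaggregated design buys the same per-CPU throughput and only the 2,000 TB it needs. Real sizing involves many more variables (IOPS vs throughput, resilience, media mix), so treat this only as an illustration of the coupling effect.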

This problem can at times produce comical outcomes: the most famous example of the complexity of shared-nothing node choices is a 46,763-page price list.

The promise of shared-nothing has been delivered, and the world has benefited. At the same time, we now have no illusions about what these systems can and cannot do.

Every new beginning comes from some other beginning’s end.
Seneca

As we’ve discussed at length, the emergence of new foundational storage technologies such as NVMe-over-Fabrics, Storage Class Memory and low-endurance, low-cost flash media has made it possible to reimagine what storage can be in the modern era. VAST Data is pioneering the next generation of disaggregated and shared-everything (DASE) storage to break many of the tradeoffs and compromises that have emerged through the global adoption of shared-nothing architectures.

This blog is already too long to walk through all of the DASE benefits in great detail, so I’ll do my best to summarize the key architectural differences via a table:

	Shared-Nothing Storage	DASE Storage
Performance Scalability	Cluster cross-talk limits practical performance, and performance scaling starts to tail off after a few dozen nodes	Every CPU independently mounts each piece of media in the cluster. DASE systems eliminate east-west cluster traffic to deliver linear scale
Capacity Scalability	Volumes of data are locked into specific pools of HW that each have their own data management and data protection boundaries	VAST Data’s asymmetric cluster architecture breaks down the barriers between multiple generations of storage and pools them into one large resource group that eliminates data boundaries and preserves the balance of capacity and performance as you scale
Multi-Tenancy	Customers are still building separate clusters to meet the diverse and competing performance needs of applications	With the elimination of east-west traffic, it then becomes easy to segment the CPUs of the cluster into dynamically scalable pools that can be allocated to demanding applications and tenants and isolate traffic between competing applications while still all serving them from one global storage system
Global Logic	Shared-nothing systems still use limited logic for managing volumes of data, narrow data protection stripes and limited data reduction. The result is often not much different from what’s achievable with dual-controller designs.	VAST’s exabyte-scale DASE architecture features global QLC flash management, global data protection and global data reduction, which combine to dramatically recalibrate the effective acquisition cost of flash, making it possible to afford flash for all of your data and to eliminate the HDD from the data center
Simplicity	Customers struggle with the many different storage node makes and models they must choose from to build out their storage clusters. With SSD options that cost 10x-20x as much as their HDD tiers, tiered storage systems further compound the complexity of sizing and scaling storage and leave HDD-based data on slow, legacy media.	CPUs can be scaled independently of capacity, resulting in only one server model and one enclosure model per generation of hardware, so customers can right-size infrastructure to their requirements and add performance without adding storage capacity. Furthermore, when it’s possible to afford flash for all of your data, storage no longer needs to be sized for performance or capacity.

And finally, a few questions…

As VAST Data is increasingly part of the global discourse on the future of storage, it’s important for the fundamental differences between our architecture and previous generations of storage to be well understood. Storage concepts can become very technical, and storage companies can at times make it difficult to see the forest for the trees. To make it simple, here are a few questions to ask of Dell and other storage vendors to understand their relevance as the shared-nothing era winds down.

Has your vendor innovated around flash and delivered a level of storage efficiency that makes it possible to afford the total cost of acquisition of a true all-flash data center?
Does your vendor still advertise complex tiered storage architectures that force you to continually assess the value of performance vs the cost of capacity?
Is your current storage architecture able to scale performance and capacity linearly, without introducing any cluster cross-talk that would diminish performance returns at scale?
Is it possible to safely consolidate high-performance and heterogeneous applications onto a single storage cluster, where the I/O from any power user’s application can be fenced off from every other application?
Lastly, is scale-out really simple if there are a multitude of node types to configure and combine into a shared-nothing cluster? If you guess wrong and undersize performance, how do you correct that decision? What is the cost of overestimating performance with a vendor’s expensive high-performance flash nodes?

If you’ve read this far, thank you for considering this food for thought, and welcome to this new beginning. It’s closing time for the shared-nothing era. While Seneca said it best, Semisonic sang it even better…