Last Updated: April 02, 2020
VAST Data’s Universal Storage redefines the economics of flash storage, making flash affordable for all applications, from the highest performance databases to the largest data archives, for the first time. The Universal Storage concept blends game-changing storage innovations that lower the acquisition cost of flash with an exabyte-scale file and object storage architecture, breaking decades of storage tradeoffs.
With the advantage of new, enabling technologies that weren’t available before 2018, this new Universal Storage concept can achieve a previously impossible architecture design point. The system combines low-cost QLC flash drives and 3D XPoint memory (such as Intel Optane) with stateless, containerized storage services, all connected over new low-latency NVMe over Fabrics networks, to create VAST’s Disaggregated Shared Everything (DASE) scale-out architecture. Next-generation global algorithms are applied to this DASE architecture to deliver new levels of storage efficiency, resilience, and scale.
While the architecture concepts are sophisticated, the intent and vision of Universal Storage are simple: to bring an end to the data center HDD era and to the complexity of storage tiering that is a byproduct of decades of compromises caused by mechanical media. This white paper introduces VAST Data’s Universal Storage and the DASE architecture and explains how this new architecture defies conventional definitions of storage. By breaking the classic price/performance tradeoff, the system delivers all-flash performance at archive economics to simplify the data center and accelerate all modern applications.
Why Universal Storage?
The Tyranny of Tiers
Over 30 years ago, Gartner introduced the storage tiering model as a means to optimize data center costs by advising customers to demote older and less-valuable data to lower-cost (and slower) tiers of storage. Fast forward 30 years and the sprawl of storage technologies within organizations has grown to unmanageable proportions, with many of the world’s largest companies managing dozens of different types of storage. The sprawl shows up both in storage classes (for example: all-flash, hybrid, all-HDD, tape) and in classes of protocols (block, file, object, big data, etc.). All of it creates a complex pyramid of storage technologies.
While the savings are clear when applying this model with legacy storage architectures, the idea that data should exist on a specific storage tier according to its current value creates multiple challenges:
The Demands of Artificial Intelligence Render Storage Tiering Obsolete
Arguably the greater problem with storage tiering is that the concept assumes that the applications accessing data have a narrow and predefined view of their data access requirements. While that’s true for some applications, such as traditional database engines, new game-changing AI and analytics tools, such as machine learning and deep learning, see value in all data and want the fastest access to the largest amounts of data. For example, when a deep learning system trains its neural network model for facial recognition, the model becomes more accurate only once it’s run against all the photos in the dataset, not just the 15-30% that may fit in some expensive flash tier. The value these applications bring is proportional to the corpus of data they are exposed to, so they thrive on large data sets.
Defining Universal Storage
Universal Storage is a next-generation, scale-out file and object storage concept that breaks decades of storage tradeoffs, and in so doing defies classical storage definitions. Universal Storage is:
New Technologies Lay A New Storage Foundation
There are points in time where the introduction of new technologies makes it possible to rethink fundamental approaches to system architecture. To realize the Universal Storage architecture vision, VAST made a bet on a trio of underlying technologies that were not available to previous storage architecture efforts and that, in fact, all became commercially viable only in 2018. These are:
For the first time in 30 years, a new type of media has been introduced into the classic media hierarchy. 3D XPoint is a new persistent memory technology that offers lower latency and higher endurance than the NAND flash memory used in SSDs while sharing flash’s ability to retain data persistently without external power.
Universal Storage systems use 3D XPoint both as a high-performance write buffer, which enables the deployment of low-cost QLC flash for the system’s data store, and as a global metadata store. 3D XPoint was selected for its low write latency and long endurance. A Universal Storage cluster includes tens to hundreds of terabytes of 3D XPoint capacity, which provides the VAST DASE architecture with several architectural benefits:
NVMe over Fabrics
NVMe (Non-Volatile Memory Express) is the software interface that replaced the SCSI command set for accessing PCIe SSDs. Greater parallelism and lower command queue overhead make NVMe SSDs significantly faster than their SAS or SATA equivalents.
NVMe over Fabrics (NVMe-oF) extends the NVMe API over commodity Ethernet and InfiniBand networks to provide PCIe levels of performance for remote storage access at data center scale. VAST’s DASE architecture disaggregates CPUs and connects them to a globally accessible pool of 3D XPoint and QLC flash SSDs. This lets the system scale controllers independently from storage and provides the foundation for a new class of global storage algorithms intended to drive the effective cost of the system below the sum of the cost of goods. With NVMe-oF, VAST containers enjoy the advantage of statelessness and shared-everything access to a global pool of 3D XPoint and flash, with direct-attached levels of storage access performance.
The logic of VAST’s Universal Storage cluster runs in stateless containers. Thanks to NVMe-oF, NVMe flash, and 3D XPoint, each container enjoys direct-attached levels of storage performance without having any direct-attached stateful storage. Containers make it simple to deploy and scale VAST as a software-defined microservice while also laying the foundation for a much more resilient architecture where container failures are non-disruptive to system operation.
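The stateless, shared-everything idea can be illustrated with a toy model. This is only a conceptual sketch, not VAST's implementation: the names `SharedPool` and `StatelessContainer` are hypothetical, the pool is an in-memory dictionary standing in for the networked 3D XPoint and flash, and NVMe-oF transport, locking, and consistency are not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class SharedPool:
    """Hypothetical stand-in for the globally accessible storage pool."""
    blocks: dict = field(default_factory=dict)


class StatelessContainer:
    """Hypothetical container: holds a reference to the pool and no data of its own."""

    def __init__(self, name: str, pool: SharedPool):
        self.name = name
        self.pool = pool

    def write(self, key: str, value: bytes) -> None:
        # All state lands in the shared pool, never in the container.
        self.pool.blocks[key] = value

    def read(self, key: str) -> bytes:
        return self.pool.blocks[key]


pool = SharedPool()
containers = [StatelessContainer(f"c{i}", pool) for i in range(3)]

containers[0].write("file-a", b"hello")

# Simulate the failure of the container that performed the write.
failed = containers.pop(0)

# Any surviving container can serve the data, because none of it lived in c0.
print(containers[0].read("file-a"))
```

Because every container sees the same pool and keeps nothing locally, removing one container in this model loses no data, and adding one adds serving capacity without any data movement.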
Quad-Level Cell (QLC) flash is the fourth and latest generation in flash memory density and therefore costs the least per bit to manufacture. QLC stores 33% more data in the same space than Triple-Level Cell (TLC) flash. Each cell in a QLC flash chip stores four bits, requiring 16 different voltage levels.
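The figures above follow from simple arithmetic: a cell that stores n bits must distinguish 2^n voltage levels, and moving from three bits to four adds one third more capacity per cell. A short sketch, with helper names chosen here purely for illustration:

```python
# Bits stored per cell for each NAND generation.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def voltage_levels(bits: int) -> int:
    """A cell storing `bits` bits must distinguish 2**bits charge states."""
    return 2 ** bits


def density_gain(new: str, old: str) -> float:
    """Fractional increase in bits per cell going from `old` to `new`."""
    return BITS_PER_CELL[new] / BITS_PER_CELL[old] - 1


for name, bits in BITS_PER_CELL.items():
    print(f"{name}: {bits} bit(s) per cell, {voltage_levels(bits)} voltage levels")

print(f"QLC vs TLC density gain: {density_gain('QLC', 'TLC'):.0%}")
```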
While QLC brings the cost per GB of flash down to unprecedentedly low levels, squeezing more bits into each cell comes with a cost. As each successive generation of flash chips reduced cost by fitting more bits in a cell, each generation also had lower endurance, wearing out after fewer write/erase cycles. The differences in endurance across flash generations are huge: while the first generation of NAND (SLC) could be overwritten 100,000 times, QLC endurance is 100x lower, on the order of 1,000 write/erase cycles.
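To make the endurance gap concrete, the sketch below derives the QLC cycle count from the two figures quoted above and estimates how much host data a drive can absorb before wearing out. The 10 TB drive capacity and the write amplification factors are illustrative assumptions, not figures from this paper; the estimate uses the common rule of thumb that lifetime writes equal capacity times write/erase cycles divided by write amplification.

```python
SLC_PE_CYCLES = 100_000      # quoted in the text
QLC_REDUCTION_FACTOR = 100   # "100x lower", quoted in the text


def qlc_pe_cycles() -> int:
    """Write/erase cycles implied for QLC by the two figures above."""
    return SLC_PE_CYCLES // QLC_REDUCTION_FACTOR


def lifetime_writes_tb(capacity_tb: float, pe_cycles: int,
                       write_amplification: float = 1.0) -> float:
    """Rule-of-thumb host writes before wear-out, in TB."""
    return capacity_tb * pe_cycles / write_amplification


cycles = qlc_pe_cycles()
print(f"Implied QLC endurance: {cycles} write/erase cycles")

# Assumed 10 TB drive; write amplification values are illustrative only.
for wa in (1.0, 4.0):
    total = lifetime_writes_tb(10, cycles, wa)
    print(f"write amplification {wa}: about {total:,.0f} TB of host writes")
```

The comparison shows why the way data is written to QLC matters: under these assumptions, a fourfold increase in write amplification cuts the usable write lifetime of the same drive to a quarter.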