A clean-slate approach allowed us to create the first new scalable system architecture in the last 20 years.

The Beginning

Let’s start (nearly) at the beginning

20 years ago, Google published the seminal Google File System whitepaper detailing a new system architecture now described as “shared-nothing”. Shared-nothing systems scale by organizing a series of independent commodity storage servers to create high-capacity and high-performance namespaces.​

Since the introduction of Google’s architecture, hundreds of billions of dollars’ worth of scalable storage, database, and hyperconverged systems have been designed in its likeness. While this concept solved the scalability challenges of the early 2000s, it has also introduced significant compromises at a time when customers need far more from scalable systems.

Shared-Nothing server failures are painful.

Yesterday’s direct-attached architecture coped well with server failures and small HDDs. But today, with servers approaching a petabyte each and ever-larger SSDs, server failures are increasingly painful, carrying significant operational and data-loss risk.

Rigid and inflexible performance profiles.

In a shared-nothing system, data access performance is bound to each node’s CPU and storage media: archive nodes remain slow, while performance nodes are costly and low-capacity. Because CPUs cannot scale independently of data capacity, data ends up stranded in isolated islands, or “tiers”.

DASE Introduction

Meet DASE: The Storage Architecture of the AI Era

Conceived 20 years after Google introduced the idea of shared-nothing systems, VAST’s DASE architecture decouples compute logic from system state and introduces new shared and transactional data structures that (in combination) lay the foundation for the next generation of AI-infused computing.​

DASE fuses capacity with performance, data with rich metadata, edge with cloud and simplicity with scale. Data and systems concepts that were previously mutually exclusive now co-exist harmoniously in a platform that IDC hails as “the architecture of the future”.


Disaggregation begins by leveraging next-generation commodity networking.

Commodity networks now make it possible to separate a cluster’s CPUs from its SSDs while still letting processors access NVMe devices as if they were directly attached. With DASE, stateless servers are deployed on low-latency Ethernet or InfiniBand NVMe fabrics and can be scaled out into data clusters that support exabytes of data and tens of thousands of processors.

CPUs scale independently of SSDs

DASE enables clusters to scale data capacity independently of the CPUs that run the logic of a VAST cluster. Once processing has been disaggregated from system state, you buy capacity only when you need capacity, and stateless CPUs only when you need more cluster performance.

Asymmetric System Scaling

The DASE Datastore architecture makes it possible to non-disruptively scale from petabytes to exabytes across multiple generations of hardware, in the cloud or on-prem, as you build out an always-on flash cloud. Users and applications never have to worry about which generation of infrastructure their data is on; platform hardware considerations are entirely abstracted away.

DASE Enables Consolidation

By eliminating traffic between VAST Containers, the DASE architecture not only achieves a revolutionary level of I/O parallelism, but it also allows systems to compose the presentation layer into multiple pools of containers that present independent mountpoints on a network.

Container pools can be used for isolating traffic between multiple computing grids or can be used to home VAST systems on multiple independent physical network fabrics. With VAST, container pools don’t just make consolidation possible… consolidation is now practical.

Meet the VAST Element.

A DASE Element is a data unit (file, object, table, function, trigger, etc.) in the VAST Element Store. This enriched data object is designed with ACID transactional semantics and is accessible to all of the logic in a disaggregated DASE cluster. Because every VAST container can see shared state over a next-generation network, no two processors need to talk to each other at all during read or write operations. This shared-everything approach is key to breaking the tradeoff between scale and transactional performance in large clusters.
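
To make the shared-everything idea concrete, here is a minimal Python sketch, purely illustrative and not VAST’s implementation, in which stateless containers resolve an Element’s data and metadata directly from shared NVMe state instead of messaging a peer that “owns” it:

```python
# A minimal sketch of the shared-everything idea; class and method names are
# hypothetical stand-ins, not VAST APIs.

class SharedNVMeFabric:
    """All element metadata and data live here, visible to every container."""
    def __init__(self):
        self.elements = {}          # element_id -> {"meta": ..., "data": ...}

    def read(self, element_id):
        return self.elements[element_id]

    def write(self, element_id, meta, data):
        # In a real system this would be an ACID transaction on shared media.
        self.elements[element_id] = {"meta": meta, "data": data}


class StatelessContainer:
    """A compute container holds no state; it only sees the shared fabric."""
    def __init__(self, fabric):
        self.fabric = fabric

    def read_element(self, element_id):
        # No peer-to-peer chatter: the container resolves metadata and data
        # straight from shared storage.
        return self.fabric.read(element_id)


fabric = SharedNVMeFabric()
fabric.write("file:/lab/results.csv", meta={"owner": "alice"}, data=b"...")

# Any container, on any server, serves the same element without coordination.
for container in (StatelessContainer(fabric), StatelessContainer(fabric)):
    print(container.read_element("file:/lab/results.csv")["meta"])
```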

Each Element carries rich metadata, so any VAST container can access its security, data-reduction, and other metadata without going to some other part of the cluster.

Flexible data access, by design.

Data is never defined by the protocol or language that wrote it; this rich data structure allows any application to access data in the manner it’s most accustomed to, without requiring data copies across data boundaries. Files can be accessed as objects, functions as files, tables as objects, and more, all through a flexible, well-designed security abstraction that accommodates the differences between access APIs.
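
As one hedged illustration of what multiprotocol access looks like in practice, the sketch below writes an element through a filesystem mount and reads the same bytes back as an S3 object; the mount point, endpoint URL, bucket name, and credentials are hypothetical placeholders:

```python
# Illustrative multiprotocol access: write as a file, read as an object.
# Paths, endpoint, bucket, and credentials are placeholders.
import boto3

# 1. Write through the filesystem (e.g., an NFS mount of the cluster).
with open("/mnt/vast/datasets/sample.json", "w") as f:
    f.write('{"run": 42, "status": "complete"}')

# 2. Read the same data back as an object, assuming the cluster exposes the
#    'datasets' directory as an S3 bucket.
s3 = boto3.client(
    "s3",
    endpoint_url="https://vast.example.com",   # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
obj = s3.get_object(Bucket="datasets", Key="sample.json")
print(obj["Body"].read().decode())             # same bytes, no copy made
```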

Flexible, point-in-time data management.

The Element’s atomic data structures use a rich set of pointers to maintain a journal of cluster consistency. This write-in-free-space design makes it possible to take byte-granular snapshots at any frequency without subjecting applications to the performance or capacity overhead associated with legacy snapshot solutions.
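
A toy model of the write-in-free-space idea, assuming a simple pointer table rather than VAST’s actual data structures, shows why such snapshots are instantaneous and copy-free:

```python
# A toy model of write-in-free-space, pointer-based snapshots; illustration only.

class PointerStore:
    def __init__(self):
        self.space = []        # append-only "free space" log of data blocks
        self.pointers = {}     # live view: name -> index into self.space
        self.snapshots = {}    # snapshot label -> frozen copy of the pointers

    def write(self, name, data):
        # New data always lands in free space; only the pointer moves.
        self.space.append(data)
        self.pointers[name] = len(self.space) - 1

    def snapshot(self, label):
        # A snapshot is just a frozen pointer table: instant, no data copied.
        self.snapshots[label] = dict(self.pointers)

    def read(self, name, snapshot=None):
        table = self.snapshots[snapshot] if snapshot else self.pointers
        return self.space[table[name]]


store = PointerStore()
store.write("a.txt", b"v1")
store.snapshot("hourly-0900")
store.write("a.txt", b"v2")                    # overwrite goes to new free space

print(store.read("a.txt"))                     # b'v2' (live view)
print(store.read("a.txt", "hourly-0900"))      # b'v1' (point in time)
```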

Up to 1 Million Snapshots.

The DASE architecture supports a nearly unlimited ability to take and store consistency points. A single cluster supports up to 1 million snapshots, which can be taken instantaneously at any depth of the data hierarchy.

VAST’s flexible snapshot engine is the basis for fine-grained data replication, data cataloging and the ability to time travel consistently across DataBase tables.

A Parallel Transactional Architecture

DASE data structures are ACID transactional and globally visible across a shared fabric to all of the stateless containers in parallel.

This new approach to managing state makes it possible to eliminate the classic east-west traffic associated with distributed-system locking, cache management and metadata coherency. Without east-west traffic, DASE clusters can grow to support deep learning, HPC and database transactions at any scale.

The VAST DataSpace: Our Global Namespace

Historically, geo-distributed systems trade local performance for global consistency, which makes multi-site computing nearly impossible for organizations that don’t want to take on the data management problem. With VAST, clusters come together in a federation that creates one global data platform, extending from edge to cloud to present a high-performance global namespace wherever you compute.

Performance Where You Need It When You Need It

The DataSpace idea is to bend space and time by empowering remote sites to work together using decentralized data management, global data buffering and web-scale consistency concepts. By eliminating the classic locking primitives of distributed systems and allowing locks to flow with the transactions in a decentralized manner, the VAST DataSpace overcomes the performance tradeoffs of WAN-connected infrastructure, and edge sites can serve as the basis for cloud-burst computing. Data is fast wherever it’s accessed.
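
The following conceptual sketch (not VAST’s protocol) illustrates the principle of locks that flow with transactions: once ownership migrates to the site doing the work, follow-on operations proceed at local speed:

```python
# Conceptual sketch of leases that flow with the transactions; names are
# hypothetical and this is not the VAST DataSpace protocol.

class DataSpace:
    def __init__(self):
        self.owner = {}           # element path -> site currently holding the lease
        self.wan_round_trips = 0

    def write(self, site, path, data):
        if self.owner.get(path) != site:
            # Ownership migrates to the writing site; one WAN exchange.
            self.wan_round_trips += 1
            self.owner[path] = site
        # Once the lease is local, reads and writes need no remote locking.
        return f"{site} wrote {len(data)} bytes to {path}"


space = DataSpace()
for _ in range(1000):
    space.write("london", "/datasets/frames/0001", b"pixels")

print(space.wan_round_trips)      # 1 -- only the first write crossed the WAN
```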

A Policy-Based Data Fabric

Any VAST cluster can instantly participate in a global cluster by peering to any other site and sharing a system path, allowing sites to flexibly subscribe to interesting datasets.


A Platform Designed for Hyperscale Flash

VAST’s Datastore was designed to exploit low-cost, next-generation flash devices, using deep I/O buffers to intelligently classify and pipeline data onto cost-effective hyperscale NVMe devices in a way that avoids write amplification.
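
A simplified sketch of the buffering principle, with an illustrative erase-block size rather than any real device geometry, shows how absorbing small writes and flushing only large, aligned chunks keeps device-level write amplification down:

```python
# Simplified write-shaping buffer; the erase-block size and classes here are
# illustrative, not a description of VAST's implementation.

ERASE_BLOCK = 8 * 1024 * 1024          # 8 MiB, illustrative

class WriteShapingBuffer:
    def __init__(self, flash):
        self.flash = flash             # list standing in for a QLC device
        self.pending = bytearray()

    def write(self, data: bytes):
        self.pending += data
        # Flush only full erase blocks, keeping device writes large and sequential
        # so the SSD never has to garbage-collect partially written blocks.
        while len(self.pending) >= ERASE_BLOCK:
            self.flash.append(bytes(self.pending[:ERASE_BLOCK]))
            del self.pending[:ERASE_BLOCK]


qlc = []
buf = WriteShapingBuffer(qlc)
for _ in range(5000):
    buf.write(b"x" * 4096)             # many small 4 KiB application writes

print(len(qlc), "large device writes") # a handful of erase-block-sized writes
```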

VAST has solved core endurance challenges, making it possible to field hyperscale QLC devices with a 10-year warranty. This lets infrastructure buyers avoid constant hardware refresh events while leveraging the lowest-cost flash they can buy.

Data protection shouldn’t come at a price.

Conventional storage systems with erasure coding use Reed-Solomon codes, a class of error-correction codes developed by mathematicians Irving Reed and Gustave Solomon in 1960. A system using Reed-Solomon encoding can calculate two, three or even more ECC strips to protect a set of data strips.

Wide stripes create problems.

The problem with Reed-Solomon codes is that reconstructing an unreadable strip requires reading all of the surviving strips in the stripe. For wide write stripes, that means the time to recover lost blocks can be very long.

As a result, storage architects have to trade off the greater efficiency and read performance of wide data-protection stripes against the need to maintain fast rebuild rates, a problem that only gets worse as drives get larger. When time-to-rebuild compounds against failure probability, ultra-wide stripes are simply not practical with conventional RAID methods.
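
Some back-of-the-envelope arithmetic makes the tradeoff concrete. The stripe geometries below are illustrative (the widest is chosen only because it yields roughly the 2.7% overhead discussed in the next section), and the strip size is assumed:

```python
# Illustrative arithmetic for the Reed-Solomon tradeoff: wider stripes lower
# protection overhead but force more data to be read during a rebuild.

STRIP_SIZE_GB = 1  # assume 1 GB read from each surviving strip during rebuild

for data_strips, parity_strips in [(8, 2), (16, 4), (146, 4)]:
    overhead = parity_strips / (data_strips + parity_strips)
    # Classic Reed-Solomon must read every surviving strip to rebuild one.
    rebuild_read_gb = (data_strips + parity_strips - 1) * STRIP_SIZE_GB
    print(f"{data_strips}+{parity_strips}: "
          f"overhead {overhead:.1%}, rebuild reads {rebuild_read_gb} GB")
```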

Revolutionary Erasure Code Efficiency

VAST Data’s Locally Decodable Erasure Code (LDEC) is a new approach to error correction that breaks the tradeoff between efficiency and resilience, providing customers with 60 million years of mean-time-to-data-loss (MTTDL) while enabling industry-low error correction overhead of as little as 2.7%.

A Breakthrough In Device Recovery Speed

VAST’s error-correction algorithms use a Locally Decodable approach to data recreation, allowing rebuilds to happen across wide stripes without requiring the code to read through all the data in a stripe. The result: data can be rebuilt in a fraction of the time of legacy approaches, enabling wide, efficient write stripes that save customers money without compromising platform integrity.
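
As a generic illustration of local decodability, and not VAST’s actual LDEC construction, the sketch below compares rebuild reads when a failed strip can be recovered from a small local group rather than from the whole stripe (the stripe width and group size are assumed):

```python
# Generic illustration of locally decodable rebuilds vs. full-stripe rebuilds.
# The 146-strip stripe and 37-strip group are assumed values for illustration.

STRIP_SIZE_GB = 1
data_strips = 146
group_size = 37                      # roughly four local groups per stripe

full_stripe_read = (data_strips - 1) * STRIP_SIZE_GB        # Reed-Solomon style
local_group_read = (group_size - 1) * STRIP_SIZE_GB         # locally decodable

print(f"Full-stripe rebuild reads:       {full_stripe_read} GB")
print(f"Locally decodable rebuild reads: {local_group_read} GB")
```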

A Breakthrough In Global Data Reduction

For the first time in 20 years, VAST is introducing a new data reduction system, one that combines the global reach of deduplication with the fine granularity of local compression. VAST’s Similarity-Based Data Reduction is designed to find efficiency where other systems cannot, and it can even reduce data that has already been compressed.

The World’s Best Data Reduction Engine

Similarity finds correlations across similar blocks in a VAST cluster and then applies byte-range compression between blocks that share common content. Using semantically optimized compression methods, VAST clusters can achieve the best efficiency of any platform, a capability we guarantee.
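
To illustrate the principle, and only the principle, the toy sketch below compresses a new block against a similar stored block using zlib’s preset-dictionary feature; this is not VAST’s algorithm, but it shows how shared byte ranges stop costing space:

```python
# Toy sketch of similarity-based reduction using zlib's preset dictionary.
# Not VAST's algorithm; the block contents are synthetic.
import os
import zlib

shared = os.urandom(3000)                 # content the two blocks have in common
stored_block = shared + os.urandom(500)   # block already in the cluster
new_block = shared + os.urandom(500)      # similar, but not identical

local_only = zlib.compress(new_block)     # random bytes barely compress alone

codec = zlib.compressobj(zdict=stored_block)           # compress against the
similarity_reduced = codec.compress(new_block) + codec.flush()  # similar block

# The shared ranges become references to the stored block instead of being
# stored again, so the similarity-reduced form is far smaller.
print(len(new_block), len(local_only), len(similarity_reduced))
```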

Platform Efficiency, Redefined.

Imagine a platform that can reduce even pre-compressed and pre-deduplicated data. Customers now see amazing results:

  ● 2x the efficiency of leading backup appliances

  ● reduction on media, genomes, binaries and more

  ● a system that can even reduce encrypted data

Similarity is Game-Changing.

Similarity delivers a weighted average of 3:1 data reduction across our global fleet of VAST clusters, and our algorithms are improving daily. Similarity is instrumental in helping customers move beyond the HDD era and realize unprecedented infrastructure cost, power and data center efficiency.

[Diagram labels: data center-scale NVMe fabric; Data Element with access controls, protocols, reduction and encryption; sites from Los Angeles to Brisbane; $/GB of all-flash vs. hybrid flash + HDD vs. HDD storage; 10 years of longevity, guaranteed.]
A Simpler Approach

One tier of flash, from your working set all the way to the archive. Infrastructure is now AI-Ready.

VAST’s efficiency, scale, resilience and QoS make it practical to consolidate infrastructure down to one tier of simple-to-manage data. By pioneering new levels of efficiency, VAST eliminates the cost arguments for tiered infrastructure, making it possible to provide real-time access to all your data.

Customers considering either refreshing or deploying new unstructured data storage platforms should not make a decision without looking at VAST Data. Its performance, availability, scalability, economic benefits, and continued triple-digit growth rates make a compelling argument that this is the storage architecture of the digitally transformed future.
