How It Works

Architecture Matters.

So many of the decisions organizations make today are shaped by the limits of yesterday’s infrastructure thinking. To tackle the needs of tomorrow’s intelligent applications, the VAST team challenges fundamental notions of data scale, cost, context and locality. The result is a new architecture that breaks fundamental tradeoffs to create a new type of AI-ready data platform that is both simple and cost-effective at any scale.

A clean-slate approach allowed us to create the first new scalable system architecture in the last 20 years.

The Beginning

Let’s start (nearly) at the beginning

Twenty years ago, Google published the seminal Google File System whitepaper, detailing a new system architecture now described as “shared-nothing”. Shared-nothing systems organize a series of independent commodity storage servers into a larger namespace. Since then, hundreds of billions of dollars’ worth of scalable storage, database and hyperconverged systems have been designed in its likeness. While this concept solved the scalability challenges of the early 2000s, it has also introduced significant compromises.

Shared-Nothing server failures are painful.

Yesterday’s direct-attached architecture coped well with server failures and small HDDs. But today, with nearly petabyte-sized servers and growing SSDs, server failures are increasingly painful, with significant operational and data loss risks.

Consistency fights against scale.

Shared-nothing systems need tight node coordination for data writing. But shared operations, cache management, locking, rebuilds, and cache coherency stress the limits of east-west traffic, making it tough for clusters to scale consistently.

Rigid and inflexible performance profiles.

A node’s data access capabilities are determined by its CPUs and storage devices. Archive nodes remain slow, and performance nodes are costly and low-capacity. Failing to scale CPUs independently of data capacity results in isolated data islands known as “tiers”.

DASE Introduction

Meet DASE: The Storage Architecture of the Future

Conceived 20 years after Google introduced the idea of shared-nothing systems, VAST’s DASE architecture decouples compute logic from system state and introduces new shared and transactional data structures that (in combination) lay the foundation for the next generation of AI-infused computing.

DASE fuses capacity with performance, data with rich metadata, edge with cloud and simplicity with scale. Data and systems concepts that were previously mutually exclusive now co-exist harmoniously in a platform that IDC hails as ‘the architecture of the future’.

VAST’s DASE is ‘the storage architecture of the future’


Disaggregation begins by leveraging next-generation commodity networking.

Commodity networks now make it possible to separate cluster CPUs from SSDs while still letting processors access NVMe devices as if they were direct-attached. With DASE, stateless servers are deployed on low-latency Ethernet or InfiniBand NVMe fabrics and can be scaled to build data clusters that support exabytes of data and tens of thousands of processors.

CPUs scale independently of SSDs

DASE enables clusters to scale the platform’s data capacity independently of the CPUs that run the logic of a VAST cluster. Once processing has been disaggregated from the system state, you buy capacity only when you need capacity, and buy stateless CPUs only when you need more cluster performance.
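The purchasing difference can be shown with some simple arithmetic. The sketch below is only an illustration: the function names, the per-unit sizes (500 TB, 32 cores) and the workload figures are all hypothetical and do not describe any actual VAST hardware.

```python
import math


def shared_nothing_nodes(target_tb, target_cores, tb_per_node, cores_per_node):
    """Nodes needed when every node bundles CPU and capacity together."""
    # The purchase must satisfy whichever requirement is larger,
    # so the other resource ends up over-provisioned.
    return max(math.ceil(target_tb / tb_per_node),
               math.ceil(target_cores / cores_per_node))


def disaggregated_units(target_tb, target_cores, tb_per_enclosure, cores_per_server):
    """(enclosures, servers) needed when the two are sized separately."""
    return (math.ceil(target_tb / tb_per_enclosure),
            math.ceil(target_cores / cores_per_server))


# Hypothetical requirement: 10,000 TB of capacity served by 256 cores.
nodes = shared_nothing_nodes(10_000, 256, tb_per_node=500, cores_per_node=32)
enclosures, servers = disaggregated_units(
    10_000, 256, tb_per_enclosure=500, cores_per_server=32)

print(f"shared-nothing: {nodes} nodes ({nodes * 32} cores purchased)")
print(f"disaggregated:  {enclosures} enclosures + {servers} servers "
      f"({servers * 32} cores purchased)")
```

In this made-up capacity-heavy case, the bundled model buys 640 cores to reach the capacity target, while the disaggregated model buys only the 256 cores the workload needs.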

Segment processors into QoS pools.

VAST’s DASE architecture eliminates the notion of system partitioning, and therefore doesn’t experience the east-west scaling challenges of legacy scale-out architectures. With this new highly-parallel architecture, CPUs can be pooled into resource groups that can be allocated to competing applications or tenants, isolating traffic and solving the noisy-neighbor problem throughout your environment.

Pools also allow clusters to connect into multiple heterogeneous client networks and network technologies, creating one site-wide system to support any access method.
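One way to picture pooling is as a routing table from tenants to groups of stateless servers. The following is a minimal sketch under that assumption; `PoolRouter`, the pool names, the tenant names and the `cnode-*` server names are hypothetical and are not part of any VAST API.

```python
from itertools import cycle


class PoolRouter:
    """Routes each tenant's requests only to servers in that tenant's pool."""

    def __init__(self, pools, tenant_pool):
        # pools: pool name -> list of server names
        # tenant_pool: tenant name -> pool name
        self._tenant_pool = dict(tenant_pool)
        # One round-robin iterator per pool, so load spreads within a pool
        # but never spills into another tenant's servers.
        self._next_server = {name: cycle(servers) for name, servers in pools.items()}

    def route(self, tenant):
        pool = self._tenant_pool[tenant]
        return next(self._next_server[pool])


router = PoolRouter(
    pools={"training": ["cnode-1", "cnode-2"], "analytics": ["cnode-3"]},
    tenant_pool={"ml-team": "training", "bi-team": "analytics"},
)
print(router.route("ml-team"))  # a server from the "training" pool
print(router.route("bi-team"))  # a server from the "analytics" pool
```

Because every server is stateless and sees the same shared data, moving a server between pools in this model is just an update to the table.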

Asymmetric system scaling.

The DASE Datastore architecture makes it possible to scale non-disruptively from petabytes to exabytes across multiple generations of hardware, in the cloud or on-prem, as you build out an always-on flash cloud. Users and applications never have to worry about which generation of infrastructure their data is on. Platform hardware considerations are entirely abstracted away.

Introducing the VAST Data Element.

A DASE Element is a data unit (file, object, table, symlink, etc.) in the VAST Element Store. This enriched data object is designed with transactional semantics that can be articulated to any of the system’s processors in a disaggregated cluster. No two processors need to talk with each other at all during read or write operations, since all see the same shared state.

Elements carry rich metadata, so any VAST container can access an Element’s metadata, access controls and efficiencies without going to some other part of the cluster.
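The text does not describe how Element transactions are implemented. As a generic illustration of how independent processors can stay consistent through shared state alone, the sketch below uses optimistic concurrency (a version check at commit time). This is a swapped-in textbook technique, not VAST’s mechanism, and `Element`, `SharedElementStore`, `read` and `commit` are hypothetical names.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Element:
    data: bytes = b""
    metadata: dict = field(default_factory=dict)  # e.g. owner, ACLs
    version: int = 0


class SharedElementStore:
    """Stands in for shared state that every processor can reach directly."""

    def __init__(self):
        self._elements = {}
        # Models the atomicity of a single update to the shared media;
        # processors never exchange messages with one another.
        self._lock = threading.Lock()

    def read(self, path):
        with self._lock:
            e = self._elements.get(path, Element())
            return Element(e.data, dict(e.metadata), e.version)

    def commit(self, path, expected_version, data, metadata):
        """Apply the write only if nobody else committed since our read."""
        with self._lock:
            current = self._elements.get(path, Element())
            if current.version != expected_version:
                return False  # caller re-reads and retries
            self._elements[path] = Element(data, dict(metadata), expected_version + 1)
            return True


store = SharedElementStore()
seen_by_a = store.read("/datasets/run-1")  # processor A
seen_by_b = store.read("/datasets/run-1")  # processor B, same starting version
print(store.commit("/datasets/run-1", seen_by_a.version, b"from A", {"owner": "a"}))
print(store.commit("/datasets/run-1", seen_by_b.version, b"from B", {"owner": "b"}))
```

The second commit is rejected because the version moved on, so conflicting writers are serialized by the shared state itself rather than by talking to each other.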

Multi-protocol, by design.

Data is never defined by the protocol that wrote it; this rich data model allows any application to access data in the manner it’s most accustomed to, without requiring data copies across different protocol boundaries. Files can be accessed as objects, as tables, and more, all through a flexible and well-designed security abstraction that accommodates differences between access APIs.

Flexible, point-in-time data management.

The Element’s atomic data structures use a rich set of pointers to maintain a journal of cluster consistency. This write-in-free-space system makes it possible to take byte-granular snapshots at any frequency without subjecting applications to the performance or capacity overhead associated with legacy snapshot solutions.
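The general idea behind write-in-free-space snapshots can be sketched in a few lines. The toy `Volume` class below is a hypothetical, block-granular simplification (the platform described here is byte-granular and its data structures are not documented in this text): new data always lands in fresh space, and a snapshot only records which pointers were live at that moment.

```python
class Volume:
    """Toy redirect-on-write volume: data is never overwritten in place."""

    def __init__(self):
        self._log = []        # append-only space; every write lands here
        self._live = {}       # block number -> position in the log
        self._snapshots = {}  # snapshot name -> frozen pointer map

    def write(self, block_no, data):
        self._log.append(data)                     # write into free space
        self._live[block_no] = len(self._log) - 1  # repoint the live view

    def snapshot(self, name):
        # Only pointers are captured; no data is copied or moved.
        # (A real system would avoid copying the whole map as well.)
        self._snapshots[name] = dict(self._live)

    def read(self, block_no, snapshot=None):
        pointers = self._live if snapshot is None else self._snapshots[snapshot]
        return self._log[pointers[block_no]]


vol = Volume()
vol.write(0, b"v1")
vol.snapshot("before-update")
vol.write(0, b"v2")
print(vol.read(0))                            # current contents
print(vol.read(0, snapshot="before-update"))  # contents as of the snapshot
```

Because old data is left where it was, taking a snapshot adds no write traffic, and space is consumed only by blocks that change afterwards.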

Over 1 Million Snapshots

The DASE architecture supports a nearly-unlimited ability to take and store consistency points. Conventional limits are… gone. Single clusters support over 1 million snapshots that can be reserved instantaneously at any depth of the data hierarchy.

The Power of Shared-Everything

By inventing a data layer that is transactional and globally visible across a shared fabric to all of the stateless containers in parallel, it’s now possible to build distributed and consistent systems without the classic east-west traffic associated with locking or metadata coherency. This enables unlimited scale for any I/O profile.

Any VAST container can read or write consistently to any range within the entire namespace without coordinating with any other processor. Shared Elements eliminate the need to logically partition scale-out clusters, resulting in systems that are simply available and consistent.

Constellation: The VAST Global Namespace

Historically, geo-distributed systems trade local performance for global consistency, which makes multi-site computing nearly impossible for organizations that don’t want to deal with the data management problem. With VAST, clusters come together in a federation of systems that forms one global data platform, extending from edge to cloud to present a global high-performance namespace wherever you compute.

Local Performance Access, Global Distribution

With Constellation, users write into global access points within a federation of clusters, into paths that subsequently are published out to any other site in the Constellation.

A Policy-Based Data Fabric

Any VAST cluster can instantly participate in a Constellation by peering to any other site and sharing a system path, allowing sites to flexibly subscribe to interesting datasets.

Performance Where And When You Need It

The Constellation idea is to bend space and time by empowering remote sites to work together using decentralized data management and web-scale consistency concepts. By eliminating the need for the classic locking primitives of distributed systems and allowing locking to flow with the transactions in a decentralized manner, Constellation clusters are able to overcome the performance tradeoffs of WAN-connected infrastructure. Edge nodes can then serve as the basis for cloud burst computing.

A Platform Designed for Hyperscale Flash

VAST’s Datastore was designed to exploit the cost-saving potential of low-cost, next-generation flash devices. It leverages deep I/O buffers to intelligently classify and pipeline data into cost-effective hyperscale NVMe devices in a manner that avoids write amplification.

VAST has solved core endurance challenges, making it possible to field hyperscale QLC devices with a 10-year warranty. This enables infrastructure buyers to avoid constant hardware refresh events while leveraging the lowest-cost flash they can consume.

Data protection shouldn’t come at a price.

Conventional storage systems with erasure coding today use Reed-Solomon codes, a class of error correction codes developed by mathematicians in 1960. A system using Reed-Solomon encoding can calculate two, three or even more ECC strips to protect a set of data strips.

Wide stripes create problems.

The problem with Reed-Solomon codes is that reconstructing an unreadable data strip requires the data from all the surviving data strips. This means that for wide write stripes, rebuilding lost blocks can take a very long time.
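The read cost of a conventional rebuild is easiest to see with single XOR parity, which is the simplest erasure code and is used here as a stand-in for Reed-Solomon (the real code uses finite-field arithmetic, but the number of strips read per rebuild behaves the same way). The helper names are hypothetical.

```python
from functools import reduce


def xor_strips(strips):
    """Byte-wise XOR of equally sized strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild(stripe, lost_index):
    """Recover one lost strip; returns (recovered strip, strips read)."""
    survivors = [s for i, s in enumerate(stripe) if i != lost_index]
    # Every surviving strip in the stripe has to be read.
    return xor_strips(survivors), len(survivors)


data = [bytes([i] * 4) for i in range(1, 9)]  # 8 data strips of 4 bytes each
stripe = data + [xor_strips(data)]            # plus 1 parity strip

recovered, strips_read = rebuild(stripe, lost_index=2)
print(recovered == data[2], strips_read)
```

The number of strips read grows in step with the stripe width, so doubling the width doubles the I/O needed to recover a single lost strip.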

As a result, storage architects have to trade off the greater efficiency and read performance of wide data protection stripes against the need to maintain fast rebuild rates, which only get worse as drives get larger. When time-to-rebuild complications compound against failure probability, ultra-wide stripes are simply not possible with conventional RAID methods.

Revolutionary Erasure Code Efficiency

VAST Data’s Locally Decodable Erasure Code (LDEC) is a new approach to error correction that breaks the tradeoff between efficiency and resilience, providing customers with 60 million years of mean-time-to-data-loss (MTTDL) while enabling industry-low error correction overhead of as little as 2.7%.
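To see how stripe width drives overhead, the short calculation below compares two stripe geometries. Both geometries are assumptions chosen for illustration; the text states only the 2.7% figure, not the stripe layout behind it.

```python
def protection_overhead(data_strips, parity_strips):
    """Parity capacity expressed as a fraction of the data it protects."""
    return parity_strips / data_strips


# Hypothetical geometries, written as (data strips, parity strips).
for data_strips, parity_strips in [(8, 2), (150, 4)]:
    pct = protection_overhead(data_strips, parity_strips) * 100
    print(f"{data_strips}+{parity_strips}: {pct:.1f}% overhead")
```

A narrow 8+2 stripe spends 25% of its data capacity on protection, while a stripe around 150+4 wide lands near the 2.7% quoted above, which is why wide stripes matter for efficiency.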

A Breakthrough In Device Recovery Speed

VAST’s new error correction algorithms use a new Locally-Decodable approach to data recreation, allowing rebuilds to happen across wide stripes without requiring the code to read through all the data in a stripe. The result: data can be rebuilt in a fraction of the time of legacy approaches, enabling wide and efficient write stripes to help save customers money without compromising platform integrity.
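The benefit of local decoding can be illustrated with a much simpler scheme: XOR parity computed over small groups of strips, in the spirit of local reconstruction codes. This is explicitly not VAST’s algorithm, whose construction is not described here, and the function names and group size are hypothetical. It shows only that a rebuild can read a fraction of a wide stripe.

```python
from functools import reduce


def xor_strips(strips):
    """Byte-wise XOR of equally sized strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def encode_local(data, group_size):
    """One XOR parity strip per group of `group_size` data strips."""
    return [xor_strips(data[i:i + group_size])
            for i in range(0, len(data), group_size)]


def rebuild_local(data, local_parity, lost_index, group_size):
    """Recover one strip by reading only its own group."""
    group = lost_index // group_size
    start = group * group_size
    end = min(start + group_size, len(data))
    survivors = [data[i] for i in range(start, end) if i != lost_index]
    survivors.append(local_parity[group])
    return xor_strips(survivors), len(survivors)


data = [bytes([i] * 4) for i in range(1, 13)]  # a 12-strip-wide stripe
local_parity = encode_local(data, group_size=4)

recovered, strips_read = rebuild_local(data, local_parity, lost_index=5, group_size=4)
print(recovered == data[5], strips_read)
```

Here a rebuild reads 4 strips instead of 12. Note that this naive scheme pays for it with one extra parity strip per group, so by itself it does not demonstrate the combination of low overhead and fast rebuilds claimed for LDEC.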

A Breakthrough In Global Data Reduction

For the first time in 20 years, VAST is introducing a new data reduction system that combines the global benefits associated with deduplication and the fine granularity of local compression. VAST’s Similarity-Based Data Reduction is designed to find efficiency where other systems cannot, and makes it possible to reduce even data that has already been compressed.

The World’s Best Data Reduction Engine

Similarity makes it possible to find correlations across similar blocks in a VAST cluster, and then applies byte-range compression between blocks that have common content. Using semantically-optimized compression methods, VAST clusters can achieve the best efficiency of any platform, a capability we guarantee.
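The principle of compressing one block against a similar one can be demonstrated with the preset-dictionary feature of Python’s standard `zlib` module. This is a minimal sketch of the general idea only: it assumes the similar reference block has already been found, the helper names are hypothetical, and it says nothing about how VAST detects similarity or encodes deltas.

```python
import random
import zlib


def compress_against(block, reference):
    """Compress `block` using `reference` as a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=reference)
    return compressor.compress(block) + compressor.flush()


def decompress_against(payload, reference):
    """Inverse of compress_against; needs the same reference block."""
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(payload) + decompressor.flush()


# A block of random bytes stands in for data that does not compress by itself.
rng = random.Random(0)
reference = bytes(rng.randrange(256) for _ in range(4096))

# A "similar" block: identical except for a small edit.
edited = bytearray(reference)
edited[100:108] = b"CHANGED!"
similar = bytes(edited)

standalone_size = len(zlib.compress(similar, 9))
delta_payload = compress_against(similar, reference)

print(standalone_size, len(delta_payload))
print(decompress_against(delta_payload, reference) == similar)
```

Compressed by itself, the block barely shrinks because its content looks random; compressed against its near-duplicate, only the differences need to be stored.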

Platform Efficiency, Redefined.

Imagine a platform that can reduce even pre-compressed and pre-deduplicated data. Customers now see amazing results:

  ● 2x the efficiency of leading backup appliances

  ● reduction on media, genomes, binaries and more

  ● a system that can even reduce encrypted data

Similarity is Game-Changing.

Similarity delivers a weighted average of 3:1 data reduction across our global fleet of VAST clusters, and our algorithms are improving daily. Similarity is instrumental in helping customers move beyond the HDD era and realize unprecedented infrastructure cost, power and data center efficiency.

A simpler approach.

One tier of flash, from your working set all the way to the archive. Infrastructure is now AI-Ready.

VAST’s efficiency, scale, resilience and QoS make it practical to consolidate infrastructure down to one tier of simple-to-manage data. By pioneering new levels of efficiency, VAST eliminates the cost arguments for tiered infrastructure, making it possible to provide real-time access to all your data.


Customers considering either refreshing or deploying new unstructured data storage platforms should not make a decision without looking at VAST Data. Its performance, availability, scalability, economic benefits, and continued triple-digit growth rates make a compelling argument that this is the storage architecture of the digitally transformed future.
