VAST DataBase

Turning The Tables On Databases

Meet the revolutionary VAST DataBase, a continuation of our first-principles thinking. Is it a database? Is it a data warehouse? Is it a data lake? Yes. Allow us to explain…

The VAST DataBase has broken fundamental database tradeoffs to combine the transactional performance of a database and the query performance of an exabyte-scalable data warehouse, all at the cost of a data lake.

Breaking the Tradeoffs Between Transactions and Queries

VAST systems leverage deep write buffers built from low-cost persistent memory, which allows every ACID transaction to be stored instantaneously.

As tables fill, they are then migrated down to low-cost hyperscale flash and stored in a columnar format, so that queries also run instantaneously.

Queries cut across both the long-term datastore and the write buffer, and even though data accessed in the buffer is row-based, the underlying persistent memory structures make any row reads lightning fast.
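The flow described above can be pictured with a toy model. This is only an illustrative sketch, not VAST's implementation: the `ToyTable` and `ColumnChunk` names, the four-row buffer limit, and the per-chunk min/max statistics used to skip chunks are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnChunk:
    """A small columnar chunk: one list of values per column, plus min/max stats."""
    columns: dict
    stats: dict


@dataclass
class ToyTable:
    buffer_limit: int = 4                        # hypothetical: rows held before migration
    buffer: list = field(default_factory=list)   # row-oriented write buffer
    chunks: list = field(default_factory=list)   # column-oriented long-term store

    def insert(self, row: dict) -> None:
        """Append a row to the buffer; migrate the buffer once it fills."""
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_limit:
            self._migrate()

    def _migrate(self) -> None:
        """Pivot the buffered rows into one columnar chunk and empty the buffer."""
        names = list(self.buffer[0].keys())
        columns = {n: [r[n] for r in self.buffer] for n in names}
        stats = {n: (min(v), max(v)) for n, v in columns.items()}
        self.chunks.append(ColumnChunk(columns, stats))
        self.buffer = []

    def select_greater_than(self, column: str, threshold) -> list:
        """Return rows where column > threshold, reading chunks and buffer alike."""
        hits = []
        for chunk in self.chunks:
            if chunk.stats[column][1] <= threshold:
                continue  # the chunk's maximum rules it out, so skip it entirely
            for i, value in enumerate(chunk.columns[column]):
                if value > threshold:
                    hits.append({n: vals[i] for n, vals in chunk.columns.items()})
        # Rows still in the write buffer are scanned row by row.
        hits.extend(r for r in self.buffer if r[column] > threshold)
        return hits


table = ToyTable(buffer_limit=4)
for i in range(10):
    table.insert({"id": i, "amount": i * 10})

recent = table.select_greater_than("amount", 75)   # served from the buffer only
mixed = table.select_greater_than("amount", 55)    # spans one chunk and the buffer
print([r["id"] for r in recent], [r["id"] for r in mixed])
```

In this sketch a single query returns rows from both tiers, and whole chunks whose statistics cannot match are never scanned.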

A Columnar Data Format That Thrives on Flash

While Parquet may be the leading data science file format in use today, systems that use Parquet make inefficient use of column store infrastructure. At 32KB, VAST’s DataBase chunk size is 16,000x smaller than your average Parquet row group. By embracing the idea of an all-flash data lake, we’ve made it possible to achieve incredible levels of query filtration and reduce the number of records that query engines sift through.

At 32KB, the VAST DataBase columnar payload is also simple to update. Customers can now immediately update tables for everything from GDPR requests to retention policy enforcement without legacy database vacuuming headaches. It’s all just… fast.
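As a quick sanity check on the chunk-size comparison, the two figures quoted above imply a row-group size that can be worked out directly. The sketch below assumes that "32KB" means 32 × 1024 bytes. The variable names are illustrative.

```python
CHUNK_BYTES = 32 * 1024      # 32KB chunk, assuming binary kilobytes
RATIO = 16_000               # "16,000x smaller" figure from the text

implied_row_group_bytes = CHUNK_BYTES * RATIO
implied_row_group_mib = implied_row_group_bytes / 2**20

print(implied_row_group_mib)  # 500.0
```

In other words, the comparison corresponds to a Parquet row group of roughly 500 MiB.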

Performance Comparison

The VAST DataBase is great for finding needles in haystacks

Let’s look for rides that have over $100 of tolls in the NYC Taxi dataset. With the same row count in both tests, S3 took 8.11 seconds requiring Trino to process 28 million rows, while the VAST DataBase took 2.27 seconds requiring Trino to only process 2 rows.
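For readers who want to picture the query, here is a minimal sketch of the same kind of filter, run against an in-memory SQLite table rather than Trino, S3, or the VAST DataBase. The table name `trips`, the column name `tolls_amount`, and the five sample rows are assumptions for illustration, not data from the benchmark. The timing figures are the ones quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, tolls_amount REAL)")

# Made-up sample rows; the real dataset holds millions of trips.
sample_rows = [(1, 0.0), (2, 6.55), (3, 120.0), (4, 12.5), (5, 101.25)]
conn.executemany("INSERT INTO trips VALUES (?, ?)", sample_rows)

# The "needle in a haystack" predicate: rides with over $100 of tolls.
matches = conn.execute(
    "SELECT trip_id, tolls_amount FROM trips "
    "WHERE tolls_amount > 100 ORDER BY trip_id"
).fetchall()

# Ratio of the two query times reported in the comparison above.
s3_seconds, vast_seconds = 8.11, 2.27
speedup = s3_seconds / vast_seconds

print(matches, round(speedup, 2))
```

With the reported timings, the ratio works out to roughly 3.6x.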

Use Cases

Purpose built for all your data.

Content Recommendation

By enabling real-time queries all the way down to the archive, the VAST DataBase enables content producers, e-commerce sites and social networks to query user interest profiles in real time and train new ML models.

Payment Fraud Analytics

The VAST DataBase transforms fraud analytics by combining the transactional performance of a database with the scalable query performance of a data lake. By breaking the tradeoffs between a database and a data warehouse, the VAST DataBase enables payments providers to analyze and detect fraud in real time.

Targeted Advertising

The VAST DataBase is used by leading advertisers and advertising networks to develop more targeted advertising techniques by mapping and correlating user behavior. VAST’s efficiency algorithms create all-flash data lakes with archive economics, ideal for optimizing ad network P&L.

Homeland Security

The VAST DataBase brings the ability to perform fine-grained queries all the way down to the archive. The platform is ideal for government agencies that struggle to find needles in haystacks… now, these needles can be found in real time at exabyte scale.

Linearly scale consistent database services across 1000s of CPUs.

VAST’s new Disaggregated and Shared-Everything Architecture is designed to break the conventional scaling limits of distributed systems. We call it DASE.

In DASE clusters, the machines that run database logic are stateless and have been disaggregated from the flash that is stitched together on a low-latency commodity data center fabric. We’ve invented a new shared-everything data structure that makes it possible for each CPU to write into the namespace without having to coordinate with any other CPU.

The parallelism of the DASE architecture makes it possible to build systems that can transact in millions of records per second and query from an exabyte-scalable volume of flash with near-infinite query performance.

A revolutionary approach to database compression.

It’s almost impossible to find the right balance of file sizing when dealing with open formats like Parquet and ORC. Big files put less of a toll on metastores and also typically see better compression, but they also force query engines to sift through more records and decompress more data than ever needed by a query. The VAST DataBase leverages VAST’s next-generation approach to data reduction which compresses columnar chunks globally against each other. This global compression approach is called Similarity-Based Data Reduction – and eliminates the data engineering hassle of sizing files in your data lake. Every columnar chunk is added to a global compression cluster that achieves greater savings than is ever possible with single-file approaches like Snappy. Similarity is so powerful, it’s capable of finding reduction on pre-reduced and even encrypted data. We guarantee that you’ll never find a more effective approach to finding savings.
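To see why compressing a chunk against similar data can beat compressing it on its own, consider the sketch below. It is not VAST's Similarity algorithm, whose internals are not described here. It swaps in a standard technique, zlib's preset dictionary, and uses a made-up 4 KB reference block plus a near-identical chunk as assumed inputs.

```python
import random
import zlib

# Made-up data: a pseudo-random reference block and a chunk differing in 3 bytes.
rng = random.Random(0)
reference = bytes(rng.getrandbits(8) for _ in range(4096))
mutable = bytearray(reference)
for pos in (100, 2000, 4000):
    mutable[pos] ^= 0xFF
chunk = bytes(mutable)


def compress_against(data: bytes, ref: bytes) -> bytes:
    """Compress data using ref as a zlib preset dictionary."""
    compressor = zlib.compressobj(zdict=ref)
    return compressor.compress(data) + compressor.flush()


def decompress_against(blob: bytes, ref: bytes) -> bytes:
    """Invert compress_against; the same reference block is required."""
    decompressor = zlib.decompressobj(zdict=ref)
    return decompressor.decompress(blob) + decompressor.flush()


standalone = zlib.compress(chunk)              # chunk compressed in isolation
delta = compress_against(chunk, reference)     # chunk compressed against its neighbor

assert decompress_against(delta, reference) == chunk
print(len(chunk), len(standalone), len(delta))
```

Because the chunk looks like noise in isolation, the standalone result barely shrinks, while the version encoded against the similar block is far smaller.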

To learn more about Similarity, visit here.
Importers and Query Interfaces

The VAST DataBase Embraces Open Data Science Standards

The VAST DataBase uniquely combines an exabyte scale namespace for natural data types like images, video, LIDAR, genomes, and other rich, real-world data sources, along with a tabular database to hold the catalog of expanding metadata about the objects generated as data works its way through the deep learning pipeline.

The First Synthesized Structured & Unstructured Data Platform

Just like modern AI applications create structure from unstructured data, the VAST Data Platform has been designed to power all your unstructured and structured data applications.

VAST’s Multi-Protocol DataStore was introduced in 2019 as the world’s first file and object storage system that combined the performance of all-flash with the economics of an archive. This system is a multi-protocol data management system that serves data from any view (NFS, SMB & S3).

With the VAST DataBase, unstructured data gets more than a data catalog – the DataBase’s transactional and analytical capabilities lay the foundation for the semantic layer of AI training and inference systems.

Features

Breaking the Tradeoffs Between Transaction Systems & Deep Analytics

Scalable ACID Transactions

The VAST DataBase provides support for unlimited ACID transactions and atomic updates within and across tables in this system.

A Partition-Less Architecture

Each DataBase server sees the same volume of NVMe SSDs, making it possible to scale linearly without any diminishing returns.

Columnar Queries

The VAST DataBase converts rows into columnar objects as they age, making the dataset suitable for flash-optimized deep queries.

Global Data Reduction

VAST’s Similarity-Based data reduction combines the global nature of deduplication with the fine granularity of compression across your entire global namespace.

Massive Performance

Scale to over 1 million transactions per second. Scale to terabytes/second of query throughput.

Massive Scale

VAST clusters can be built to support well over an exabyte of data capacity. Today, several customers run clusters over 100PB in size.

Hassle-Free Table Management

No need for compaction, data vacuuming, or partition management – the VAST DataBase is always fast and manages table cleanup for you.

Disaster Recovery

The VAST DataBase supports n:1 and 1:n asynchronous replication topologies, and couples replication with 15-second recovery points to make failover near-real-time.

Disaggregated Architecture

The CPUs that run VAST DataBase logic are independent of the machines that hold the system’s state, making it easy to scale clusters using flexible topologies.

Optimized for Low-Cost Flash

VAST clusters have introduced a new data structure that’s optimized for the particular nuance of QLC and PLC flash, making it possible to dramatically lower the cost of an all-flash data lake.

Global Snapshots for Time Travel

VAST clusters use write-in-free-space semantics to make snapshots painless. It’s easy to snapshot one table or many tables consistently, making it simple to remove the complexity of time travel.

Support for Complex Data Types

UINT 8/16/32/64, INT 8/16/32/64, BOOL, FLOAT 32/64, DATE32, TIMESTAMP, TIME 32/64, STRING, DECIMAL128, BINARY, BINARY32KB, ARRAY, MAP, COUNT (including nested and multi-level nested data)

Access Policies

The VAST DataBase applies S3-style bucket policies to table permissions management, a cloud-native approach to access control.

Integrated File System

VAST is the only DataBase to integrate with a parallel, POSIX file namespace and S3 namespace, enabling content to be merged with the context layer.

Data Importer

The VAST DataBase can be loaded via the VAST RESTful API, an S3 bucket (automatic Parquet ETL), Trino, or Spark.

Consumption Model

Sold as software, delivered and supported as an appliance.

Meet Gemini – the business of storage, disaggregated. With Gemini, customers purchase managed software on hardware that can now be bought directly from our manufacturers at cost. Gemini provides customers more commercial flexibility and new ways to save on software storage solutions - all while delivering unrivaled levels of scale-out deployment simplicity.

On premises, VAST appliances are designed to find the optimal balance of performance and capacity. We collaborate with leading enterprise technology manufacturers to specify resilient, scalable, and efficient equipment. Our scalable cluster architecture allows for mixing and matching across generations of flash and storage compute infrastructure.

Learn more about supported platforms.

LightSpeed
Ceres
HPE
Mercury
Cloud
In partnership with Sanmina, LightSpeed is a 2U HA modern flash enclosure, born in the era of scalable AI. LightSpeed combines the light touch of a scale-out NAS with the speed of a parallel file system to deliver on the promise of simplicity at scale.