VAST DataBase

Turning The Tables On Databases

Meet the revolutionary VAST DataBase. A continuation of our first-principles thinking. Is it a database? Is it a data warehouse? Is it a data lake? Yes. Allow us to explain…

The VAST DataBase has broken fundamental database tradeoffs to combine the transactional performance of a database with the query performance of an exabyte-scalable data warehouse, at the cost of a data lake.

Running queries is not cheap, and it scares people away from trying to query data. It's a bad combination—here’s all the cool data, but you really can’t touch it because it’s really expensive to look at.

Cybersecurity Company Engineer

Breaking the Tradeoffs Between Transactions and Queries

VAST systems leverage deep write buffers built from low-cost persistent memory, which allows every ACID transaction to be stored instantaneously.

As tables fill, they are then migrated down to low-cost hyperscale flash and stored in a columnar format, so that queries also run instantaneously.

Queries cut across both the long-term datastore and the write buffer, and even though data accessed in the buffer is row-based, the underlying persistent memory structures make any row reads lightning fast.
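The flow described above, where rows land in a buffer, move to a columnar store as the buffer fills, and queries read from both, can be pictured with a small toy model. This is a minimal sketch and not the VAST implementation: the class `HybridTable`, its `buffer_limit` threshold, and the in-memory lists standing in for persistent memory and flash are all hypothetical.

```python
class HybridTable:
    """Toy table: recent rows sit in a row buffer, older rows live in column lists."""

    def __init__(self, columns, buffer_limit=4):
        self.columns = list(columns)
        self.buffer_limit = buffer_limit                    # hypothetical flush threshold
        self.row_buffer = []                                # row-oriented, newest data
        self.column_store = {c: [] for c in self.columns}   # column-oriented, older data

    def insert(self, row):
        """Append a row (a dict) to the buffer; migrate the buffer once it is full."""
        self.row_buffer.append(row)
        if len(self.row_buffer) >= self.buffer_limit:
            self._migrate()

    def _migrate(self):
        """Pivot buffered rows into per-column lists, then empty the buffer."""
        for row in self.row_buffer:
            for c in self.columns:
                self.column_store[c].append(row[c])
        self.row_buffer = []

    def select(self, column, predicate):
        """Return matching values of one column from the column store and the buffer."""
        from_store = [v for v in self.column_store[column] if predicate(v)]
        from_buffer = [r[column] for r in self.row_buffer if predicate(r[column])]
        return from_store + from_buffer


table = HybridTable(["ride_id", "tolls"], buffer_limit=4)
for i, tolls in enumerate([0.0, 5.5, 120.0, 0.0, 3.0, 150.0]):
    table.insert({"ride_id": i, "tolls": tolls})

# One match was already migrated to columns, the other is still in the row buffer.
expensive = table.select("tolls", lambda t: t > 100)
print(expensive)  # [120.0, 150.0]
```

The point of the sketch is only that a single query can span both layouts, so newly written rows are visible before they are converted to columns.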

A Columnar Data Format That Thrives on Flash

While Parquet may be the leading data science file format in use today, systems that use Parquet make inefficient use of column store infrastructure.

At 32KB, VAST’s DataBase chunk size is 16,000x smaller than your average Parquet row group. By embracing the idea of an all-flash data lake, we’ve made it possible to achieve incredible levels of query filtration and reduce the number of records that query engines sift through.

At 32KB, the VAST DataBase columnar payload is also simple to update. Customers can now immediately update tables for everything from GDPR requests to retention policy enforcement without legacy database vacuuming headaches. It’s all just… fast.
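To put the 16,000x figure in context, the row-group size it implies can be worked out directly. This is a back-of-the-envelope sketch that assumes decimal units (1 KB = 1,000 bytes); the variable names are illustrative only.

```python
CHUNK_BYTES = 32 * 1000     # 32 KB chunk size quoted above (decimal KB assumed)
RATIO = 16_000              # "16,000x smaller" figure quoted above

# Row-group size implied by the two figures above.
implied_row_group_bytes = CHUNK_BYTES * RATIO
implied_row_group_mb = implied_row_group_bytes / 1_000_000

print(f"Implied average Parquet row group: {implied_row_group_mb:.0f} MB")
# Implied average Parquet row group: 512 MB
```

In other words, the comparison assumes row groups of roughly half a gigabyte.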

Performance Comparison

The VAST DataBase is great for finding needles in haystacks

Let’s look for rides that have over $100 of tolls in the NYC Taxi dataset. With the same row count in both tests, the S3 query took 8.11 seconds and required Trino to process 28 million rows, while the VAST DataBase query took 2.27 seconds and required Trino to process only 2 rows.
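Why would chunk size change how many rows a query engine has to process? One common technique in columnar stores is to keep a min/max statistic per chunk and skip chunks that cannot contain a match. The page does not say that the VAST DataBase filters this way, so treat the following as a generic sketch on synthetic data. The function `rows_scanned`, the chunk sizes, and the generated toll values are all assumptions, and the numbers are unrelated to the benchmark above.

```python
import random


def rows_scanned(values, chunk_rows, threshold):
    """Count rows in chunks whose max exceeds the threshold (chunks that cannot be skipped)."""
    scanned = 0
    for start in range(0, len(values), chunk_rows):
        chunk = values[start:start + chunk_rows]
        if max(chunk) > threshold:      # per-chunk max statistic: chunk may hold a match
            scanned += len(chunk)
    return scanned


rng = random.Random(0)                                   # fixed seed for repeatability
tolls = [rng.uniform(0, 20) for _ in range(1_000_000)]   # synthetic, not the NYC Taxi data
tolls[123_456] = 150.0                                   # two "needles" over $100
tolls[987_654] = 120.0

large = rows_scanned(tolls, chunk_rows=500_000, threshold=100)
small = rows_scanned(tolls, chunk_rows=1_000, threshold=100)
print(large, small)  # 1000000 2000
```

With large chunks, each needle drags its whole chunk into the scan. With small chunks, only the two small chunks that hold a needle are read.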

Use Cases

Purpose built for all your data.

Content Recommendation

By enabling real-time queries all the way down to the archive, the VAST DataBase enables content producers, e-commerce sites and social networks to query user interest profiles in real time and to train new ML models.

Payment Fraud Analytics

The VAST DataBase transforms fraud analytics by combining the transactional performance of a database with the scalable query performance of a data lake. By breaking the tradeoffs between a database and a data warehouse, the VAST DataBase enables payments providers to analyze and detect fraud in real time.

Targeted Advertising

The VAST DataBase is used by leading advertisers and advertising networks to develop more targeted advertising techniques by mapping and correlating user behavior. VAST’s efficiency algorithms create all-flash data lakes with archive economics, ideal for optimizing ad network P&L.

Homeland Security

The VAST DataBase brings the ability to perform fine-grained queries all the way down to the archive. The platform is ideal for government agencies that struggle to find needles in haystacks… now, these needles can be found in real time at exabyte scale.

Linearly scale consistent database services across 1000s of CPUs.

VAST’s new Disaggregated and Shared-Everything Architecture is designed to break the conventional scaling limits of distributed systems. We call it DASE.

In DASE clusters, the machines that run database logic are stateless and have been disaggregated from the flash that is stitched together on a low-latency commodity data center fabric. We’ve invented a new shared-everything data structure that makes it possible for each CPU to write into the namespace without having to coordinate with any other CPU.

The parallelism of the DASE architecture makes it possible to build systems that can transact in millions of records per second and query from an exabyte-scalable volume of flash with near-infinite query performance.

A revolutionary approach to database compression.

It’s almost impossible to find the right balance of file sizing when dealing with open formats like Parquet and ORC. Big files put less of a toll on metastores and typically compress better, but they also force query engines to sift through more records and decompress more data than a query needs.

The VAST DataBase leverages VAST’s next-generation approach to data reduction, which compresses columnar chunks globally against each other. This approach is called Similarity-Based Data Reduction, and it eliminates the data engineering hassle of sizing files in your data lake. Every columnar chunk is added to a global compression cluster that achieves greater savings than are possible with single-file approaches like Snappy. Similarity is powerful enough to find reduction on pre-reduced and even encrypted data. We guarantee that you’ll never find a more effective approach to finding savings.
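The idea of compressing one chunk against another can be illustrated with Python's standard `zlib` module, which accepts a preset dictionary. This is a swapped-in technique chosen only for illustration; it is not VAST's Similarity algorithm, and the helper names below are hypothetical. The sketch compresses a high-entropy chunk on its own, then again using a nearly identical reference chunk as the dictionary.

```python
import random
import zlib


def compress_alone(chunk: bytes) -> bytes:
    """Compress a chunk by itself, as a per-file codec would."""
    return zlib.compress(chunk, 9)


def compress_against(chunk: bytes, reference: bytes) -> bytes:
    """Compress a chunk using a similar reference chunk as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(chunk) + comp.flush()


def decompress_against(blob: bytes, reference: bytes) -> bytes:
    """Reverse compress_against; the same reference chunk must be supplied."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(blob) + decomp.flush()


rng = random.Random(0)                                       # fixed seed for repeatability
reference = bytes(rng.randrange(256) for _ in range(4096))   # high-entropy chunk
changed = bytearray(reference)
for pos in (10, 500, 2000, 4000):                            # flip a few bytes
    changed[pos] ^= 0xFF
similar = bytes(changed)

alone = compress_alone(similar)
delta = compress_against(similar, reference)
restored = decompress_against(delta, reference)

print(len(similar), len(alone), len(delta))
```

Because the chunk is random-looking, compressing it alone saves nothing, while compressing it against its near-duplicate produces a much smaller payload that still round-trips exactly. A global scheme would additionally need a way to find similar chunks across the whole namespace, which this sketch does not attempt.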

To learn more about Similarity, visit here.

Importers and Query Interfaces

The VAST DataBase Embraces Open Data Science Standards

The VAST DataBase uniquely combines an exabyte scale namespace for natural data types like images, video, LIDAR, genomes, and other rich, real-world data sources, along with a tabular database to hold the catalog of expanding metadata about the objects generated as data works its way through the deep learning pipeline.

The First Synthesized Structured & Unstructured Data Platform

Just like modern AI applications create structure from unstructured data, the VAST Data Platform has been designed to power all your unstructured and structured data applications.

VAST’s Multi-Protocol DataStore was introduced in 2019 as the world’s first file and object storage system that combined the performance of all-flash with the economics of an archive. This system is a multi-protocol data management system that serves data from any view (NFS, SMB & S3).

With the VAST DataBase, unstructured data gets more than a data catalog – the DataBase’s transactional and analytical capabilities lay the foundation for the semantic layer of AI training and inference systems.

Features

Breaking the Tradeoffs Between Transaction Systems & Deep Analytics

Scalable Design

Maximizing performance and flexibility across exabyte-scale systems.

Seamless DataBase Integration

The VAST DataBase is intrinsic to the VAST file system, making it possible to scale linearly without compromise.

Scalable ACID Transactions

The VAST DataBase provides support for unlimited ACID transactions and atomic updates within and across tables in this system.

Disaggregated Architecture

The CPUs that run VAST DataBase logic are independent of the machines that hold the system’s state, making it easy to scale clusters using flexible topologies.

Global Data Reduction

VAST’s Similarity-Based data reduction combines the global nature of deduplication with the fine granularity of compression across your entire global namespace.

Massive Performance & Scale

VAST clusters can be built to support over an exabyte of data capacity, millions of transactions, and terabytes/second of query throughput.

Hassle-Free Table Management

No need for compaction, data vacuuming, or partition management – the VAST DataBase is always fast and manages table cleanup for you.

Secure Operations

Ensure continuity and control with features like robust replication, access audits, and snapshot management.

Disaster Recovery

The VAST DataBase supports n:1 and 1:n asynchronous replication topologies, and couples replication with 15-second recovery points to make failover near-real-time.

Audit and Access

The VAST DataBase enables directly querying the “who,” “what,” and “how” of cluster and object access, enabling a cloud-native approach to audit and access policies.

Global Snapshots

VAST clusters use write-in-free-space semantics to make snapshots painless. It’s easy to snapshot one table or many tables consistently, which removes the complexity of time travel.
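A rough way to see why writing into free space makes snapshots cheap: if new data never overwrites old data, a snapshot only needs to remember which locations were current at that moment. The page does not describe the internal structures, so the following is a generic sketch. `FreeSpaceStore`, its methods, and the key names are hypothetical.

```python
class FreeSpaceStore:
    """Toy store that never overwrites: each write lands in a new slot."""

    def __init__(self):
        self.slots = []        # append-only data slots (new writes go to free space)
        self.live = {}         # key -> slot index for the current view
        self.snapshots = {}    # snapshot name -> frozen copy of the key map

    def write(self, key, value):
        self.slots.append(value)                 # new data goes to a fresh slot
        self.live[key] = len(self.slots) - 1     # only the pointer moves

    def snapshot(self, name):
        self.snapshots[name] = dict(self.live)   # copy pointers, not data

    def read(self, key, snapshot=None):
        view = self.live if snapshot is None else self.snapshots[snapshot]
        return self.slots[view[key]]


store = FreeSpaceStore()
store.write("orders/1", "pending")
store.write("users/7", "alice")
store.snapshot("before-update")      # one snapshot covers both "tables" at one instant
store.write("orders/1", "shipped")

current = store.read("orders/1")
past = store.read("orders/1", snapshot="before-update")
print(current, past)  # shipped pending
```

Reading through the snapshot returns the older value without any data having been copied when the snapshot was taken.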

Management Efficiency

Streamlined data management for complex workloads.

Columnar Queries

The VAST DataBase converts rows into columnar objects as they age, making the dataset suitable for flash-optimized deep queries.

Optimized for Low-Cost Flash

VAST clusters have introduced a new data structure that’s optimized for the particular nuances of QLC and PLC flash, making it possible to dramatically lower the cost of an all-flash data lake.

Support for Complex Data Types

UINT 8/16/32/64, INT 8/16/32/64, BOOL, FLOAT 32/64, DATE32, TIMESTAMP, TIME 32/64, STRING, DECIMAL128, BINARY, BINARY32KB, ARRAY, MAP, COUNT and vectors (including nested and multi-level nested data).

Integrated File System

VAST is the only database to integrate with a parallel, POSIX file namespace and S3 namespace, enabling content to be merged with the context layer.

Data Importer

The VAST DataBase can be loaded via the VAST RESTful API, an S3 Bucket (automatic Parquet ETL), Trino, Spark and direct upload of Parquet files via the VAST GUI.

Consumption Model

Sold as software, delivered and supported as an appliance.

Meet Gemini – the business of storage, disaggregated. With Gemini, customers purchase managed software on hardware that can now be bought directly from our manufacturers at cost. Gemini provides customers more commercial flexibility and new ways to save on software storage solutions - all while delivering unrivaled levels of scale-out deployment simplicity.

On premises, VAST appliances are designed to find the optimal balance of performance and capacity. We collaborate with leading enterprise technology manufacturers to specify resilient, scalable, and efficient equipment. Our scalable cluster architecture allows customers to mix and match generations of flash and storage compute infrastructure.

Learn more about supported platforms.

LightSpeed

Ceres

Supermicro

HPE

Cisco

In partnership with Sanmina, LightSpeed is a 2U HA modern flash enclosure, born in the era of scalable AI. LightSpeed combines the light touch of a scale-out NAS with the speed of a parallel file system to deliver on the promise of simplicity at scale.

Resources

Innovation begins with understanding

Video

The VAST DataBase Explained

Watch as we break down just how the VAST DataBase is able to apply structure to unstructured data at any scale, breaking tradeoffs between transaction systems and deep analytics.

23:06

White Paper

VAST DataBase - Performance & Benchmarking

Discover how VAST DataBase transforms data pipelines with AI-optimized performance, surpassing benchmarks for speed, efficiency, and scalability.

24 pages

White Paper

The VAST Data Platform Explained

Learn how VAST's revolutionary DASE architecture defies all conventional definitions of data platforms, delivering all-flash performance at archive economics to simplify the data center and accelerate all modern applications.
