Data Platforms Need To Evolve

For decades, datastores have been unaware of applications, and applications have been equally unaware of data events. This division between applications and data has produced fragmented approaches to building data pipelines and a batch-processing mentality that separates data streams from deep data analysis.

The VAST Data Platform aims to break the tradeoff between data streaming and global insight by engineering data processing and event notifications natively into the system.

By supporting new types of data (functions and triggers), the VAST Data Platform adds procedural logic to data, making it dynamic in the same way that JavaScript made websites interactive.

With the VAST DataEngine, data and changes to data trigger action, that action is performed on the data, and the cycle repeats recursively. The DataEngine is the basis for perpetual AI training and inference and, we hope, for the AI-powered discoveries of the future.
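The trigger-action cycle described above can be pictured with a small in-memory sketch. Everything in it (the `Store` class, the `on_write` decorator, the `raw/`, `clean/` and `stats/` prefixes) is hypothetical and only stands in for the platform. It is not the VAST DataEngine API.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple


class Store:
    """Toy key-value store that fires triggers whenever a key is written."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.triggers: List[Tuple[str, Callable[["Store", str], None]]] = []
        self.pending: deque = deque()

    def on_write(self, prefix: str):
        """Register a function to run when a key starting with `prefix` is written."""
        def register(fn):
            self.triggers.append((prefix, fn))
            return fn
        return register

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.pending.append(key)  # every change to data is itself an event

    def run(self) -> None:
        """Drain events; functions may write new data, which queues more events."""
        while self.pending:
            key = self.pending.popleft()
            for prefix, fn in self.triggers:
                if key.startswith(prefix):
                    fn(self, key)


store = Store()


@store.on_write("raw/")
def normalize(s: Store, key: str) -> None:
    # Action on raw data produces new data under "clean/".
    s.write(key.replace("raw/", "clean/", 1), s.data[key].strip().lower())


@store.on_write("clean/")
def count_words(s: Store, key: str) -> None:
    # The write above triggers this function in turn.
    s.write(key.replace("clean/", "stats/", 1), str(len(s.data[key].split())))


store.write("raw/note.txt", "  Hello Data Engine  ")
store.run()
```

A single write to `raw/` cascades through both functions. The chain stops only because nothing is registered for `stats/`.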

A Programmable Computing Engine in Software

The DataEngine is a containerized computing environment that customers deploy on their choice of CPUs, GPUs and DPUs – from edge to cloud. By embedding logic directly into the VAST Data Platform, the system can schedule processing events in real time, triggered by data activities.

DataEngine Programmable Environment

VAST’s DataEngine provides a programmable Python environment where developers can bring their own code. A number of built-in functions are also provided out of the box to get value from the VAST Data Platform.

These include:

  • Data indexing

  • File header indexing

  • PII data detection

  • Ransomware detection

  • Streaming between tables/topics/files

  • Data augmentation
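To make the idea of a data function concrete, here is a minimal sketch of what a PII detection function could look like. The `detect_pii` name and the two regular expressions are assumptions chosen for illustration; they are deliberately simplistic and do not represent VAST's built-in implementation.

```python
import re
from typing import Dict, List

# Deliberately simple patterns; real PII detection needs far more care.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(text: str) -> Dict[str, List[str]]:
    """Return the matches found for each pattern, omitting empty results."""
    found = {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}
    return {name: hits for name, hits in found.items() if hits}


report = detect_pii("Contact jane@example.com, SSN 123-45-6789.")
```

A function of this shape takes the content of a newly written object and returns metadata that could be stored alongside it.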

A New Data-Aware Recursive Computing Engine

Next-Generation Event Streaming Infrastructure

The VAST DataEngine features a new data streaming interface designed to write events natively into the VAST DataBase.

For the first time, it is possible to analyze all data together by ingesting streaming data in real time into VAST’s exabyte-scale transactional and analytical database.

A Real-Time Event Router

The VAST Event Router unifies unstructured and structured data event management into a common platform, providing event consumers simple tools to trigger action.
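As a rough sketch of unified routing, the example below sends object (unstructured) and table-row (structured) events through one subscription table. The `Event` and `EventRouter` classes, the `kind` values and the glob patterns are all hypothetical and are not part of any VAST interface.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    kind: str      # "object" for file/object events, "row" for table events
    source: str    # object path or table name
    payload: Dict[str, object] = field(default_factory=dict)


class EventRouter:
    """Routes both kinds of events through one subscription table."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, str, Callable[[Event], None]]] = []

    def subscribe(self, kind: str, pattern: str, handler: Callable[[Event], None]) -> None:
        self._routes.append((kind, pattern, handler))

    def publish(self, event: Event) -> int:
        """Deliver the event to every matching handler; return how many ran."""
        delivered = 0
        for kind, pattern, handler in self._routes:
            if kind == event.kind and fnmatch(event.source, pattern):
                handler(event)
                delivered += 1
        return delivered


seen: List[str] = []
router = EventRouter()
router.subscribe("object", "images/*.jpg", lambda e: seen.append(f"thumbnail {e.source}"))
router.subscribe("row", "orders", lambda e: seen.append(f"audit order {e.payload['id']}"))

router.publish(Event("object", "images/cat.jpg"))
router.publish(Event("row", "orders", {"id": 7}))
router.publish(Event("object", "docs/readme.txt"))  # no subscriber matches
```

Consumers register interest with the same `subscribe` call whether the event comes from a file or from a table.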

The VAST Data Platform is designed to create structure and insight from unstructured data.

Because triggers and functions are stored as state in the VAST Data Platform, your code is dynamically managed by a global data store that supports global code versioning, global code distribution, and global code security policies.
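One way to picture code stored as state is a store that keeps every published version of a function's source as ordinary data. This is a minimal sketch covering versioning only, with distribution and security policies left out. The `FunctionStore` class and its methods are hypothetical names, not a VAST API.

```python
import hashlib
from typing import Dict, List


class FunctionStore:
    """Keeps every published version of a function's source as plain data."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def publish(self, name: str, source: str) -> str:
        """Store a new version and return a short content hash as its id."""
        self._versions.setdefault(name, []).append(source)
        return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]

    def load(self, name: str, version: int = -1):
        """Compile a stored version (latest by default) and return the callable."""
        namespace: Dict[str, object] = {}
        exec(self._versions[name][version], namespace)  # trusted code only
        return namespace[name]


functions = FunctionStore()
v1 = functions.publish("tag", "def tag(path):\n    return {'path': path}\n")
v2 = functions.publish(
    "tag",
    "def tag(path):\n    return {'path': path, 'ext': path.rsplit('.', 1)[-1]}\n",
)

latest = functions.load("tag")
original = functions.load("tag", version=0)
```

Because each version is just a stored record, an older version remains loadable after a newer one is published.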

A Simple Python SDK

The VAST DataEngine is a serverless platform, programmed in Python, that integrates stateful functions into an exabyte-scale datastore.

By integrating streaming and data processing with an exabyte-scale datastore and database, the Data Platform enables comprehensive function calling with minimal code.

A New AI Dataset

Introducing the VAST DataSet

Deep learning data engineering is tough. Data engineers write large dataset files to archive storage for training, which creates a number of problems associated with rigid data management:

  • If model training requires data variation, new datasets are written to storage, often creating redundant data because the datasets share overlapping training examples

  • Because conventional datasets are not stored together with training code, it is often difficult to reproduce trained models as data and code continue to evolve independently

With the DataEngine, VAST is introducing a new concept called the VAST DataSet. This new approach to data management leverages the VAST DataBase to create materialized views of example data without copying and re-copying data into blunt data containers. DataSets can scale to exabytes. Each DataSet includes an indexed set of examples and the code used for training, so that it’s easy to reproduce models on the fly.
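The sketch below illustrates the general idea of defining overlapping datasets as views over one table of examples and pairing each with a fingerprint of the training code. It uses SQLite's ordinary (non-materialized) views as a stand-in, and the table, view and function names are assumptions made for illustration. It does not show how the VAST DataSet is implemented.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE examples (id INTEGER PRIMARY KEY, label TEXT, uri TEXT)")
conn.executemany(
    "INSERT INTO examples (label, uri) VALUES (?, ?)",
    [("cat", "img/1.jpg"), ("dog", "img/2.jpg"), ("cat", "img/3.jpg")],
)

# Two overlapping "datasets", defined as views over the same stored rows.
conn.execute("CREATE VIEW cats_only AS SELECT id, uri FROM examples WHERE label = 'cat'")
conn.execute("CREATE VIEW all_pets AS SELECT id, uri FROM examples")

# Placeholder training code; only its fingerprint matters for this sketch.
TRAIN_CODE = "def train(rows):\n    return len(rows)\n"


def describe_dataset(view: str) -> dict:
    """Bundle a view's current example ids with a fingerprint of the training code."""
    # The view name is interpolated directly, so it must come from trusted code.
    ids = [row[0] for row in conn.execute(f"SELECT id FROM {view} ORDER BY id")]
    return {
        "view": view,
        "example_ids": ids,
        "code_sha256": hashlib.sha256(TRAIN_CODE.encode("utf-8")).hexdigest(),
    }


cats = describe_dataset("cats_only")
pets = describe_dataset("all_pets")
```

Both datasets reference the same three stored rows, so the overlap between them costs no extra copies.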

Global Compute Orchestration

A Global Execution Environment

The VAST DataEngine is built on a container framework that allows services to be executed globally across the VAST DataSpace.

Features

Real-time Insights, Continuous AI Training and Smarter Global Workflows

Optimize Data Operations

Transform operations by automating data-driven workflows across global environments. With the VAST DataEngine, data operations are seamlessly managed from ingestion to action.

Event Triggers

The VAST DataEngine can use event triggers, enabling the system to act on data in pre-defined ways.

A Collection of Built-in Functions

Run functions provided by VAST or created by customers, orchestrated by the VAST DataEngine, to deliver additional data value.

Kafka-Compatible Broker

Accept Kafka APIs, storing each topic as a table in the VAST DataStore and each message as a row (record) in that table.
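The topic-to-table mapping can be sketched with an in-memory SQLite database standing in for the datastore. This is not a Kafka client and not VAST's broker; the `produce` and `query` helpers, the `topic_` table prefix and the column names are assumptions made for illustration.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

db = sqlite3.connect(":memory:")


def _table_for(topic: str) -> str:
    """Map a topic name to a quoted table name, creating the table on first use."""
    if not topic.replace("_", "").replace("-", "").isalnum():
        raise ValueError(f"unsupported topic name: {topic!r}")
    table = f'"topic_{topic}"'
    db.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "msg_offset INTEGER PRIMARY KEY AUTOINCREMENT, "
        "ts REAL, msg_key TEXT, msg_value TEXT)"
    )
    return table


def produce(topic: str, key: Optional[str], value: str) -> int:
    """Append one message as one row; return its offset within the topic."""
    cur = db.execute(
        f"INSERT INTO {_table_for(topic)} (ts, msg_key, msg_value) VALUES (?, ?, ?)",
        (time.time(), key, value),
    )
    return cur.lastrowid


def query(topic: str, min_offset: int = 0) -> List[Tuple[int, str]]:
    """Read messages back with ordinary SQL over the topic's table."""
    rows = db.execute(
        f"SELECT msg_offset, msg_value FROM {_table_for(topic)} "
        "WHERE msg_offset >= ? ORDER BY msg_offset",
        (min_offset,),
    )
    return rows.fetchall()


first_offset = produce("clicks", "user-1", "home")
produce("clicks", "user-2", "pricing")
produce("signups", None, "user-2")
```

Once messages are rows, the same data can serve both stream consumers reading by offset and analytical queries. Offsets here start at 1 because of SQLite's row numbering, which differs from Kafka's zero-based offsets.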

A Global Execution Engine

Learn where data is located and optimize performance by moving function execution closer to previously accessed data.
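A simple way to reason about locality-aware placement is to count where each input has been accessed and run the function at the site with the highest total. The `LocalityScheduler` class, the site names and the counting heuristic are assumptions for illustration; the source does not describe the actual placement algorithm.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


class LocalityScheduler:
    """Picks the site that has most often served the data a function needs."""

    def __init__(self, default_site: str) -> None:
        self.default_site = default_site
        self._access_log: Dict[str, Counter] = defaultdict(Counter)

    def record_access(self, path: str, site: str) -> None:
        self._access_log[path][site] += 1

    def place(self, inputs: Iterable[str]) -> str:
        """Return the site with the most recorded accesses across all inputs."""
        totals: Counter = Counter()
        for path in inputs:
            totals.update(self._access_log.get(path, Counter()))
        if not totals:
            return self.default_site  # nothing known yet, fall back
        return totals.most_common(1)[0][0]


scheduler = LocalityScheduler(default_site="us-east")
scheduler.record_access("/data/a", "eu-west")
scheduler.record_access("/data/a", "eu-west")
scheduler.record_access("/data/a", "us-east")
scheduler.record_access("/data/b", "eu-west")

chosen = scheduler.place(["/data/a", "/data/b"])
fallback = scheduler.place(["/data/never-seen"])
```

Here `chosen` is the site with three recorded accesses, and unseen inputs fall back to the default site.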

A Simple Python SDK

The VAST DataEngine is a serverless platform, programmed in Python, that integrates stateful functions into an exabyte-scale datastore.

A New Approach to Data Management

The VAST DataSet leverages the VAST DataBase to create materialized views of example data without copying and re-copying data into blunt data containers.
