Moore’s Law has ended for CPUs, leaving GPUs as the logical architecture to accelerate performance for the types of applications that make the world go round. Some of those, like LLMs and agents, are AI-native; others, like online recommendation systems, are replacing traditional algorithms with AI models; and still others, like good, old-fashioned SQL analytics, have essentially nothing to do with AI.
What they all have in common is a heavy reliance on data processing, which is a computationally intensive workload by nature. Whether it’s a simple SQL query, a similarity search across products in a database, or an LLM prompt, data-based interactions put the onus on the underlying systems to retrieve the correct data and execute the command. The bigger and more complex the dataset (or algorithm or model), the more computing power it takes to process the job in a reasonable timeframe.
This isn’t exactly some great revelation, though. The importance of GPU-style parallel processing for AI workloads has been common knowledge since 2012, well before the current AI boom took off. And since NVIDIA CUDA was released in 2007, researchers and entrepreneurs have been experimenting with GPU acceleration for SQL databases.
Today, however, the stars have aligned in terms of technological advancements, the scale and complexity of datasets (both structured and unstructured), and a new class of applications that demand better performance. And this is why VAST is excited to announce the CNode-X — a next-gen VAST AI OS server equipped with local GPU acceleration.
A software partnership in hardware form
The CNode-X is a continuation of our deepening collaboration with NVIDIA and, although the form factor is hardware (a server), this is a software development at its core. As NVIDIA expands beyond core GPU use cases like graphics processing and, of course, AI computation, we find more opportunities to improve application performance for our joint customers. With CNode-X instances, we’ve baked key NVIDIA libraries — starting with those for vector search, tabular analytics, and the management of NVIDIA NIM microservices (containerized instances of AI models) — into the VAST software stack.
While RAG pipelines on VAST were always fast, they’re now even faster because every component runs locally. This includes the embedding model, LLM, and vector database, as well as the orchestration layer and serverless functions to execute the pipeline. Of course, even when used in a non-LLM environment, vector databases can still benefit from GPU acceleration when executing more traditional jobs around similarity search or clustering.
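To make the similarity-search piece concrete, here is a minimal, self-contained sketch of the computation a vector database performs at query time: score every stored vector against a query by cosine similarity and return the closest matches. The vectors and document ids are made up for illustration; a real GPU-accelerated engine runs this same math in parallel across millions of vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = sorted(vectors.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [vid for vid, _ in scored[:k]]

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], vectors, k=2))  # → ['doc_a', 'doc_b']
```

The brute-force scan shown here is exactly the kind of embarrassingly parallel work that maps well onto GPUs, which is why vector search benefits from acceleration even outside LLM pipelines.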
We’ve also added analytic query acceleration via the Sirius SQL engine, which was initially designed as a drop-in GPU-acceleration engine for DuckDB, to further increase query performance on tabular data.
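For a sense of the workload Sirius targets, the sketch below runs a typical analytic aggregation. It uses Python’s standard-library sqlite3 purely to stay self-contained; the table and data are invented, and nothing here shows Sirius or DuckDB APIs. The scan and hash-aggregate operators behind a query like this are what a GPU engine parallelizes.

```python
import sqlite3

# Hypothetical order data standing in for a large analytics table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("east", 120.0), ("east", 80.0), ("west", 200.0), ("west", 50.0),
])

# A group-by aggregation: scan every row, bucket by key, sum per bucket.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # → [('east', 200.0), ('west', 250.0)]
```

At a few rows the engine is irrelevant; at billions of rows, the per-column scans and aggregations dominate, which is where GPU acceleration pays off.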
An introduction to VAST CNodes
If you’re reading this and are new to VAST, a little background might be helpful to contextualize why we’re so excited to announce our CNode-X servers.
In a traditional VAST deployment, all compute jobs execute their data access on what we call CNodes — a term for both the physical x86 boxes and the stateless instances running on them — that handle storage protocols, query processing, resource orchestration, and more. CNodes connect via NVMe to the VAST data layer (DNodes) and, as a result of our unique shared-everything architecture (called DASE), CNodes have direct, parallel access to data without relying on sharding, east-west traffic, or resource coordination. Any CNode can access any data, thus boosting performance by bringing compute to the data.
In addition to operational simplicity, this architecture can result in serious performance. For instance:
DUG Technology, an HPC service provider, used its VAST deployment to help a scientific research client process several years’ worth of backlogged data in just a few hours — at 125 times the performance they were able to achieve on public cloud infrastructure.
Pixar used VAST as the data platform for 160,000 CPU render cores to produce Elemental — on up to 2PB of data — and has since begun using its VAST cluster as the data foundation of its AI initiative.
VAST’s native event broker utilizes the DASE architecture to outperform Apache Kafka by more than 600%, and an optimized commercial distribution of Kafka by more than 150%, in terms of event throughput.
In the common scenario where VAST connects to GPU resources for AI or HPC workloads that require GPU computation — including some very large frontier-model training runs — the GPU cluster and CNode cluster connect via NVIDIA GPUDirect Storage to maximize performance across the network.
As a tangible example of an increasingly popular use case, consider a VAST user running a real-time RAG pipeline within a Kubernetes environment, and taking advantage of the VAST DataEngine and DataBase services. The user can define a set of serverless functions that kick off whenever triggered by a specific event, like a new file hitting the object store. CNodes manage this pipeline in real time by:
Launching an NVIDIA Nemotron Embed instance, packaged as an NVIDIA NIM, that runs in a GPU-backed container (on separate hardware).
Receiving the embeddings.
Inserting those embeddings into the VAST vector database.
When a user — human or AI agent — interfaces with the vector database via an LLM, that model will run on another GPU-backed container and communicate with CNodes to augment the model’s output via RAG.
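The event-driven flow above can be sketched in a few lines. Everything in this snippet is a simplified stand-in — the handler registry, the toy `embed` function, and the object key are all hypothetical, while the real pipeline uses NIM containers and the VAST DataEngine and DataBase services — but the shape is the same: an object-store event fires, a registered function embeds the new data, and the result lands in the vector store.

```python
# Illustrative simulation of an event-triggered ingest pipeline.
handlers = {}
vector_store = {}

def on_event(event_type):
    """Register a function to run whenever event_type fires."""
    def register(fn):
        handlers.setdefault(event_type, []).append(fn)
        return fn
    return register

def embed(text):
    # Stand-in for a call to an embedding model: a toy 3-dim vector.
    return [len(text) % 7, text.count("a"), len(text.split())]

@on_event("object_created")
def index_document(key, body):
    """Embed a newly written object and insert it into the vector store."""
    vector_store[key] = embed(body)

def emit(event_type, *args):
    for fn in handlers.get(event_type, []):
        fn(*args)

# A new file hitting the object store triggers the pipeline.
emit("object_created", "bucket/doc1.txt", "a new data file")
print(vector_store)  # → {'bucket/doc1.txt': [1, 3, 4]}
```

The point of the pattern is that nothing polls: compute runs only when data arrives, and each step hands its output directly to the next.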
From high-speed data processing to GPU-speed data processing
With CNode-X servers, we’re colocating CPU-powered CNodes and GPU resources within the same box. In the RAG pipeline example above, that means everything — storage, database, event broker, functions, and AI models — all run natively on VAST infrastructure.
In the case of GPU-accelerated SQL analytics, Sirius has been benchmarked at about 10x faster than vanilla DuckDB at the same hardware cost. Compared against older, slower CPU-based SQL engines like ClickHouse, the difference is much more stark. What’s more, VAST AI OS is already a complete data platform, combining high-scale, high-performance storage with resource orchestration and a suite of data services that can replace and outperform complex distributed data architectures. Our work with NVIDIA on the CNode-X servers turbocharges the VAST platform, pairing the next generation of critical applications and workflows with unparalleled computation speed.
CNode-X servers will be available this spring through hardware partners, including Cisco and Supermicro. They can be configured with AI-optimized Intel or AMD CPUs and two or more NVIDIA RTX PRO™ 6000 Blackwell Server Edition GPUs, which, in a cluster setup, can easily host and serve any publicly available AI model.