The Fort Knox of Deep Learning

Authored by

Jeff Denworth, VAST Data Co-Founder

This blog post was written in 2023 and reflects product capabilities at that time. Some information may be outdated.

Today, we proudly announce that CoreWeave has selected VAST and the VAST Data Platform as the data foundation for their AI cloud. I can’t tell you how excited we are to pull even a little of the covers off this story, but I am confident that very few people understand the significance of this news. The deep learning gold rush is on, and CoreWeave is at the epicenter of the action. Allow me to break it down, asking questions as if I were a busy New Yorker 🙂

OK, who is this CoreWeave?

On paper, CoreWeave is “a specialized cloud provider, delivering a massive scale of GPUs on top of the industry’s fastest and most flexible infrastructure”. This is a company that fits the modern archetype of today’s next-generation GPU/AI clouds.

In reality, CoreWeave is a leviathan of AI infrastructure that is building fantastically-large systems that are used for the development and deployment of many of the world’s most powerful and popular deep learning applications. They are the infrastructure provider for technology companies building next-generation large language models, and they are the nimble infrastructure partner to several Fortune 100 organizations who need the same mix of scale, speed and agility that these new AI startups do.

CoreWeave, like VAST, is focused on a future where deep learning will improve humanity by accelerating the pace of discovery.

Hrm. Wait. What do you mean by “fantastically-large”?

The race to AI gold compels the key players to operate with a fair bit of discretion, so we can’t disclose too much here. Having said that, there have been dribbles of information let out into the public domain. Let’s use Inflection AI as an indicative case study.

In June of 2023, Inflection announced that they had chosen CoreWeave as their cloud to deploy NVIDIA H100 Tensor Core GPUs. Let’s just put this one customer into perspective:

The infrastructure is the computational foundation of a $4B startup founded by Reid Hoffman (co-founder of LinkedIn) and Mustafa Suleyman, a co-founder of DeepMind.
At 6.8 Exaflops of peak FP32 single-precision speed, the Inflection system will have almost 2x the AI performance of ORNL’s Frontier system, which is the world’s fastest computer according to Top500.org (37,888 AMD MI250X GPUs at 95.7 FP32 Teraflops each).
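For anyone who wants to check the "almost 2x" math, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above. The variable and function names are illustrative, and it assumes that peak performance is simply GPU count times per-GPU peak, which ignores real-world efficiency.

```python
# Back-of-the-envelope check of the "almost 2x" comparison.
# All figures are the ones quoted in the post; names are illustrative.
FRONTIER_GPUS = 37_888             # AMD MI250X GPUs in Frontier
FP32_TFLOPS_PER_GPU = 95.7         # peak FP32 Teraflops per GPU, as quoted
INFLECTION_PEAK_EXAFLOPS = 6.8     # peak FP32 Exaflops, as quoted

TERAFLOPS_PER_EXAFLOP = 1_000_000  # 1 Exaflop = 10^6 Teraflops


def frontier_peak_exaflops() -> float:
    """Aggregate peak, assuming it is just GPU count x per-GPU peak."""
    return FRONTIER_GPUS * FP32_TFLOPS_PER_GPU / TERAFLOPS_PER_EXAFLOP


def inflection_vs_frontier() -> float:
    """Ratio of the quoted Inflection peak to the computed Frontier peak."""
    return INFLECTION_PEAK_EXAFLOPS / frontier_peak_exaflops()


if __name__ == "__main__":
    print(f"Frontier peak FP32: {frontier_peak_exaflops():.2f} Exaflops")  # 3.63
    print(f"Inflection vs. Frontier: {inflection_vs_frontier():.2f}x")     # 1.88
```

Under those assumptions, Frontier lands at roughly 3.63 Exaflops of peak FP32, which puts the Inflection system at about 1.88x.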

And this is just one of many massive customer deployments.

CoreWeave has already announced their plans to build out a 14-data-center cloud. The investment in their most recent Texas facility was valued at $1.6B.

And this is just one of their 14 massive data centers being built.

OK, I get it. They’re big. So what’s all this business about Fort Knox?

As of August 2023, Fort Knox is estimated to hold $6B of gold (Source: US Treasury). But gold is so much more than some shiny bars; it’s also the basis for many currencies.

Generative AI is expected to add $2-4 trillion of value to the global economy, per McKinsey. Much, if not all, of this value will be powered by accelerated computing. If LLMs are the new currency that intelligent businesses trade in, GPUs are the bullion that stands them up… therefore, CoreWeave stands as a modern analog to Fort Knox.

Hurry up, I got a thing to get to… Why did these CoreWeave people choose VAST?

VAST is “Cloud-Scale and AI-Scale.”

These were the words of Peter Salanki, VP of Engineering at CoreWeave. This is the nexus of capability we aim for when building the VAST Data Platform. When building cloud infrastructure that costs hundreds of millions of dollars for a single customer, four things become critically important:

Efficiency matters: Every ounce of GPU performance that CoreWeave can eke out directly translates into competitiveness with leading cloud providers. VAST’s DASE architecture is fully-parallel at any scale, and delivers the optimizations needed for today’s AI supercomputers (RDMA I/O, GPUDirect Services). Even when presenting standard interfaces such as NFS, the VAST Data Platform’s inherent architecture delivers the scale needed for today’s most demanding AI clusters. VAST keeps CoreWeave's NVIDIA GPUs busy and highly utilized to get maximum value from every GPU in the systems.
Uptime matters: It doesn’t matter how fast data infrastructure is if it’s fragile and goes offline. With VAST, CoreWeave was happy to find a partner that designs for resilience at scale, and QAs its offering on over $100M of QA infrastructure. We test larger systems than even our biggest customers deploy, so we find the issues of scale before they do. With a fleet of systems that run at nearly 99.9999% availability at scale, VAST keeps CoreWeave running.
Zero-Trust matters: VAST’s Data Platform was designed with layers of security to enable zero-trust deployment of cloud infrastructure. At-rest and in-flight encryption. Audit tools. Multi-tenancy with customer-managed encryption keys. All of our enterprise features complement the scale and uptime that drives modern AI processors. VAST keeps CoreWeave customer data secure.
Team matters: The final argument for VAST was delivered as CoreWeave engaged deeply with the VAST technical team… and here they realized the level of depth and capability that could come from the principals at a startup (a startup that’s still hungry) that is uniquely operating at scale (700 people and growing, equipped to handle the needs of a demanding tier-1 cloud). CoreWeave and VAST together, as partners, are ready to redefine cloud.
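To put the "nearly 99.9999%" figure under "Uptime matters" into perspective, here is a small sketch that converts an availability percentage into downtime per year. It assumes availability is a simple fraction of wall-clock time over a 365-day year; the post does not say how the fleet figure is measured, and the function name is illustrative.

```python
# Convert an availability percentage into expected downtime per year.
# Assumption: availability is a plain fraction of wall-clock time,
# measured over a non-leap (365-day) year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Seconds of downtime per year implied by an availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999, 99.9999):
        seconds = downtime_seconds_per_year(pct)
        print(f"{pct}% availability -> {seconds:,.1f} seconds of downtime per year")
```

By this measure, six nines leaves room for roughly half a minute of downtime per year, while three nines allows close to nine hours.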

Finally… Thanks to Brian, Peter and the CoreWeave team for choosing VAST. It’s an honor to realize the founding vision we had for VAST with you and your customers.

- Jeff
