“Everything should be made as simple as possible, but not simpler.”
One of my favorite parts of my work at VAST is that I get to spend a lot of time with many of our current and future customers. With nearly 20 years of experience in the high-performance storage industry, I’m also the natural person for the sales team to call in when a customer wants to turn things up to “11”. Machine Learning and Deep Learning (ML/DL) is arguably the hottest part of the performance storage market, driving new use cases as well as new I/O requirements for storage… and where there is no incumbent, there are always questions about how to build a storage system that can evolve with the fast pace of data science innovation. Fortunately, the answer to this question can also be the simplest path to building, scaling and affording AI storage.
A Tale Of Two Compromised Approaches
Until now, customers have had two choices that are in some ways diametrically opposed when it comes to building AI storage infrastructure… neither of which is ultimately simple:
- Legacy scale-out NAS: simple to operate, but tiered onto hybrid media and limited in per-client throughput
- Parallel file systems: fast, but complex to deploy and maintain on every compute node
Now, there are a lot of considerations to unpack with the above… I’ll try to be brief. Consider that AI applications demand speed in order to keep expensive GPUs and AI processors fed – but this speed is not always consumed in the way that storage marketing organizations would want you to believe.
AI Requires Flash
At least this topic is no longer disputed. Basically every AI computer ships with local SSDs for swap space. All AI reference architectures assume you’re processing in flash.
We’ve seen that I/O request sizes can vary from 4KB to 1MB. When customers use classic HPC storage testing tools on AI storage, they are later surprised to find that while their file sizes are large (e.g., TensorFlow’s TFRecord), AI tools will read through these files in 100KB increments. This access pattern is hell for hard drives.
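To make the pattern concrete, here is a minimal Python sketch (the chunk size and file handling are illustrative, not taken from any particular framework) of how a data loader consumes a large record file in ~100KB increments:

```python
CHUNK_SIZE = 100 * 1024  # ~100KB read increments, as seen from AI data loaders

def read_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield a large record file (e.g. a TFRecord) in small increments.

    Although the file itself may be gigabytes, the storage system sees a
    stream of ~100KB requests -- small-block I/O, not one big sequential read.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

A 1MB file read this way generates roughly ten separate requests; a multi-gigabyte TFRecord generates tens of thousands, each of which lands on the media as an independent small read.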
AI Requires All-Flash
To minimize the cost of their comparatively expensive flash solutions, many vendors will advise customers to consider a tiered approach to storage, where the problem of managing data ultimately lands on the customer in the form of decision-making complexity. Tiered storage is great for write-intensive workloads, but cannot be effectively applied to AI, where GPUs will randomly read through new and old parts of the namespace in order to train and retrain neural networks.
We’ve seen customer performance drop by 98% on a cache miss in a legacy tiered architecture, which leads us to conclude that you can’t prefetch a random read. There’s no acceleration to be had from a tiering algorithm when all of your accesses are happening at random 100KB increments.
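A toy simulation makes the point (the numbers here are hypothetical, not a model of any particular product): with uniformly random reads, an LRU cache’s hit rate converges to roughly the fraction of the dataset it can hold, so a cache sized at 10% of the data misses about 90% of the time no matter how clever the prefetcher is.

```python
import random

def random_read_hit_rate(dataset_blocks, cache_fraction, n_reads, seed=0):
    """Simulate uniformly random reads against an LRU cache.

    With random access there is no locality for the cache to exploit,
    so the hit rate converges to roughly the cache's share of the dataset.
    """
    rng = random.Random(seed)
    cache_size = int(dataset_blocks * cache_fraction)
    cache = {}  # dicts preserve insertion order: acts as a simple LRU
    hits = 0
    for _ in range(n_reads):
        block = rng.randrange(dataset_blocks)
        if block in cache:
            hits += 1
            cache.pop(block)                  # refresh LRU position
        elif len(cache) >= cache_size:
            cache.pop(next(iter(cache)))      # evict least-recently-used
        cache[block] = True
    return hits / n_reads
```

Running this with a cache holding 10% of the blocks yields a hit rate of about 0.1 – meaning roughly nine out of ten reads fall through to the slow tier.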
AI Requires Fast Client Performance
Savvy customers will know that legacy NAS vendors often prescribe that GPU machines use multiple NFS mountpoints to overcome the classic 2GB/s TCP, single-stream limit of the NFS client. Multiple mountpoints again put the problem of dealing with data on the customer, who now needs to reconcile which mountpoint is being used to get which data.
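The bookkeeping this workaround imposes can be sketched in a few lines of Python (the mount paths and the placement rule are hypothetical): every tool in the pipeline must agree on the same file-to-mountpoint mapping, or the load skews onto one mount.

```python
import zlib

# Hypothetical layout: the same NFS export mounted four times to work
# around the single-stream client limit.
MOUNTS = ["/mnt/nfs0", "/mnt/nfs1", "/mnt/nfs2", "/mnt/nfs3"]

def resolve_mount(relative_path, mounts=MOUNTS):
    """Deterministically map a file to one of several identical mounts.

    Every data loader and admin script must apply this same rule, or jobs
    will hammer one mount while the others sit idle -- placement logic that
    a single high-bandwidth mountpoint would make unnecessary.
    """
    index = zlib.crc32(relative_path.encode()) % len(mounts)
    return f"{mounts[index]}/{relative_path}"
```

This is exactly the kind of site-specific glue code that has to be kept consistent across teams, schedulers and ad-hoc scripts for the lifetime of the cluster.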
Parallel file systems, by contrast, have always supported multiple streams and offer RDMA options for customers who really want to open up all of the high-bandwidth connections to their clients. Parallel file system clients, however, bring their own challenges with respect to installing and maintaining a proprietary file system client driver directly on a customer’s compute nodes. With this, a customer’s client OS strategy becomes intertwined with the file system’s version strategy, creating a series of dependencies that complicate day-to-day site operations.
Nvidia recently raised the stakes of high-bandwidth I/O into GPU clients by announcing the upcoming availability of its GPUDirect® Storage (GDS) capability. GPUDirect Storage enables customers running Nvidia GPUs to accelerate access to data by avoiding the creation of extra data copies between storage and the GPU, bypassing the CPU and CPU memory altogether. Without RDMA support, however, a GPUDirect I/O is not possible… so legacy TCP-based NAS will not be able to provide this type of CPU and I/O offload.
AI Requires Scale
I have to admit that I didn’t entirely appreciate this point until I saw the launch of the DGX A100. With 8 x 200Gb ports that can optionally be used for I/O, a single AI computer can consume more performance than a legacy all-flash NAS can deliver at maximum scale. Scalable environments working on computer vision and NLP models will separate the weak from the strong over the next few quarters, and we’re now starting to see that systems with a few storage controllers cannot bring us to full autonomy.
Scaling will be a big part of the conversation going forward. We now have customers talking about building clusters of thousands of GPUs, for example, who want a single file/object storage system to feed all of these machines from a single namespace. Sizing for scale adds yet another complication as organizations plan their AI strategy.
On top of it all, change has emerged as the only true constant. Data scientists are changing and evolving their toolsets daily as they race toward building better products and services. This space is so nascent that everyone is learning, tweaking and evolving on the fly. One of the principal lessons we’ve observed, as we look at how some of the more established AI teams are working, is that they cannot let the infrastructure decisions of today prohibit the application optimizations of tomorrow. To get there, scalable all-flash with rapid client throughput is the key.
Simplify All The Things
Today, we announce VAST Data LightSpeed – a storage platform for the next decade of machine intelligence. This is the first solution for AI storage that marries the light touch of a scale-out NAS that makes flash affordable for all data with the speed that has only previously been possible from complex HPC storage technologies. LightSpeed combines support for a number of new technologies and thinking that is detailed in this new e-book we created. I’ll quickly itemize the points announced, and then get to the product philosophy.
- New LightSpeed Enclosures: At 40GB/s, VAST’s LightSpeed NVMe enclosures now deliver 2x the performance of VAST’s previous NVMe enclosures. The result is better performance density and a capacity/performance ratio that marries well to the “desired” performance called out by leading AI processor manufacturers.
- Announced Support for GPUDirect Storage: VAST has been working diligently with the Nvidia GPUDirect team to unleash the performance of NFS for the next generation of AI computing. In July, we demonstrated leadership levels of GDS performance by driving over 88GB/s of single-mountpoint performance to a DGX-2 machine. Compared to the 2GB/s achievable with standard TCP-based NFS, VAST’s support for NFSoRDMA, NFS multipathing and GPUDirect Storage achieves over 40x the performance of legacy scale-out NAS offerings, all without requiring proprietary client drivers.
- New Reference Architectures: VAST’s ground-breaking DASE (Disaggregated Shared Everything) architecture separates the storage media from the CPUs that manage that media and provide storage services; this disaggregated storage, including all system metadata, is shared by every VAST Server in the cluster, so the architecture can be scaled to meet the needs of any AI ambition. As customers get more experience with our solution for AI, we’re now offering reference architectures that match the performance requirements of I/O-hungry GPUs with LightSpeed cluster configurations that take much of the guesswork out of configuring AI storage. Need 160 x GPUs? Might we suggest a VAST Decagon? 🙂
- New LightSpeed Customers: VAST is already delivering LightSpeed systems to customers who are pioneering new approaches to accelerating image analysis across massive troves of MRI and PET scan data. The affordability of LightSpeed made it possible for them to pay less for VAST’s all-flash than for the competing hybrid storage offerings that had been proposed to them, which made choosing VAST… simple.
Looking beyond the details, it’s important to recognize the philosophy in play. Applying ML or DL, in any context, is already a massive undertaking for any organization. If these initiatives then have to struggle with how to engineer, tier and scale a variety of systems and mountpoints to meet the needs of today’s and tomorrow’s application agenda… the number of variables in play can be overwhelming.
We call this new offering LightSpeed, not because it’s fast…
but because it’s ultimately simple without compromising on performance.
No more wrestling with storage tiers and cache misses.
All of your data on affordable flash.
No more needing 5 x PhDs to run a parallel file system.
NAS appliances can be simple.
No more wrestling with multiple mount points.
NFS is now over 40x faster to feed your GPUs.
No more scaling bottlenecks.
VAST’s DASE architecture is designed for exascale.
Let your data scientists dance with their data by removing the variables that have introduced compromise into storage environments for decades. Take AI to LightSpeed.