In a recent webinar, VAST VP of Systems Engineering Subramanian Kartik and I tackled the implications of a shift in AI infrastructure from legacy parallel file systems to modern all-flash platforms like VAST. This post summarizes the key points we discussed.
High-performance computing (HPC) harnesses the power of computer clusters to solve complex problems with massive data sets. HPC workloads have traditionally run on parallel file systems because of their large-block, sequential IO patterns. AI workloads, in contrast, are dominated by reads, and especially by random reads, so they call for a different type of system.
To accommodate this evolution, the industry is transitioning to all-flash systems powered by SSDs. All-flash systems are better equipped to handle both HPC and AI workloads. In addition, SSDs make the entire namespace readily accessible, which would be hard to achieve with HDDs.
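To make the difference between the two access patterns concrete, here is a minimal Python sketch. It is only an illustration, not a benchmark: the helper names (`make_sample_file`, `sequential_read`, `random_read`), the 1 MiB block size, and the 4 KiB record size are all assumptions chosen for the example, not figures from the webinar.

```python
import os
import random
import tempfile


def make_sample_file(size_bytes):
    """Create a temporary file filled with random bytes (hypothetical test data)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def sequential_read(path, block_size=1 << 20):
    """Stream the file front to back in large blocks, the classic HPC pattern."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def random_read(path, num_reads, record_size=4096, seed=0):
    """Fetch small fixed-size records at random offsets, closer to how
    training jobs pull individual samples out of a data set."""
    rng = random.Random(seed)
    num_records = os.path.getsize(path) // record_size
    total = 0
    with open(path, "rb") as f:
        for _ in range(num_reads):
            # Jump to a randomly chosen record boundary before each read.
            f.seek(rng.randrange(num_records) * record_size)
            total += len(f.read(record_size))
    return total
```

On spinning disks every `seek` in the second function costs a head movement, which is why random-read-heavy workloads tend to favor flash.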
The following are insights from the discussion around questions that have arisen about this recent change in workloads, storage, and infrastructure.
For Parallel File Systems, High Performance Equals High Complexity
While parallel file systems have been the go-to for HPC workloads, they come with challenges. They are difficult to maintain and require additional client-side considerations, and that complexity makes upgrades complicated as well.
VAST Data’s technology offers high performance without the complexity or special network infrastructure. VAST clusters can deliver terabytes per second of read throughput to thousands of clients, plus 170 GB/second through a single mount point from a single client.
NFS and S3 have comparable performance on VAST, and S3 is becoming more popular due to its use in the Hadoop, deep learning, Apache Spark, and Cassandra ecosystems.
Don’t AI workloads require a lot of write performance that VAST can’t deliver?
For those wondering if VAST can deliver sufficient write performance for AI, it’s important to note that, in our experience with customers, 95 percent of AI workloads are read-intensive. This runs contrary to the perceived need for an even balance between reads and writes. Exceptions exist (e.g., HPC or Large Language Model checkpointing), but AI workloads are largely dominated by reads. The system still has to supply the right amount of read and write bandwidth for tasks to complete successfully.
Additionally, flash technology is becoming increasingly affordable and is surpassing the capabilities of hard drives, while offering a lower environmental impact and higher storage density.
Parallel File Systems are Not Built for Non-Disruptive Ops
Parallel and clustered file systems are fragile when it comes to non-disruptive operations. VAST, by contrast, boasts an architecture that guarantees 100% uptime, along with freedom from interruptions for maintenance operations. How? Simple! VAST’s single class of storage works for clusters of all sizes without the need for any manual data layout or knob tweaking. VAST also exposes its namespace through industry-standard protocols and does not require proprietary native clients.
Isilon/PowerScale customers have already seen the benefits of the product, as VAST saves session state for stateful protocols in persistent NVMe, thus ensuring that even SMB 2 sessions will not suffer a disconnect in a rolling upgrade. Sounds impressive, right?
VAST runs in Docker containers, which removes much of the operational complexity and allows for fast upgrades and restarts, without worrying about metadata servers or carving anything out. Additionally, setting up file shares or exporting data is effortless.
Once introduced to your stack, VAST can bring trouble tickets close to zero and keep your system running seamlessly with minimal effort.
Proprietary File System Clients are a Necessary Evil
Proprietary file system clients can be seen as a necessary evil to achieve optimal performance. However, these native clients come with their own set of limitations, like being compatible with only one storage platform and needing upgrades when the underlying system is changed.
Luckily, advanced methods have been developed that use industry-standard clients. Users can now switch platforms seamlessly without making any changes on the client side. VAST engineers have practical experience with this particular issue and have seen some unique scenarios where the client can become part of the file system and interact in peculiar ways.
For instance, AlphaFold, a DeepMind program that tackles the complex problem of protein folding, makes heavy use of memory-mapped (mmap()) files. When run on a VAST system, AlphaFold sees an improvement of 500-700 percent compared to a parallel file system. VAST has significant experience with this, so let us help you navigate this complex world of file systems.
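For readers unfamiliar with mmap(), the following Python sketch shows the general idea: the file is mapped into the process address space and records are then read by slicing, which turns into page-sized reads against the underlying file system. This is a generic illustration using Python’s standard `mmap` module, not AlphaFold’s actual code; the helper names and the 8-byte record layout are assumptions made up for the example.

```python
import mmap
import os
import tempfile


def make_demo_file(num_records, record_size=8):
    """Write a temporary file of fixed-size records, each holding its own index
    (a hypothetical layout used only for this demonstration)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_records):
            f.write(i.to_bytes(record_size, "little"))
    return path


def read_records_via_mmap(path, offsets, record_size):
    """Return the records found at the given byte offsets via a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Each slice touches only the pages backing that byte range,
            # so scattered offsets become scattered small reads.
            return [bytes(mm[off:off + record_size]) for off in offsets]
```

Because the application never issues explicit read calls, the access pattern the storage system sees depends on how the client handles page faults, which is one way the client effectively becomes part of the file system.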
Science Project Deployments
Developing and deploying heavily used parallel file systems takes an exceptional team with a deep understanding of code and operating systems. That is why parallel file systems are more common in national labs, which have access to top-tier expertise.
Unfortunately, many commercial enterprises lack these resources, creating a critical need for sustainable solutions that enable businesses to compete. Even higher education institutions can make parallel file systems work, thanks to the availability of graduate students and postdocs who can assist with operations.
But deploying these solutions requires far more than technical know-how. It also demands a thorough understanding of the strain that workloads place on the file system, so that it can be effectively tuned, managed, and maintained.
There is a Better Way
As an IT professional, you might wonder if a startup can handle AI workloads, use new technology, and remain sustainable for years. Let me introduce you to VAST, a rapidly growing startup making waves in the industry.
VAST’s team consists of passionate, clever, and daring professionals who are actively building personal and engaging relationships with customers. We work hard to create intimate customer journeys and relationships, and we look forward to discussing how we can help you with your advanced data and analytics needs.