The fields of genetics, structural biology, and life sciences have long been challenged by ever-growing data and processing demands. The proliferation of a new generation of GPU cards in support of cryo-electron microscopy (Cryo-EM), protein folding experiments, and genomic analysis places further demands on the underlying infrastructure, including storage. Most storage solutions (new and legacy) not only fail to affordably provide adequate performance across all your datasets, but also fail to accommodate the high degree of variability in data sizes and access patterns. This makes affordable flash storage all the more critical.
VAST breaks these trade-offs with its unique Disaggregated and Shared-Everything (DASE) architecture, delivering extremely affordable flash, unprecedented price-performance, and exabyte scale. For your GPU-accelerated infrastructure, VAST Data delivers high-speed file access to shared storage, allowing you to power your most demanding bioinformatics applications, including BLAST (Basic Local Alignment Search Tool), GATK (Genome Analysis Toolkit), cryoSPARC, Relion, and even AlphaFold.
VAST investigates running GATK WGS (Whole Genome Sequencing) workflows on GPU-based systems using the GATK-based Clara Parabricks solution from NVIDIA.
Accelerate data collection from any source and make it quickly available to applications and users at scale
Scale to TB/s of throughput and millions of IOPS to power your most demanding GPU-accelerated research grids
Ideal for bio and AI pipelines that otherwise suffer from random, metadata-intensive I/O
Engineered at every level to make flash affordable for HPC and AI applications
Simple, multi-protocol storage for any scale-out application
Breaking Decades-old Tradeoffs
Life science teams have long struggled to balance I/O performance with the volume of data generated by bioinformatics pipelines. To address this, data is often tiered across a complex, pyramidal hierarchy of storage systems, each designed to provide either fast I/O or large capacity. This pyramid partially solves some organizations' storage problems by relegating cold data to slow, archival storage, but scientists continue to evolve the questions they ask of their data. Because data that has been exiled to slow, archival storage cannot be processed and analyzed quickly, the opportunity for rapid scientific discovery on vast reserves of data is lost. At the same time, larger pools of data create opportunities to find new correlations, yet until now it has not been economically practical to store the entire life science research corpus on one fast tier of flash.
A New Type of Storage Architecture
VAST Data breaks the decades-old tradeoff between storage performance and storage capacity with a new storage system architecture that enables unlimited processing on exabyte-scale, affordable Flash. With Universal Storage, pipelines run faster, administration is easier, and the data center impact is smaller.
With all-flash performance and fully distributed metadata, Universal Storage reduces pipeline wall-clock time.
Applications, users, and instruments can access the same data via SMB, S3, and NFS simultaneously, eliminating specialized silos of storage.
Easily accelerate HPC and AI applications without complex parallel file system (PFS) software. NFSoRDMA (NFS over RDMA) enables clients to achieve up to 400% more performance than TCP-based file systems.
VAST’s Universal Storage scalability makes it possible to store, manage, process & archive all data in one scalable place.
VAST's server pooling capability provides dedicated QoS for competing user applications.
One, simple-to-manage scale-out file system appliance, remotely monitored by VAST.
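To put the "up to 400% more performance" NFSoRDMA figure in concrete terms, the arithmetic can be sketched as follows. This is a rough illustration only: the baseline throughput and dataset size are hypothetical values chosen to make the numbers easy to follow, not measurements from any VAST system, and "up to" means the result is a best-case bound.

```python
def speedup_factor(percent_more: float) -> float:
    """Convert an 'X% more' claim into a multiplier over the baseline."""
    return 1.0 + percent_more / 100.0


def transfer_seconds(dataset_gb: float, throughput_gb_per_s: float) -> float:
    """Time to stream a dataset once at a sustained throughput."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return dataset_gb / throughput_gb_per_s


# Hypothetical inputs, chosen only to make the arithmetic concrete.
BASELINE_GB_PER_S = 2.0   # assumed per-client throughput over TCP
DATASET_GB = 1000.0       # assumed 1 TB working set

factor = speedup_factor(400.0)            # "400% more" is 5x the baseline
best_case = BASELINE_GB_PER_S * factor    # best-case throughput in GB/s
tcp_time = transfer_seconds(DATASET_GB, BASELINE_GB_PER_S)
rdma_time = transfer_seconds(DATASET_GB, best_case)
print(f"{factor:.1f}x: {tcp_time:.0f} s over TCP vs {rdma_time:.0f} s best case")
```

Under these assumed inputs, a single pass over the working set drops from 500 seconds to 100 seconds in the best case. Actual gains depend on the client, network, and workload.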
From accelerating gene sequencing pipelines to enabling the software frameworks for genetic research, VAST Data’s Universal Storage provides low latency and high throughput for organizations that need to process petabytes and exabytes of data to drive new insights.
For Cryo-EM frameworks such as Relion and CryoSPARC, VAST Data delivers the bandwidth and IOPS needed to accelerate GPU computing. Researchers can now quickly collect and process data from multiple pipelines, delivering more accurate results faster.
Our mission to make biology easier to engineer is enabled by VAST Data making storage easy. Our output is exponentially increasing along with decreasing unit costs, so we are always looking for new technologies that enable us to increase output and reduce cost. VAST Data provides Ginkgo the potential to ride the declining cost curve of flash while also providing near-infinite scale.
As an early adopter of advanced storage systems, we’ve deployed scalable storage architectures to help HHS agencies pioneer new scientific discoveries and improve public health. As our component Operating Divisions move beyond the hard drive era, software-enabled storage architectures help us modernize our scientific agenda and enable AI-driven research with the power of flash.
VAST delivered an all-flash solution at a cost that not only allowed us to upgrade to all-flash and eliminate our storage tiers, but also saved us enough to pay for more GPUs to accelerate our research. This combination has enabled us to explore new deep-learning techniques that have unlocked invaluable insights in image reconstruction, image analysis, and image parcellation both today and for years to come.
To achieve our mission, our GPU infrastructure needs high-speed accelerated file access to shared storage that is faster than what traditional scale-out file systems can deliver. That said, we’re also a fast-growing company and we don’t have the resources to become HPC storage technicians. VAST provides Zebra a solution to all of our A.I. storage challenges by delivering performance superior to what is possible with traditional NAS while also providing a simple, scalable appliance that requires no effort to deploy and manage.