The fields of genetics, structural biology, and life sciences have long been challenged by ever-growing data volumes and processing demands. The proliferation of a new generation of GPU cards in support of cryo-electron microscopy (Cryo-EM), protein folding experiments, and genomic analysis places further, unique demands on the underlying infrastructure, including storage. Most storage solutions, new and legacy, not only fail to provide adequate performance affordably across all of your datasets, but also fail to accommodate the high degree of variability in data sizes and access patterns. This makes affordable flash storage all the more critical.
VAST breaks these trade-offs with its unique Disaggregated and Shared-Everything (DASE) architecture, delivering extremely affordable flash, unprecedented price-performance, and exabyte scale. For your GPU-accelerated infrastructure, VAST Data delivers high-speed file access to shared storage, allowing you to power your most demanding bioinformatics applications, including BLAST (Basic Local Alignment Search Tool), GATK (Genome Analysis Toolkit), cryoSPARC, Relion, and even AlphaFold.
VAST investigates running GATK Whole Genome Sequencing (WGS) workflows on GPU-based systems using NVIDIA Clara Parabricks, a GATK-based solution from NVIDIA.
Accelerate data collection from any source, making data quickly available to applications and users at scale
Scale to TB/s of throughput and millions of IOPS to power your most demanding GPU-accelerated research grids
Ideal for bio and AI pipelines that otherwise suffer from random and metadata-intensive I/O
Engineered at every level to make flash affordable for HPC and AI applications
Simple, multi-protocol storage for any scale-out application
Breaking Decades-old Tradeoffs
Life science teams have long struggled to balance I/O performance with the volume of data generated by bioinformatics pipelines. To address this, data is often tiered across a complex, pyramidal hierarchy of storage systems, each designed to provide either fast I/O or large capacity. This storage pyramid partially solves some organizations' storage problems by relegating cold data to slow, archival storage, but scientists continue to evolve the questions they ask of their data. Because data exiled to slow, archival storage cannot be processed and analyzed quickly, the opportunity for rapid scientific discovery on vast reserves of data is lost. At the same time, larger pools of data create new opportunities to find new correlations, yet until now it has not been economically practical to store the entire life science research corpus on a single fast tier of flash.
A New Type of Storage Architecture
VAST Data breaks the decades-old tradeoff between storage performance and storage capacity with a new storage system architecture that enables unlimited processing on exabyte-scale, affordable flash. With Universal Storage, pipelines run faster, administration is easier, and the data center footprint is smaller.
With all-flash performance and fully distributed metadata, Universal Storage reduces pipeline wall-clock time.
Applications, users, and instruments can access the same data via NFS, SMB, and S3 simultaneously, eliminating specialized storage silos.
Easily accelerate HPC and AI applications without complex parallel file system (PFS) software. NFS over RDMA (NFSoRDMA) enables clients to achieve up to 400% more performance than TCP-based file systems.
The scalability of VAST's Universal Storage makes it possible to store, manage, process, and archive all data in a single place.
VAST's server pooling capability provides dedicated quality of service (QoS) for competing user applications.
A single, simple-to-manage scale-out file system appliance, remotely monitored by VAST.
From accelerating gene sequencing pipelines to enabling the software frameworks for genetic research, VAST Data’s Universal Storage provides low latency and high throughput for organizations that need to process petabytes and exabytes of data to drive new insights.
For Cryo-EM frameworks such as Relion and cryoSPARC, VAST Data delivers the bandwidth and IOPS needed to accelerate GPU computing. Researchers can quickly collect and process data from multiple pipelines, delivering more accurate results faster.