Accelerating Genomic Discovery with VAST Data

Industry

Healthcare & Life Sciences

Use Case

HPC

Manufacturing

Overview

Genomics leader PacBio is a pioneer in long-read DNA sequencing, empowering scientists to better understand genomes for research ranging from healthcare to agriculture. However, developing and manufacturing PacBio’s highly accurate sequencers, such as the Revio and Sequel IIe systems, generates enormous volumes of data and creates significant data management challenges.

Adam Knight, Director of IT Infrastructure at PacBio, explained, “For us, it’s the time it takes to sequence genomes, the cost of sequencing, and the error rates and accuracy that are important. We’re trying to increase accuracy, speed, and throughput, while reducing error rates and overall cost of sequencing runs.” Knight added, “We had to do something different. It had to be more than just storage; it had to be a seamless fit into our existing operations.”

Knight continued, “Our sequencers produce gigabytes of data per sequencing run for our customers, so you can imagine how much data we need to generate in order to design, develop, and manufacture an instrument platform like Revio. We also have a geographically distributed team working around the world, so we were facing delays due to the logistics of getting data to the right people at the right time.”

With its total data footprint growing into the petabyte range, performance and scalability were top concerns. Like that of many growing companies, PacBio’s infrastructure had sprawled across multiple locations and legacy systems. Knight noted, “We faced challenges with diverse file access protocols and bottlenecks when moving data between our instruments and our network file shares.” Consolidating on a high-performance, easily scalable data platform became a priority.

Video

PacBio + VAST: Accelerating Genomic Discovery

PacBio is a leader in long-read DNA sequencing for healthcare and agriculture research. Its highly accurate sequencers generate massive amounts of data, which PacBio manages with the VAST Data Platform.

Background

After evaluating its options, PacBio selected VAST Data’s DataStore, which met PacBio’s requirements for extreme performance, limitless scalability, and multiprotocol support. The DataStore is one component of the VAST Data Platform, alongside the DataBase, DataSpace, and DataEngine.

As Knight explained, “VAST initially caught our eye as a more affordable high-performance tier for HPC data management. It was positioned as highly scalable with a solid roadmap and good support. It beat out our existing solution.” Beyond performance for HPC workloads, VAST Data offered ease of management and the ability to consolidate file and object data under a global namespace.

Outcome

With VAST Data, PacBio can keep pace with its sequencing innovations and increasing data demands. Knight explained, “It’s a robust, high-performance platform that’s easily scalable.”

He elaborated, “We recently added another 2PB to our cluster with no issues. VAST has been reliable as we continue to expand it. And we’re excited to see how easy it is to scale as we grow our data platform infrastructure.”

The software-defined architecture simplifies adding performance and capacity. Knight noted, “As we grow our cluster, VAST has terrific dashboards and tools showing bottlenecks and data flows.”

The company also benefits from the robust roadmap of the VAST Data Platform, including the new data catalog. “The ability to manage metadata on-box, with no external queries needed, will be massively useful,” remarked Knight. “It will immediately accelerate workflows that currently require heavy offline processing.”

He concluded, “Our R&D teams love the increased speed achieved through S3 transfers and so we expanded its use across our internal network.”

By delivering breakthrough performance today, while rapidly enhancing intelligence, automation, and multi-cloud capabilities, VAST Data is powering PacBio’s innovation engine now and into the future.