I have been with VAST Data for three years. For 20 years before that, I was at EMC and Dell. Both are great companies, but I joined VAST to help solve some of humanity's most difficult problems.
When I studied biology in high school, I did not expect that medical science would advance to the point where we can solve problems once thought unsolvable. My Ph.D. is in particle physics, and I did many years of research before I moved into IT. It has been a rewarding career. I get to solve difficult problems, and I work with brilliant people, clients included, from whom I learn every day.
Life and Data
Understanding life depends on understanding the data associated with every aspect of it, and life is complex.
The advancements we are seeing in medical science come with their own challenges, specifically how to process different kinds of data from different sources. This data is different from what we are used to in IT. We came from a transaction-dominated era, when relational databases and transaction processing were the most important things.
The new data is semi-structured and unstructured. It is growing much faster, and that growth creates storage challenges that are common across all fields of technology.
One of the greatest revolutions we have seen in medicine is in the field of genomics. In the 1990s, the world embarked on an ambitious project to sequence the entire human genome. That effort took more than a decade, involved thousands of brilliant scientists, and cost billions of dollars. In the two decades since, we have become able to sequence a genome in a matter of hours for less than five hundred dollars, and both the cost and the time required will continue to drop.
Why Is This Important?
One of my close friends ran storage infrastructure at a Fortune 50 healthcare company. About five years ago, his wife suddenly fell ill. She was fine one day, and a week later she did not have the strength to walk across the room.
They took her to the hospital, where she was diagnosed with a very aggressive form of leukemia. The doctors said, "You have two months to live. That's all the time you have because, while this leukemia is curable, we don't know how to cure yours. The reason is that there are 50 variants of leukemia that could cause these symptoms, and we do not know which one you have. If we could identify it, we could come up with the proper medicines to treat that particular variant."
They sequenced her genome. By examining the structure of her genes, they identified the specific gene and markers causing the disease, and from that, exactly which of the 50 variants she had. That allowed them to give her the right therapeutics and the right chemotherapy to help her survive.
I’m happy to report that five years later she’s alive and well.
Medical Science Today
Data, and the ability to analyze it, dominate medical science. Whole-genome sequencing requires 30X more data than exome sequencing. But it's not just standard sequencing these days: so-called "long-read" sequencers, such as those from Oxford Nanopore and PacBio, promise to make sequence alignment much more efficient as well. These workloads typically need GPUs and deep learning techniques to perform "base calling," and our testing shows that they run as well on VAST as they do on local NVMe drives.
Another area of intense interest is the field of Structural Biology, which aims to understand the structure of proteins. The field jumped into prominence during the COVID-19 pandemic, when understanding viral structures became critical. This work is also essential for drug discovery, because drug molecules must bind appropriately to the relevant proteins. Cryo-electron microscopy (cryo-EM) is the pre-eminent experimental technique for this, and even it is giving way to cryo-electron tomography (cryo-ET), which produces 10X larger data payloads.
Protein folding is a very difficult problem, hence the need for high-precision cryo-EM experiments to determine protein structures. A few years ago, DeepMind released a revolutionary protein-folding model known as AlphaFold, which predicts how a sequence of amino acids folds into a protein with unprecedented accuracy. It quickly became the darling of the Structural Biology community worldwide and is now one of the most widely run pieces of software. AlphaFold runs exceptionally well on VAST, which has made VAST a popular platform for it. These advances create new opportunities, and with them new demands on infrastructure.
The needs of data scientists have changed. According to Chris Dagdigian, Co-founder and Senior Director of The BioTeam, “The concept of having an archive tier or a near-line tier or a slow tier doesn’t make a lot of sense. I no longer can get away with tiers of different speed and capacity. If I need to satisfy the ML and AI people, you pretty much need one single tier of performance storage.”
VAST Data Platform
The VAST Data Platform eliminates the need for tiering and addresses the data and storage challenges facing medical science companies.
The solution is fourfold:
A revolutionary systems architecture: a shared-everything cluster with no limits on scale.
Enterprise NAS with HPC/AI performance, without the complexity of HPC systems or the scaling limits of legacy NAS.
A simple system that people can stand up, use, and scale as needed.
A simple financial model in which software licenses are not tied to hardware and all equipment carries a 10-year warranty.
The resilient architecture uses QLC drives and reduces cost by up to 80% compared to legacy flash technologies.
Our strength is our ability to build large, affordable all-flash clusters with no scale limits and no performance barriers, all with the ease of use one would expect of a NAS and its full set of features and functions.
Our higher-level mission is to make a positive impact on humanity, improving how we live and helping the planet flourish. We are dedicated to helping scientists decode brain function and cure cancer and other major diseases. We are currently working with dozens of life sciences companies, and we would love to work with you to help solve life's hardest problems.