Apr 4, 2024

The Intersection of Enterprise Data Management and Supercomputing


Posted by

Jeff Denworth, VAST Data Co-Founder

I’ve wanted to write this post for a long time, because I think Brown University really clued in to something special about VAST’s architecture and then took a progressive position to advance their research agenda in ways that their peers are only now starting to catch up to. Here goes…

Secure Research Data Enclaves. Rolls right off the tongue, right?

For decades, academic computing organizations have been forced to maintain two styles of data management and storage systems for different styles of campus computing.

  • On one side of the school, open high-performance computing happens on big, bad HPC clusters that hammer away at HPC file systems. Here, customers choose to spend more on performance than on data management bells and whistles so that their clusters run as fast as possible. These users also take on a significant amount of storage integration, administration and Linux engineering work, on the assumption that post-docs are relatively affordable and that paying a salary is better than paying for some fancy storage array.

  • On the other side of the school, there are other computing programs that require more serious data management rigor. These initiatives deploy data lakes and data stores to serve regulated data sets (such as population studies, clinical data, etc.) that require end-to-end encryption, secure data erasure, data access auditing and other data governance features that have not been the domain of HPC storage systems. Here, organizations step their performance expectations down in favor of buying enterprise arrays that secure and protect data. Because these systems are designed with enterprise rigor, they also carry a lower administration burden.

Over the last 12 months, we’ve witnessed a rise in the need for data isolation and multi-tenancy in these environments. The world’s leading academic institutions are driving a multidisciplinary agenda that seeks a variety of research grants, each of which needs a secure data enclave. At VAST, we’ve been building for this moment for almost a decade now… and the tools we deploy into large cloud service providers (zero-trust, multi-tenancy, auditability) have now intersected with a simple-to-manage, extreme-scale data platform that allows academic institutions to serve all of these secure data sets from one unified platform.

One of my favorite examples of this synthesis of enterprise and HPC computing is at Brown. Brown University is an Ivy League school and the seventh-oldest institution of higher education in the United States. In 2022, the enterprise computing team came together with the HPC team to build something that most schools would dream of. By aggregating their data into one scalable, multi-tenant platform that has now grown to hold over 10 petabytes of research data, they found a variety of powerful benefits.

Before I cover the benefits, let me explain why this is possible.

  1. VAST’s parallel disaggregated architecture allows for performance isolation: pools of VAST servers create QoS domains by isolating traffic to /scratch or /home on CPU boundaries. For as long as anyone can remember, it’s been crazy to think that interactive I/O and batch I/O could be served by a common system, but that is only because of the internal storage communication that happens in partitioned systems. VAST’s new DASE (Disaggregated, Shared-Everything) architecture has no notion of partitions. Instead, large collections of cores running VAST software all see the same global volume in parallel, eliminating the need for them to talk with each other. The result? Every I/O is handled in parallel, with no east-west traffic.

  2. CNode pools also allow Brown to connect one data platform to two distinct physical networks without needing to put all of their storage on one network. With VAST, one system can talk to both networks simultaneously and leverage RDMA access when the network calls for it.

  3. Next, it’s important to point out that VAST is building a system for enterprise-grade data management and data governance. All data is end-to-end encrypted, the system can support the provisioning of independent tenants that can be securely erased, and every user action and admin action is audited to meet Brown’s HIPAA requirements. In fact, it was Brown that convinced us to add audit trails. In our next release, these will all be captured by the VAST DataBase, in essence providing an embedded SIEM within a data platform.
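
To make the pooling idea in points 1 and 2 a little more concrete, here is a toy Python model of it. This is only a sketch of the concept, not VAST configuration or a VAST API: the pool names, server names and the mount-to-pool mapping are all hypothetical. It shows requests for /home and /scratch landing on disjoint sets of servers, which is what keeps interactive and batch traffic from competing for the same CPUs.

```python
# Toy model of server pools acting as QoS domains.
# Everything here (pool names, server names, mapping) is hypothetical.
POOLS = {
    "interactive": ["cnode-1", "cnode-2"],
    "batch": ["cnode-3", "cnode-4", "cnode-5", "cnode-6"],
}

# Each mount point is served by exactly one pool.
MOUNT_TO_POOL = {
    "/home": "interactive",
    "/scratch": "batch",
}


def servers_for(path: str) -> list[str]:
    """Return the servers that would handle I/O for the given path."""
    for mount, pool in MOUNT_TO_POOL.items():
        if path == mount or path.startswith(mount + "/"):
            return POOLS[pool]
    raise ValueError(f"no pool serves {path!r}")


home_servers = servers_for("/home/alice/notes.txt")
scratch_servers = servers_for("/scratch/job-42/output.dat")

# The two pools share no servers, so a batch storm on /scratch
# does not consume the CPUs that answer interactive /home requests.
print(home_servers)
print(scratch_servers)
print(set(home_servers).isdisjoint(scratch_servers))
```

Because every server in either pool sees the same global volume, the pools in this picture divide up CPUs and network ports, not data.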

Super-Awesome Use Case #1 

Performance Amortization

Yesterday, Brown would buy high-capacity systems for enterprise data, and smaller and faster systems for HPC. Now, by aggregating data into one shared cluster, the performance that is unused by the enterprise team can be immediately harvested to make HPC clusters go faster. The law of data aggregation always works to provide systems with higher total peak bandwidth.
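
The arithmetic behind that claim can be sketched in a few lines of Python. The bandwidth figures below are made up for illustration and are not Brown’s numbers, and the sketch assumes the shared cluster’s peak is simply the sum of the two separate systems’ peaks.

```python
def hpc_bandwidth_siloed(hpc_peak_gbps: float) -> float:
    """With separate systems, HPC jobs are capped at the HPC system's peak."""
    return hpc_peak_gbps


def hpc_bandwidth_pooled(total_peak_gbps: float, enterprise_load_gbps: float) -> float:
    """With one shared cluster, HPC jobs can use whatever enterprise leaves idle."""
    return max(total_peak_gbps - enterprise_load_gbps, 0.0)


# Hypothetical figures, in GB/s.
enterprise_peak = 40.0   # capacity-oriented enterprise system
hpc_peak = 100.0         # smaller, faster HPC system
enterprise_load = 10.0   # what enterprise workloads actually use right now

siloed = hpc_bandwidth_siloed(hpc_peak)
pooled = hpc_bandwidth_pooled(enterprise_peak + hpc_peak, enterprise_load)

print(f"siloed HPC bandwidth: {siloed} GB/s")
print(f"pooled HPC bandwidth: {pooled} GB/s")
```

In this example the HPC side goes from 100 GB/s to 130 GB/s without any new hardware, simply because the 30 GB/s the enterprise side is not using is no longer stranded.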

Super-Awesome Use Case #2

Data Copy Elimination

Now that the enterprise data is accessible via a CNode pool on an HPC network, users don’t have to copy input datasets from one system to another, as is otherwise common in multidisciplinary HPC environments.

Super-Awesome Use Case #3 

Grant Readiness

With VAST’s secure multi-tenancy and Zero Trust data governance tools, Brown can now answer the needs of even the most stringent grants without needing to thick-provision independent systems for different projects. Everything is centrally manageable and performance remains amortized.

Super-Awesome Use Case #4 

Unified Data Governance

Under one data management framework, Brown can now see and govern all of their data. Tools like the VAST Data Catalog provide a high-speed mechanism to query user metadata, which makes it possible to set policies against datasets that enforce data retention and data lifecycle, protect data with no-overhead snapshots, set encryption policy and more. With VAST, it’s easy to monitor data so that researchers can conform to the data retention objectives enforced on grant projects.
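
As a rough illustration of what a metadata-driven retention check looks like, here is a minimal Python sketch. It does not use the VAST Data Catalog or any VAST API; the record fields, paths, grant IDs and retention periods are all hypothetical stand-ins for whatever metadata a real catalog would expose.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry; field names are illustrative only."""
    path: str
    grant_id: str
    last_modified: date
    retention_days: int


def past_retention(records: list[DatasetRecord], today: date) -> list[DatasetRecord]:
    """Return the datasets whose retention window has already closed."""
    return [
        r for r in records
        if today > r.last_modified + timedelta(days=r.retention_days)
    ]


# Made-up catalog contents.
catalog = [
    DatasetRecord("/enclave/study-a/raw", "GRANT-A", date(2020, 1, 15), 365 * 3),
    DatasetRecord("/enclave/study-b/raw", "GRANT-B", date(2023, 6, 1), 365 * 7),
]

expired = past_retention(catalog, today=date(2024, 4, 4))
for record in expired:
    print(f"{record.path} ({record.grant_id}) is past its retention window")
```

The point is that once all datasets are described in one queryable place, a retention policy becomes a filter over metadata instead of a per-system audit.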

Super-Awesome Use Case #5 

It just works

This is a refrain we hear from so many of our customers who have toiled doing HPC system management for too long. Post-docs are searching for more valuable ways to hone their talents, so a system that’s simple to manage and always-online is a nice luxury to have.

That’s the story. Here’s a quick video featuring Amar Jasti, Brown’s Principal Infrastructure Engineer, explaining the convergence of HPC and enterprise environments at the university.

Secure Research Data Enclaves is a topic we’re very passionate about. Reach out if you’d like to learn how organizations like Brown, TACC, University of Pisa, and many more of the world’s most prestigious research computing environments are reimagining enterprise data management in the age of zero-trust. - Jeff
