Cosmologists worldwide use the insights gained from these simulations. The simulated data is made available for researchers to analyze and compare with real-world observations from telescopes. This comparison helps scientists address fundamental gaps in our knowledge, such as the nature of dark matter and dark energy. “What makes us unique is the focus that we have on bespoke system design and designing for workloads, which allows us to be incredibly cost effective for doing a particular science. So our cost per science output is very low across DiRAC as a whole,” notes Dr. Basden.
However, managing and storing the tremendous amounts of data generated by these simulations is a significant challenge. In addition to bulk data stored on a parallel file system, the site needs storage for application space, configuration files, code, git repositories and visualisation outputs. Durham University’s previous solution for these mixed file types was an aging NFS server that was struggling to keep up with demand. Dr. Basden notes, “The previous way of doing things didn’t scale well. It was a single NFS server with a single namespace, which didn’t scale to meet our needs. Whereas with the VAST Data Platform, we can add drive nodes to give more data or add compute nodes to give more parallelism, and it scales very well.”
A Transformative Partnership and Data Platform
Faced with the need for a next-generation solution, the ICC at Durham University chose the VAST Data Platform. The choice wasn’t just about finding a replacement; it was about making a transformation. The decision to adopt VAST Data was driven by several factors, including the platform’s ability to handle small file operations efficiently, cost savings from advanced data reduction techniques, excellent analytics, and the global namespace capability, which should allow seamless collaboration with other UK sites.
Dr. Basden highlights the benefits of VAST Data: “It provides much higher performance, so users waste less of their compute time reading the data onto the system. Its ability to handle large numbers of small files efficiently also allows us to encourage users to use virtual environments more, particularly within Python and the SPACK package management tool, which traditionally haven’t been great - we have had to limit the total number of files per user previously to avoid significant performance reductions. With the VAST Data Platform, this is no longer the case. We can now encourage users to make the most of the system to help their workflows and set up their bespoke environments.”
Since implementing VAST Data, the COSMA system at Durham University has achieved a 3.4-to-1 data reduction ratio on its user data. Thanks to the massively increased network bandwidth and the VAST Disaggregated Shared Everything Architecture (DASE), the team has seen significant performance improvements compared to the previous solution.
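To put the 3.4-to-1 figure in perspective, the arithmetic behind a data reduction ratio can be sketched as follows. Only the ratio comes from the case study. The 1,000 TB of logical data and the function names are hypothetical, chosen purely for illustration, and do not describe the actual capacity of COSMA.

```python
def physical_footprint_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Physical capacity consumed by `logical_tb` of user data at a given ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


def space_saved_fraction(reduction_ratio: float) -> float:
    """Fraction of capacity saved relative to storing the data unreduced."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return 1.0 - 1.0 / reduction_ratio


REDUCTION_RATIO = 3.4   # ratio reported in the case study
LOGICAL_TB = 1000.0     # hypothetical amount of user data, for illustration only

if __name__ == "__main__":
    used = physical_footprint_tb(LOGICAL_TB, REDUCTION_RATIO)
    saved = space_saved_fraction(REDUCTION_RATIO)
    print(f"{LOGICAL_TB:.0f} TB of user data occupies about {used:.0f} TB")
    print(f"Capacity saved: {saved:.1%}")
```

Under these assumptions, a 3.4-to-1 ratio means user data occupies a little under a third of its logical size, a saving of roughly 71 percent.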