VAST Field CTO Offers Five Key Platform Differentiators for Analytics-Driven Organizations
We’re very pleased to have received the highest rank for the Analytics use case in the Gartner 2022 Critical Capabilities for Distributed File Systems and Object Storage report.
Critical Capabilities is a companion report to the recently published Gartner Magic Quadrant™ for Distributed File Systems and Object Storage, in which VAST Data was named a Challenger. In this research, Gartner evaluates 19 file and object storage products against 10 critical capabilities across seven use cases important to infrastructure and operations (I&O) leaders.
According to the report, VAST received the highest product score (4.14 out of 5) among all vendors evaluated in the Analytics use case. Gartner states: “This use case refers to storage consumed by big data analytics applications and packaged business intelligence (BI) applications to address domain or business problems.”
I thought it would be useful and timely to share five key VAST differentiators for data analytics that we think separate VAST from competitive offerings.
- All-Flash Performance: VAST is a unified data platform that democratizes fast, all-flash access to all data. The platform presents itself to applications and users as both an object store and a NAS system. Structured and unstructured data can be ingested from a wide variety of applications and sources into a single unified repository for analysis and processing. As an all-flash platform, VAST performs extremely well whether the access pattern is sequential or random. It can also handle high levels of concurrency from multiple applications simultaneously.
- Multi-Protocol Data Access: VAST’s object (S3) interface allows integration with big data analytics tool sets such as MapReduce, Hive and Spark. It also supports distributed query engines (e.g., Dremio, Trino, Presto) and GPU-accelerated data warehouses (e.g., SQream). Because VAST can present the namespace over standard NFSv3 and NFSv4, the system also supports traditional analytics applications and databases such as SAS Grid, Postgres and kdb+. Teradata, Splunk, Elasticsearch, and Vertica can integrate using either S3 or NFS.
Furthermore, because VAST also supports SMB, the Windows file sharing protocol, analysts and other users can easily access reports and output data from their workstations without copying it to a separate storage system or acquiring the special tools or skills required to interact with HDFS-based systems. This means that VAST is suitable for a variety of uses, including:
- Legacy data warehouse-style analytics applications
- The intermediate layer of data processing within a data lake
- ETL & ELT
- Ad hoc analytics using tools such as Presto, Trino, Impala, and Dremio
- Emerging AI/ML tools
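As a minimal sketch of what "integration through the S3 interface" can look like for Spark, the function below builds the standard Hadoop S3A properties that point `s3a://` paths at an S3-compatible endpoint instead of AWS. The endpoint URL, credentials, bucket name and the helper name `s3a_spark_conf` are hypothetical placeholders, not VAST-specific settings, and reading data this way assumes the `hadoop-aws` package is on Spark's classpath.

```python
# Hypothetical endpoint and credentials for illustration; substitute your cluster's values.
def s3a_spark_conf(endpoint: str, access_key: str, secret_key: str) -> dict[str, str]:
    """Return Spark properties that route s3a:// paths to an S3-compatible endpoint."""
    return {
        # Standard Hadoop S3A option, prefixed with spark.hadoop. so Spark passes it through.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        # Path-style URLs (http://host/bucket/key) avoid per-bucket DNS names,
        # which on-premises S3 endpoints often do not provide.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


if __name__ == "__main__":
    conf = s3a_spark_conf("http://vast-s3.example.internal", "EXAMPLE_KEY", "EXAMPLE_SECRET")
    for key, value in sorted(conf.items()):
        # Mask the secret so it is never echoed to logs.
        shown = "***" if key.endswith("secret.key") else value
        print(f"{key}={shown}")
    # With PySpark and hadoop-aws installed (not run here):
    #   from pyspark.sql import SparkSession
    #   builder = SparkSession.builder.appName("analytics")
    #   for key, value in conf.items():
    #       builder = builder.config(key, value)
    #   spark = builder.getOrCreate()
    #   df = spark.read.parquet("s3a://analytics-bucket/events/")
```

Because the same namespace is also exported over NFS, a tool that cannot speak S3 could, in principle, read the same files from a mount point instead; the Spark side only needs the endpoint settings shown above.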
- GPUDirect Storage Support: VAST brings an innovative approach to AI/ML/DL by increasing GPU utilization and efficiency. As GPUs become more prevalent in both AI and analytics, it is important to ensure not only that GPUs can be fed large amounts of data quickly, but also that they can process that data as efficiently as possible. With GPUDirect Storage, data coming from VAST passes directly from the network card in the GPU server into GPU memory, bypassing system memory and the CPU complex. This results in higher bandwidth and removes the CPU overhead of staging I/O through system memory.
The end result is higher efficiency and the ability to run more analytics and queries on fewer GPUs. For more information on GPUDirect Storage, see https://vastdata.com/scale-out-solutions/nvidia-gpus/
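To make the difference in data paths concrete, here is a toy model (not an I/O API) that lists where a read's bytes land with and without NVIDIA GPUDirect Storage. The class name `DataPath` and the stage labels are hypothetical; real applications would issue GPUDirect Storage reads through NVIDIA's cuFile library or a wrapper around it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPath:
    """Toy description of where a read's bytes land on the way to the GPU."""

    name: str
    stages: tuple[str, ...]  # ordered locations the data passes through

    def transfers(self) -> int:
        # Each hop between consecutive stages is one DMA or copy.
        return len(self.stages) - 1

    def uses_host_memory(self) -> bool:
        # True when the data is staged in a CPU-side bounce buffer.
        return "system memory" in self.stages


# Conventional read: the NIC writes into a host bounce buffer, then the data is copied to the GPU.
BOUNCE_BUFFER = DataPath("bounce buffer", ("network card", "system memory", "GPU memory"))
# GPUDirect Storage read: the NIC writes straight into GPU memory.
GPUDIRECT = DataPath("GPUDirect Storage", ("network card", "GPU memory"))

for path in (BOUNCE_BUFFER, GPUDIRECT):
    print(f"{path.name}: {path.transfers()} transfer(s), host memory used: {path.uses_host_memory()}")
```

The model only captures the structural point made above: removing the system-memory stage removes one copy and takes the CPU out of the data path. It says nothing about actual bandwidth numbers.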
- World-Leading Data Reduction: VAST provides best-in-class data reduction through our similarity-based data reduction algorithms. For structured or semi-structured data stored as comma-separated (CSV) or tab-separated (TSV) values, VAST delivers between 3:1 and 4:1 data reduction without any compromise on performance, so ingested data consumes less capacity than it would on other platforms. Even when data has been pre-compressed, as is the case with formats such as Parquet, we provide an additional 1.45:1 reduction. This may not seem like much, but it means roughly 31% less storage consumed (1 - 1/1.45 ≈ 0.31), or about 45% more data stored in the same capacity, which is significant in today’s cost-conscious environment.
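A reduction ratio of r:1 means data occupies 1/r of its logical size on flash, so the fraction of capacity saved is 1 - 1/r. The short sketch below applies that arithmetic to the ratios quoted above; the 100 TB logical dataset is a hypothetical example size, not a measured workload.

```python
def capacity_saved(ratio: float) -> float:
    """Fraction of physical capacity saved by a ratio:1 reduction (logical / physical)."""
    if ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return 1.0 - 1.0 / ratio


def physical_tb(logical_tb: float, ratio: float) -> float:
    """Flash capacity needed to hold logical_tb of data at the given reduction ratio."""
    return logical_tb / ratio


# Ratios quoted in the text; 100 TB of logical data is a hypothetical example.
for label, r in [("CSV/TSV, low end", 3.0), ("CSV/TSV, high end", 4.0), ("Parquet", 1.45)]:
    print(f"{label}: {r}:1 -> {capacity_saved(r):.0%} less capacity, "
          f"100 TB logical needs {physical_tb(100, r):.1f} TB")
```

Note the asymmetry: a 1.45:1 ratio lets the same flash hold 45% more data, but the capacity saved for a given dataset is about 31%, which is why the two percentages should not be used interchangeably.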
- A Modern Architecture for Modern Workloads: VAST takes a disaggregated approach to storage architecture. This unique and highly scalable approach enables customers to scale storage capacity independently of performance. They don’t need to maintain a fixed ratio of capacity to performance, and can adjust the system as their needs change over time.
The performance of a VAST system is not constrained by the storage media, which is 100% flash. Instead, performance is determined by the compute layer that addresses this storage and exposes it to client-facing applications. Because we’ve separated compute from storage, customers can scale performance linearly just by adding CPU. They can also allocate compute so that each application gets its own dedicated slice, as seen in the figure below.
Because the compute servers are stateless, customers can also change their minds. Today they may have 10 of these nodes in the batch pool, but on some days they might want some of them in the data science pool. A click in the UI or a single API call moves the nodes over, with no disruption to users or applications. Because this happens at the stateless server layer, no data has to move.
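The reason reassignment needs no data movement can be shown with a toy model: compute nodes hold only a pool label, while all data lives in one shared namespace that every node can read. Everything here (`ComputePools`, the node names, the pool names, the file path) is hypothetical and illustrative; it is not VAST's UI or API.

```python
class ComputePools:
    """Toy model of stateless compute nodes over one shared storage namespace."""

    def __init__(self) -> None:
        # Shared namespace: every node can read every path, so no node "owns" data.
        self.storage: dict[str, bytes] = {}
        # Node -> pool assignment; this mapping is the only state a reassignment touches.
        self.pool_of: dict[str, str] = {}

    def add_node(self, node: str, pool: str) -> None:
        self.pool_of[node] = pool

    def reassign(self, nodes: list[str], pool: str) -> None:
        # Validate first so a bad name leaves every assignment unchanged.
        unknown = [n for n in nodes if n not in self.pool_of]
        if unknown:
            raise KeyError(f"unknown nodes: {unknown}")
        for node in nodes:
            self.pool_of[node] = pool  # metadata update only; self.storage is untouched

    def members(self, pool: str) -> list[str]:
        return sorted(n for n, p in self.pool_of.items() if p == pool)

    def read(self, node: str, path: str) -> bytes:
        if node not in self.pool_of:
            raise KeyError(f"unknown node: {node}")
        return self.storage[path]  # same data regardless of which pool the node is in


pools = ComputePools()
pools.storage["/datasets/events.csv"] = b"ts,user,action\n"
for i in range(10):
    pools.add_node(f"cnode-{i}", "batch")
pools.reassign(["cnode-8", "cnode-9"], "data-science")
print(pools.members("data-science"))  # ['cnode-8', 'cnode-9']
```

In this model, moving two nodes from the batch pool to the data science pool changes two dictionary entries and nothing in `storage`, mirroring the claim that pool changes happen at the stateless server layer.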
Our clients are using more of their data to drive real-time decision making through analytics. Let me know if you’d like to discuss how to increase agility and speed while reducing the cost of accessing all your data and models.
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Gartner, 2022 Critical Capabilities for Distributed File Systems and Object Storage, Julia Palmer, Jerry Rozeman, Chandra Mukhyala, Jeff Vogel, October 26, 2022.