Our goal at VAST Data has always been to drive the cost of all-flash storage down to the point where customers can afford to free themselves from the obvious disadvantages of hard drives. Today, we’re taking the next step along this path by supporting the latest 30 TB NVMe SSDs, doubling the capacity, and thereby the density, of our VAST Enclosure (AKA DBOX) to 1350TB of raw flash capacity.
Not only do 30 TB SSDs exceed the 20 TB maximum capacity of a hard drive, but each SSD also fits in a 2.5” form factor, while 20 TB hard drives use the larger 3.5” form factor. Providing 1.3 PB of HDD capacity takes 68 hard drives and 4-5 rack units, whereas a 2U appliance can deliver the same capacity with only 44 flash drives.
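A quick density check using only the drive counts above. It uses the nominal 30 TB and 20 TB figures, so the SSD total (1,320 TB) lands slightly below the quoted 1350TB; the helper name is our own.

```python
def tb_per_rack_unit(drives: int, tb_per_drive: float, rack_units: int) -> float:
    """Raw capacity per rack unit (TB/U) for an enclosure."""
    return drives * tb_per_drive / rack_units


# 44 nominal 30 TB SSDs in a 2U enclosure
ssd_density = tb_per_rack_unit(44, 30, 2)      # 660 TB/U
# 68 x 20 TB hard drives spread over 4 or 5 rack units
hdd_density_4u = tb_per_rack_unit(68, 20, 4)   # 340 TB/U
hdd_density_5u = tb_per_rack_unit(68, 20, 5)   # 272 TB/U

print(f"SSD enclosure: {ssd_density:.0f} TB/U")
print(f"HDD, 4U:       {hdd_density_4u:.0f} TB/U")
print(f"HDD, 5U:       {hdd_density_5u:.0f} TB/U")
```

Even against the tighter 4U hard-drive configuration, the SSD enclosure holds roughly twice the raw capacity per rack unit.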
Hyperscale SSDs drive down the cost of running all-flash storage in several ways. Most obviously, bigger SSDs amortize the cost of the enclosure, from sheet metal and power supplies to the CPU canisters, PCIe switches, and NICs that connect the SSDs to the NVMe fabric. Less obviously, providing twice the capacity in the same 2 rack units, at the same power consumption, cuts data center costs for space, power, and cooling. At a time when system architects are juggling power budgets in their data centers to feed modern processors, this new level of efficiency is much-needed relief.
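A minimal sketch of the amortization argument in normalized units rather than real prices. The enclosure's fixed overhead (sheet metal, power supplies, CPU canisters, PCIe switches, NICs) is a hypothetical constant, and "doubling" is taken to mean the previous drives held half as much. Because power consumption is unchanged, the same ratio applies to watts per TB.

```python
def overhead_share_per_tb(enclosure_overhead: float, slots: int, tb_per_ssd: float) -> float:
    """Fixed enclosure overhead (cost or watts) spread over every TB it holds."""
    return enclosure_overhead / (slots * tb_per_ssd)


OVERHEAD = 1.0  # hypothetical, normalized overhead of the enclosure itself
SLOTS = 44      # SSD slots per enclosure, from the text

before = overhead_share_per_tb(OVERHEAD, SLOTS, 15)  # drives half as large
after = overhead_share_per_tb(OVERHEAD, SLOTS, 30)   # 30 TB drives

print(f"overhead per TB drops to {after / before:.0%} of its previous value")
```

The absolute overhead value cancels out: whatever the enclosure costs, doubling per-drive capacity halves its share of every terabyte.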
Big SSDs Need A New Architecture
Perhaps the most significant challenge with 30 TB SSDs is the “blast radius” or, more accurately, the sheer quantity of data lost when a drive fails. If a 30 TB SSD fails in a conventional RAID system, the rebuild would quickly bottleneck on the single “hot-spare” SSD that has to absorb all of the reconstructed data. That bottleneck would stretch the rebuild to days, or more likely weeks, during which your data is exposed to loss if additional drives fail.
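The bottleneck is simple arithmetic: every reconstructed byte has to be written to one hot-spare drive, so rebuild time is the drive's capacity divided by the rate that drive can absorb while foreground I/O keeps running. The rates below are hypothetical illustrations, not measurements; rebuilds are often throttled well below a drive's peak write speed to protect application performance.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rewrite a failed drive when one hot spare absorbs every write."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return capacity_mb / rebuild_mb_per_s / 3600


# Hypothetical effective rebuild rates, from unthrottled to heavily throttled
for rate in (1000, 100, 25):
    hours = rebuild_hours(30, rate)
    print(f"{rate:>5} MB/s -> {hours:6.1f} h ({hours / 24:4.1f} days)")
```

At 25 MB/s the rebuild of a single 30 TB drive runs close to two weeks.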
That drive rebuild time problem, and the exposure to data loss it creates, also apply to shared-nothing systems like Dell/EMC’s PowerScale, where rebuilding a failed drive requires the system to exchange 500 TB over the backend network, clogging that network and stretching rebuild times. Node failures are even worse, as the system has to exchange 12 PB between the surviving nodes. In addition, PowerScale’s default data layout puts two strips of each erasure-code stripe on each node while protecting against the loss of only two strips. That leaves many node rebuilds exposed to data loss: a node failure removes two strips from stripes that can only recover from two strip losses, so a single additional drive failure during the rebuild can make some stripes unrecoverable.
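Both problems can be sketched with a toy model. The 16+2 stripe width is our assumption, not a documented PowerScale setting, chosen because 16 x 30 TB = 480 TB is the same order as the 500 TB figure above; the class and method names are hypothetical, and any surviving strips that happen to be local to the rebuilding node are ignored.

```python
from dataclasses import dataclass


@dataclass
class StripeLayout:
    data_strips: int      # strips read to reconstruct one lost strip (hypothetical width)
    parity_strips: int    # strip losses the stripe can survive
    strips_per_node: int  # strips of each stripe placed on the same node

    def margin_after_node_failures(self, failed_nodes: int) -> int:
        """Further strip losses a stripe can absorb after losing whole nodes.
        Zero means no margin; negative means the stripe is already lost."""
        return self.parity_strips - failed_nodes * self.strips_per_node

    def rebuild_traffic_tb(self, lost_tb: float) -> float:
        """Backend traffic to rebuild lost_tb: each lost strip is recomputed
        by reading data_strips surviving strips from other nodes."""
        return lost_tb * self.data_strips


# Hypothetical 16+2 stripe with two strips per node, as in the default layout described above
layout = StripeLayout(data_strips=16, parity_strips=2, strips_per_node=2)

print(layout.margin_after_node_failures(1))  # no margin left during the node rebuild
print(layout.rebuild_traffic_tb(30))         # TB moved to rebuild one 30 TB drive
```

The first line prints 0 and the second prints 480: one node failure consumes the stripe's entire protection budget, and each lost terabyte costs roughly sixteen terabytes of network traffic to restore.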
VAST systems are designed to provide the exceptional resilience that big drives demand. Our locally-decodable erasure codes protect your data from as many as four simultaneous SSD failures, and our rebuild-in-place architecture restores that level of protection after the rebuild, even if failed SSDs aren’t replaced immediately.
VAST systems erasure-code data across all the SSDs in a cluster, not just a small “neighborhood,” and parallelize every rebuild across all the VAST Protocol Servers (CNodes) in the cluster to minimize rebuild times and the risk of data loss.
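A back-of-the-envelope sketch of why spreading the rebuild helps, assuming a hypothetical, constant per-CNode rebuild rate and ignoring network and SSD limits: rebuild time falls roughly in proportion to the number of CNodes sharing the work. This illustrates the scaling only; it is not a model of VAST's actual rebuild scheduler.

```python
def parallel_rebuild_hours(lost_tb: float, cnodes: int, mb_per_s_per_cnode: float) -> float:
    """Rebuild time if the work is split evenly across CNodes,
    each rebuilding at the same hypothetical rate."""
    lost_mb = lost_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return lost_mb / (cnodes * mb_per_s_per_cnode) / 3600


# Hypothetical per-CNode rebuild rate of 500 MB/s
for n in (1, 8, 32):
    print(f"{n:>2} CNodes: {parallel_rebuild_hours(30, n, 500):5.2f} h")
```

In practice the curve flattens once the network or the SSDs become the limit, but there is no single hot-spare drive capping the rate.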
Disaggregation Creates Opportunity For Smarter Power Management
Many data centers are constrained by power rather than space, with limited total power or limited power per rack. Storage engineers have traditionally had to design their data centers so that the total peak power draw of the systems in a rack never exceeded that rack's power limit and tripped the circuit breaker. Sure, some systems could spin down hard drives when they weren’t in use, but that made the data on those drives unavailable or added several seconds of latency while they spun back up.
With these new high-density NVMe enclosures, VAST is also adding a new feature we call Universal Power Control, which manages the amount of power a cluster, or a rack within a cluster, consumes while still serving data at sub-millisecond latency.
The method behind Universal Power Control is pretty simple, and it is only possible thanks to VAST’s Disaggregated, Shared-Everything Architecture. Just as hard drives draw more power when they’re moving their heads, SSDs draw much more power when accepting writes than when processing reads, because the write/erase cycle simply uses more power. With Universal Power Control, as the power drawn by the system reaches the power limit, the system reduces the number of active VAST Protocol Servers (aka CNodes), which reduces the rate at which the system writes data. Such an approach would be impossible with direct-attached, shared-nothing storage architectures that tightly couple CPUs with storage devices, because idling a node's CPU there would also take that node's drives, and their data, offline.
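The control idea can be sketched as a feedback loop with hysteresis, so CNodes aren't toggled on every small power fluctuation. This is a hypothetical illustration of the principle described above (fewer active CNodes, lower write rate, lower SSD power), not VAST's implementation; the class name, the 95%/85% thresholds, and the one-CNode-per-step policy are all our assumptions.

```python
from dataclasses import dataclass


@dataclass
class PowerCapController:
    """Toy power-capping loop: park CNodes near the limit and re-enable
    them once there is headroom. Names and thresholds are hypothetical."""
    limit_w: float
    min_cnodes: int
    max_cnodes: int
    high_water: float = 0.95  # start shedding CNodes at 95% of the limit
    low_water: float = 0.85   # bring CNodes back below 85% of the limit

    def next_active(self, active: int, measured_w: float) -> int:
        if measured_w >= self.high_water * self.limit_w and active > self.min_cnodes:
            return active - 1  # fewer CNodes -> lower write rate -> lower SSD power
        if measured_w <= self.low_water * self.limit_w and active < self.max_cnodes:
            return active + 1  # headroom available: restore write capacity
        return active  # inside the hysteresis band: hold steady


ctl = PowerCapController(limit_w=10_000, min_cnodes=2, max_cnodes=8)
active = 8
for watts in (9_000, 9_600, 9_700, 9_200, 8_000, 8_400):
    active = ctl.next_active(active, watts)
    print(f"{watts:>6} W -> {active} active CNodes")
```

The gap between the two thresholds keeps the loop from oscillating; a real system would also have to account for the lag between changing the CNode count and the resulting change in power draw.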
Universal Power Control is just another example of the advantages VAST gains from disaggregating the compute functions of a storage system from its capacity. Customers can now build very dense all-flash archives or data lakes and gain all the benefits of fast reads, which add little to power consumption, while managing their power footprint. Universal Power Control allows customers to fit more storage into their power-limited data centers while controlling their power and data center space costs.
Comparing To The Competition
Our new 1350TB enclosure brings the VAST Data Platform to unprecedented levels of density, efficiency, and cost. To prove our point, let’s compare a cluster of VAST 1350TB enclosures with clusters built from two competitors' systems: Pure Storage’s FlashBlade at one end of the spectrum and Dell/EMC’s PowerScale A300 archive nodes at the other.
A 4U FlashBlade chassis holds 15 blades, each with 52 TB of flash; the chassis delivers 535TB of usable capacity and consumes 1800W. FlashBlade supports compression, but it offers no deduplication or anything close to VAST’s Similarity-Based Data Reduction. Since most unstructured data in big data sets is already compressed, let’s assume for this discussion that FlashBlade sees no additional gain in effective capacity.
Each PowerScale A300 node holds 15 x 16TB hard drives, with four nodes in a 4U chassis that consumes 1070W. Assuming 20% overhead for data protection, virtual hot spares, and the like, each chassis's 960 TB of raw capacity delivers 768 TB of usable space. Dell supports both compression and deduplication, but reading deduplicated data creates the kind of highly random I/O that hard drives struggle with, and PowerScale has nothing like Similarity-Based Data Reduction, so our comparison assumes no data reduction for PowerScale.
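The chassis figures above can be turned into per-chassis metrics with a few lines of arithmetic. The sketch uses only the numbers quoted in the text (usable capacity, chassis power, 4U height) and reproduces the 768 TB PowerScale figure; the VAST enclosure's usable capacity and power draw aren't stated here, so it is left out rather than guessed. The `Chassis` class is our own illustration.

```python
from dataclasses import dataclass


@dataclass
class Chassis:
    name: str
    usable_tb: float
    watts: float
    rack_units: int

    def watts_per_pb(self) -> float:
        """Chassis power divided by usable capacity in PB."""
        return self.watts / (self.usable_tb / 1000)

    def tb_per_rack_unit(self) -> float:
        return self.usable_tb / self.rack_units


# PowerScale usable capacity: 4 nodes x 15 drives x 16 TB, minus 20% overhead
powerscale_usable = 4 * 15 * 16 * (1 - 0.20)

systems = [
    Chassis("FlashBlade", usable_tb=535, watts=1800, rack_units=4),
    Chassis("PowerScale A300", usable_tb=powerscale_usable, watts=1070, rack_units=4),
]

for s in systems:
    print(f"{s.name}: {s.usable_tb:.0f} TB usable, "
          f"{s.tb_per_rack_unit():.0f} TB/U, {s.watts_per_pb():.0f} W/PB")
```

By these figures FlashBlade draws roughly 3,360 W per usable PB and PowerScale roughly 1,390 W per usable PB.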
As you can see in the chart above, the net result of the 1350TB enclosure and VAST’s Similarity-Based Data Reduction is that VAST systems provide all-flash performance at a cost comparable to the much slower disk-based system, while consuming a small fraction of the space and power of either FlashBlade or PowerScale’s archive solution. With Universal Power Control, we can achieve a power density of 500W per PB. Now customers can put their biggest archives on VAST and extract value from them at all-flash speeds, without paying the flash tax of legacy flash approaches.
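To make the 500W per PB figure concrete, the sketch below asks how much capacity fits under a fixed power budget. The 10 kW budget is hypothetical, and the text doesn't say whether 500W per PB is measured against raw, usable, or effective (post-reduction) capacity, while the competitor densities are per usable PB from the chassis figures above, so treat the comparison as rough.

```python
def capacity_pb_for_budget(power_budget_w: float, watts_per_pb: float) -> float:
    """Petabytes that fit under a power budget at a given power density."""
    return power_budget_w / watts_per_pb


BUDGET_W = 10_000  # hypothetical power budget, e.g. for one rack

densities_w_per_pb = {
    "VAST with Universal Power Control": 500,  # figure quoted in the text
    "PowerScale A300": 1070 / 0.768,           # 1070W per 768 TB usable chassis
    "FlashBlade": 1800 / 0.535,                # 1800W per 535TB usable chassis
}

for name, w_per_pb in densities_w_per_pb.items():
    print(f"{name}: {capacity_pb_for_budget(BUDGET_W, w_per_pb):.1f} PB")
```

Under the same budget, the 500W per PB density fits about 20 PB, against roughly 7 PB for PowerScale and 3 PB for FlashBlade.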