Snap to S3 – Bringing Backup to the Cloud Era

Authored by

Howard Marks

As we started building multi-petabyte systems, it became clear that our new approach to storage was going to require a new approach to backup. With VAST 3.0 we’re addressing that requirement with a Snap to S3 feature that replicates VAST snapshot data to an S3-compatible object store: in the cloud, on premises, or in an S3 bucket on a VAST Data Platform system in another data center.

Before we get into the details of how Snap to S3 works, let’s look at the more conventional solutions, and how they break down in the era of multi-petabyte namespaces.

NDMP – Too Old School for the PB Era

The venerable NDMP protocol was released 25 years ago so NASes could back up directly to Fibre Channel attached tape drives.

Like any system that backs up to tape, NDMP is based on the model of periodic full backups, with multiple incremental backups in between. This works for tape backups because it limits the number of incremental backups that admins have to process in turn during a restore.

As datasets, and the namespaces that hold them, have grown from a few GB to a few PB, those periodic full backups have grown more and more onerous. Sending 1 PB of data offsite over a 100Gbps connection takes nearly a whole day (about 22 hours) and consumes the full 100Gbps the entire time. Users don’t want to pay for a 100Gbps Internet connection they only use one day a month.
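The arithmetic behind that estimate is easy to check. The sketch below assumes a decimal petabyte (10^15 bytes) and a link that runs at full line rate with no protocol overhead, so real transfers would take somewhat longer. The function and constant names are ours, for illustration only.

```python
def transfer_hours(data_bytes: float, link_bits_per_second: float) -> float:
    """Hours needed to move data_bytes over a fully used link, ignoring overhead."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / 3600


PETABYTE = 10**15        # decimal petabyte, in bytes (assumption)
LINK_100G = 100 * 10**9  # 100 Gbps, in bits per second

hours = transfer_hours(PETABYTE, LINK_100G)
print(f"1 PB over 100 Gbps: {hours:.1f} hours")  # prints "1 PB over 100 Gbps: 22.2 hours"
```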

While NDMP is a standard NAS backup protocol, it’s just too stuck in 20th-century backup practices to be useful for multi-petabyte namespaces.

Incremental Forever Can Walk the File System Forever

As we all shifted from backing up our files directly to tape to backing up to disk-based storage systems, backup software got smarter with incremental forever backups. These automate restoring to a point in time and pruning stale data from the backup set, without the overhead of a periodic full backup.

Incremental forever systems eliminate the periodic full backups and the huge bandwidth spikes they create. But determining which files need to be backed up during an incremental backup cycle typically requires the backup software to read some metadata attribute of every file, usually the last-modified time or an archive bit, to see if it’s been changed since the last cycle.

The problem is that scanning the millions, or billions, of files and objects in a namespace both takes a long time and places a real load on the storage system processing the metadata requests. It would be nice if a backup server could query a namespace’s metadata for a list of files changed since some point in time, but today backup software has to walk most file systems and namespaces.
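To make the cost concrete, here is a minimal sketch of the kind of walk an incremental forever backup performs. It is not taken from any particular backup product; the function name `changed_since` is our own. Note that it issues one metadata request per file, whether or not the file changed, so the work grows with the size of the namespace rather than with the amount of changed data.

```python
import os


def changed_since(root: str, cutoff: float) -> list[str]:
    """Return paths under root whose last-modified time is newer than cutoff.

    cutoff is in seconds since the epoch, for example the start time of
    the previous backup cycle.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # One stat call per file: this is the load the storage
                # system has to absorb on every backup cycle.
                if os.stat(path).st_mtime > cutoff:
                    changed.append(path)
            except FileNotFoundError:
                # The file was removed between the listing and the stat.
                continue
    return changed
```

With a billion files, that inner loop runs a billion times even if only a handful of files changed.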

Replication to the Rescue – At the Cost of Like to Like

Once a file system has grown too big, or contains too many files, for backup software to handle, most users turn to point-in-time asynchronous replication: taking snapshots on their primary NAS and replicating them to a secondary NAS in a second data center. Since the NAS tracks which data changes at a sub-file level, snapshot-based replication requires less bandwidth than incremental backups of whole files.

The problem with asynchronous replication is that, since snapshot formats are proprietary, NAS systems can only replicate to another NAS from the same family. This works for customers who want to rapidly restore their applications in a second data center, but it forces customers who are more interested in protecting their data into buying similar storage for their backups.

We at VAST thought that we would implement this type of VAST-to-VAST replication next on our roadmap, but when we spoke to our early customers, they were more interested in a solution to the backup problem. We revised the roadmap, and Snap to S3 is our first replication solution.

Snap to S3

As its name suggests, the Snap to S3 feature takes a periodic snapshot of a folder and its contents, packages the blocks that changed in the protected data into large compressed objects, and then writes those objects to the S3-compatible object store, creating an independent backup of the data.
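The general idea of packing changed blocks into large, compressed, self-describing objects can be sketched in a few lines. To be clear, this is an illustration and not VAST's on-disk or object format, which this post doesn't describe. The 4 MiB target size, the JSON index, and the names `pack_changed_blocks`, `_seal`, and `unpack` are all assumptions made for the example.

```python
import json
import zlib


def _seal(batch: dict[int, bytes]) -> bytes:
    """Compress one batch of blocks behind a small index of (offset, length)."""
    index = [[offset, len(data)] for offset, data in batch.items()]
    header = json.dumps(index).encode()
    payload = b"".join(batch.values())
    # 4-byte header length, then the index, then the raw block data.
    return zlib.compress(len(header).to_bytes(4, "big") + header + payload)


def pack_changed_blocks(changed_blocks: dict[int, bytes],
                        target_size: int = 4 * 1024 * 1024) -> list[bytes]:
    """Group changed blocks (keyed by offset) into objects of about target_size raw bytes."""
    objects, batch, batch_bytes = [], {}, 0
    for offset in sorted(changed_blocks):
        batch[offset] = changed_blocks[offset]
        batch_bytes += len(changed_blocks[offset])
        if batch_bytes >= target_size:
            objects.append(_seal(batch))
            batch, batch_bytes = {}, 0
    if batch:
        objects.append(_seal(batch))
    return objects


def unpack(obj: bytes) -> dict[int, bytes]:
    """Recover the offset-to-block mapping from one object, using only the object itself."""
    body = zlib.decompress(obj)
    header_len = int.from_bytes(body[:4], "big")
    index = json.loads(body[4:4 + header_len])
    blocks, pos = {}, 4 + header_len
    for offset, length in index:
        blocks[offset] = body[pos:pos + length]
        pos += length
    return blocks
```

Each sealed object would then be written to the bucket as a single S3 PUT. Because every object carries its own index, a reader needs nothing but the objects to reassemble the data.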

Users on the originating VAST system can read data from the remote snapshots through a ./remote system folder the same way they access local snapshot data through the ./snapshot folder. Remote snapshots run on their own schedule and retention policy, independent of local snapshots.

A customer could take hourly local snapshots that they retain for a week, and daily remote snapshots that they retain for a year. Users can perform self-service restores of data deleted less than a week ago from the local snapshots, and of older data from the remote snapshots.
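As a rough sketch of how that example policy plays out, the snippet below picks where a restore would come from based on how long ago the data was deleted. The retention values are the ones from the example above; the function name and the "local", "remote", and "expired" labels are ours. It is a simplification: it ignores the gap between the deletion and the last snapshot taken before it.

```python
from datetime import datetime, timedelta

LOCAL_RETENTION = timedelta(weeks=1)    # hourly local snapshots, kept for a week
REMOTE_RETENTION = timedelta(days=365)  # daily remote snapshots, kept for a year


def restore_source(deleted_at: datetime, now: datetime) -> str:
    """Return which snapshot tier can still serve a restore of the deleted data."""
    age = now - deleted_at
    if age < LOCAL_RETENTION:
        return "local"    # restore from the local snapshot folder
    if age < REMOTE_RETENTION:
        return "remote"   # restore from the remote snapshot folder
    return "expired"      # older than every retained snapshot


now = datetime(2020, 6, 1)
print(restore_source(now - timedelta(days=2), now))    # prints "local"
print(restore_source(now - timedelta(days=90), now))   # prints "remote"
```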

The objects a VAST system writes to the object store are entirely self-describing, including all the element store metadata. This allows any VAST Data Platform system to attach to the S3 bucket holding remote snapshots from another VAST system and, after providing the requisite secret handshakes and keys, mount the snapshots as a ./remote folder.

The VAST Reader

The other advantage of making the snapshot S3 objects self-describing is that it allowed us to build the VAST Reader, a VM that can mount the snapshot objects when, or where (as in the public cloud), a VAST system isn’t available. Like a VAST system, the Reader VM mounts the snapshot S3 objects and presents them as a read-only NFS file system.

The Reader VM is initially packaged as an OVF for on-premises disaster recovery and as an Amazon Web Services AMI, so users can extract data from cloud backups and process it in the cloud, avoiding egress charges.

The Cloud is Our Backup

Snap to S3 solves the multi-petabyte backup problem better than any conventional solution. Since it’s integrated with VAST’s snapshot technology, it eliminates the time, and load, of walking the file system for changed files and the