I thought it might be worth taking a quick look at async (snapshot) replication/mirroring, which was released for XIV earlier this year with 10.2.0.a of the firmware. XIV async is similar in concept to NetApp’s async SnapMirror: both are snapshot-based and both consume snapshot space as part of the mirroring process. One difference, of course, is that with XIV both async and sync replication are included in the base price of the XIV; there is no added licence fee or maintenance fee to pay. I’d call it ‘free’ but I’d just get another bunch of people on Twitter telling me they still haven’t received their free XIVs yet…
Features of XIV async mirroring include:
- 2-way asynchronous snapshot mirroring between two volumes, or between two consistency groups
- Both ad-hoc and scheduled updates (between 20 seconds and 24 hours) set per volume/CG
- One to many replication (up to 16 targets)
- Easy source/target role-reversal in the event of a disaster
- Concurrent synchronous and asynchronous mirroring processes with different targets
- Automatic de-staging of deltas to disk if cache is full or the comms link goes down
- Supports both FC and iSCSI links
The concept of RPO (recovery point objective) is key to XIV async replication:
The XIV will set an update interval that is a third of the RPO you specify, rounded down to the nearest step.
This reflects the difference between a desired business outcome and a technical event, with the main fudge factor being the network. RPO is a business definition of how much data you can afford to lose in the event of a disaster. The update interval is a technical measure of how often the sync jobs will be initiated. These two things are different. The update interval is also drawn from a discrete set of values: 20s, 30s, 1m, 2m, 5m, 10m, 15m, 30m, 1h, 2h, 3h, 6h, 8h, 12h, 24h.
- Choose an RPO of 1 minute and the update interval will be set to 20s.
- Choose an RPO of 1 hour and the update interval will be set to 15 minutes.
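If you want to play with the numbers yourself, the rule above (a third of the RPO, rounded down to the nearest step) can be sketched in a few lines of Python. This is only my illustration of the rule as described here, not anything the XIV exposes; the function name `update_interval` is made up, and what happens for an RPO below 1 minute is my own assumption.

```python
# The discrete update intervals listed above, in seconds.
STEPS = [20, 30, 60, 120, 300, 600, 900, 1800,
         3600, 7200, 10800, 21600, 28800, 43200, 86400]


def update_interval(rpo_seconds):
    """Return the largest step that is no more than a third of the RPO."""
    target = rpo_seconds / 3
    candidates = [step for step in STEPS if step <= target]
    if not candidates:
        # Assumption: for very small RPOs, fall back to the smallest step.
        return STEPS[0]
    return max(candidates)


print(update_interval(60))    # RPO of 1 minute
print(update_interval(3600))  # RPO of 1 hour
```

The two calls correspond to the two examples above: 20 seconds for a 1 minute RPO, and 900 seconds (15 minutes) for a 1 hour RPO.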
Another rule of thumb is that you generally need to allow at least 1 minute of RPO per 10 TB of volume size. Because XIV async mirroring is RPO policy driven, it’s smart to set the RPOs based on the importance of any given volume. That way, if the network is constrained, the XIV can make intelligent decisions about how to allocate resources.
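As a quick sanity check on that rule of thumb, here is a tiny sketch. The helper name `min_rpo_minutes` is hypothetical, and rounding partial 10 TB chunks up to a whole minute is my own (conservative) assumption; the rule of thumb itself doesn't say how to treat them.

```python
import math


def min_rpo_minutes(volume_tb):
    """Rule of thumb: allow at least 1 minute of RPO per 10 TB of volume.

    Assumption: partial 10 TB chunks are rounded up, with a floor of 1 minute.
    """
    return max(1, math.ceil(volume_tb / 10))


for size_tb in (5, 10, 25):
    print(size_tb, "TB ->", min_rpo_minutes(size_tb), "minute(s) minimum RPO")
```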
Don’t forget your network infrastructure:
You also need to have a reasonable amount of bandwidth. I have designed solutions around a whole variety of replication technologies in my time, and generally the biggest mistake people make is skimping on bandwidth. In most cases you will need to be thinking about 100 Mbps as a starting point. If that sounds scary then large-scale remote site replication is probably not for you.
Oh, and you will need decent QoS. Async replication is a key part of your DR strategy; don’t try to force it to live as a bottom feeder, dining only on the network scraps left over from other important apps like Twitter and Facebook. There is a sizing tool, which your friendly IBM guy or gal can use to apply some actual numbers to your situation. Also, you might want to check out iperf (http://sourceforge.net/projects/iperf/) in preparation, so you can really know what your network is doing.
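To get a rough feel for why bandwidth matters, here is some back-of-the-envelope arithmetic: how long does it take to push a given amount of changed data down a link? This is not the IBM sizing tool, just a sketch. The function name and the `efficiency` parameter are my own inventions, it uses decimal units (1 GB = 10^9 bytes), and it ignores latency, protocol overhead and anything else sharing the link unless you fold those into `efficiency`.

```python
def transfer_seconds(delta_gb, link_mbps, efficiency=1.0):
    """Seconds needed to send delta_gb of changed data over a link_mbps link.

    efficiency is a hypothetical fudge factor (0 to 1) for the share of the
    nominal link speed you actually get; measure it rather than guess it.
    """
    bits_to_send = delta_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_send / usable_bits_per_second


# 1 GB of deltas over a 100 Mbps link at full nominal speed
print(transfer_seconds(1, 100))
```

At the full nominal 100 Mbps that 1 GB takes 80 seconds, so if your volumes are changing faster than that between updates, the link rather than the XIV is what will decide the RPO you can actually achieve.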
So what’s the distance limit? Well, 14,000 miles has been tested in the labs, which is about 25% further than from my house in New Zealand to Inverness in Scotland (so I’m all set for the day when Scottish independence is declared and the capital is moved north).
VMware Site Recovery Manager:
And by the way, if you are planning on using SRM, there is an installation guide on the VMware website: “vCenter Site Recovery Manager with XIV – Installation Guide”.
Happy DR planning : )