Enterprise Storage Diamond Architecture for Mission Critical Systems

Over the past three years, PowerM has received many inquiries about an “Active-Active with a DR site” solution. Typical requirements include 24 x 7 serviceability, multiple tiers of data backup, multiple data centers, better data center utilization, and enterprise data center switch-over.

PowerM solution: Enterprise Storage Diamond Architecture for Mission Critical Systems, based on IBM PowerHA SystemMirror, HyperSwap for AIX, and DS8000 Multiple Target Peer-to-Peer Remote Copy

1. Evolving client requirements:

A Paradigm Shift 1:

As enterprises have become more dependent on IT, their continuous availability, disaster recovery (DR), and business continuity requirements have become more demanding. Regulations from the Moroccan Government and Bank Al-Maghrib are more stringent as well. To meet these requirements, customers need an Active/Active (A/A) Sites solution: two or more sites, separated by extended distances (less than 100 km in most cases), that run the same applications and use the same data to provide cross-site workload balancing and continuous availability. It is a fundamental paradigm shift from a failover model to a continuous availability model.

A Paradigm Shift 2:

Many companies require their applications to be continuously available and cannot tolerate any service interruption. Many also consider the loss of their disaster recovery backup to be a severe business impact. If the local production site fails, swapping to a DR site allows applications to continue running. However, without a further DR site to act as a backup for disaster protection, many business applications are then left unprotected.

2. Active-Active consideration factors for storage

Active-Active has no single agreed definition: different clients and vendors (IBM, EMC, HPE, …) understand and explain an Active-Active solution differently.

To build an Active-Active solution, each workload tier needs to be considered. At the web server and Java / application server tiers, an Active-Active design can be achieved with a load-balancing mechanism that dispatches workload across multiple sites. The key to an Active-Active design therefore comes down to the database server tier (addressed by features such as Oracle Real Application Clusters and DB2 pureScale) and, in some cases, the shared filesystem.
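
As a minimal sketch of the load-balancing idea for the web and application tiers, the following Python dispatcher spreads requests round-robin across the sites that are currently healthy. The `SiteBalancer` class, the site names, and the health flags are hypothetical and only illustrate the principle; a real deployment would use a dedicated load balancer rather than application code.

```python
from itertools import cycle


class SiteBalancer:
    """Round-robin dispatcher over sites; all names here are hypothetical."""

    def __init__(self, sites):
        if not sites:
            raise ValueError("at least one site is required")
        self.sites = list(sites)
        self.healthy = {site: True for site in self.sites}
        self._ring = cycle(self.sites)

    def mark(self, site, healthy):
        """Record a health-check result for a site."""
        self.healthy[site] = healthy

    def dispatch(self):
        """Return the next healthy site, skipping any that are marked down."""
        for _ in range(len(self.sites)):
            site = next(self._ring)
            if self.healthy[site]:
                return site
        raise RuntimeError("no healthy site available")


balancer = SiteBalancer(["site_a", "site_b"])
first_four = [balancer.dispatch() for _ in range(4)]    # alternates between both sites
balancer.mark("site_a", False)                           # simulate a site outage
after_outage = [balancer.dispatch() for _ in range(3)]   # only site_b is returned
```

While both sites are healthy, requests alternate between them; after one site is marked down, every request goes to the surviving site without any change on the caller's side.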

Any storage Active-Active solution needs to deal with the following technical challenges:

  • Latency between sites: approx. 1 ms round trip per 100 km (comparable to disk latency)
  • Quorum / tie-breaker requirement: to handle site isolation without compromising data integrity
  • Workload consideration: the larger the share of write activity (INSERT, UPDATE, DELETE), the more messages must be sent between sites and the more disk writes are needed
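
The latency and workload points can be combined in a rough back-of-the-envelope estimate. The sketch below uses the 1 ms per 100 km round-trip figure from the list above; the local read and write latencies, and the assumption that each mirrored write costs exactly one inter-site round trip, are illustrative assumptions rather than measured values.

```python
RTT_MS_PER_100_KM = 1.0  # round-trip latency per 100 km, as quoted in the list above


def round_trip_ms(distance_km):
    """Round-trip propagation delay between two sites, in milliseconds."""
    return distance_km / 100.0 * RTT_MS_PER_100_KM


def sync_write_ms(local_write_ms, distance_km, round_trips=1):
    """Estimated latency of a synchronously mirrored write.

    Assumption: the write completes only after `round_trips` inter-site
    round trips on top of the local write time.
    """
    return local_write_ms + round_trips * round_trip_ms(distance_km)


def mean_io_ms(read_ms, local_write_ms, distance_km, write_fraction):
    """Average I/O latency for a workload with the given write fraction (0.0 to 1.0)."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0.0 and 1.0")
    write_ms = sync_write_ms(local_write_ms, distance_km)
    return (1.0 - write_fraction) * read_ms + write_fraction * write_ms


# Assumed values: 0.5 ms local reads and writes, sites 50 km apart.
example_rtt = round_trip_ms(50)
light_writes = mean_io_ms(0.5, 0.5, 50, write_fraction=0.2)
heavy_writes = mean_io_ms(0.5, 0.5, 50, write_fraction=0.8)
print(f"RTT {example_rtt:.2f} ms, 20% writes {light_writes:.2f} ms, 80% writes {heavy_writes:.2f} ms")
```

With these assumed numbers, the average I/O latency rises from about 0.6 ms at 20% writes to about 0.9 ms at 80% writes, which is why write-heavy workloads are more sensitive to the distance between sites.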

3. PowerHA HyperSwap solution: reference architecture

The IBM HyperSwap function is a high availability feature that provides dual-site, active-active access to a volume. HyperSwap functions are available on systems that can support more than one I/O group. HyperSwap volumes have a copy at one site and a copy at another site. Data that is written to the volume is automatically sent to both copies; if one site is no longer available, the other site can provide access to the volume.
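
The behaviour described above can be pictured with a small toy model. The `HyperSwapVolume` class below is a hypothetical illustration, not an IBM API: it keeps one copy per site, sends each write to every available copy, and serves reads from whichever site is still up. The resynchronization step in `restore_site` is a simplifying assumption.

```python
class HyperSwapVolume:
    """Toy model of a volume with one copy per site (not an IBM API)."""

    def __init__(self, sites=("site_a", "site_b")):
        self.copies = {site: {} for site in sites}       # site -> {block: data}
        self.available = {site: True for site in sites}

    def _live_sites(self):
        return [site for site in self.copies if self.available[site]]

    def write(self, block, data):
        """Send the write to every available copy; fail if no site is up."""
        live = self._live_sites()
        if not live:
            raise IOError("volume offline: no site available")
        for site in live:
            self.copies[site][block] = data

    def read(self, block):
        """Serve the read from any available copy."""
        live = self._live_sites()
        if not live:
            raise IOError("volume offline: no site available")
        return self.copies[live[0]][block]

    def fail_site(self, site):
        self.available[site] = False

    def restore_site(self, site):
        """Assumption: resynchronize the returning copy from a survivor first."""
        live = self._live_sites()
        if live:
            self.copies[site] = dict(self.copies[live[0]])
        self.available[site] = True


volume = HyperSwapVolume()
volume.write(0, b"v1")                   # lands on both copies
volume.fail_site("site_a")               # one site becomes unavailable
volume.write(0, b"v2")                   # still accepted by the surviving site
value_during_outage = volume.read(0)     # served from site_b
volume.restore_site("site_a")            # copies match again after resync
```

The application keeps reading and writing the same logical volume throughout; only the set of copies behind it changes.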

Benefits of HyperSwap:

  • Unplanned outages: no service interruption on compute node failure (requires Active-Active middleware) or on storage failure
  • Planned HyperSwap: storage maintenance and storage migration without service interruption
  • Enterprise Data Center Takeover Solution to Secondary Site
  • Transparent to User Application
  • Multisite PowerHA cluster with continuous storage availability
  • Non-disruptive storage swap for application continuity in the event of one storage outage
  • Storage maintenance without application downtime
  • RPO = 0; RTO = seconds
  • Integration with PowerHA for automatic failover with maximum resiliency
  • Derived from IBM System z HyperSwap Solution

HyperSwap is introduced as a facility of PowerHA SystemMirror for AIX Enterprise Edition in combination with select storage subsystems. This facility supports stretched cluster and linked cluster configurations.

3. Multiple-Target Peer-to-Peer Remote Copy

IBM Multiple Target Peer-to-Peer Remote Copy (Multiple Target PPRC) enhances a multisite disaster recovery environment by providing the capability to have two PPRC relationships on a single primary volume. This adds data protection because there is an additional remote site.

In other words, with Multiple Target PPRC, the same primary volume can now have more than one target, which enables data to be mirrored from a single primary site to two different target sites.

Figure 1: Multiple Target PPRC

Multiple Target PPRC provides the following enhancements:

  • Mirrors data from a single local primary site to two remote secondary sites
  • Increases capability and flexibility for disaster recovery solutions by using synchronous replication, asynchronous replication, or a combination of both
  • Improves a cascaded Metro/Global Mirror (MGM) configuration and simplifies recovery procedures
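
To make the topology concrete, the sketch below models one primary volume with up to two replication targets, each either synchronous or asynchronous. The class and field names (`PrimaryVolume`, `Relationship`, `Mode`) and the site and volume names are hypothetical; only the limit of two relationships per primary volume is taken from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum

MAX_TARGETS = 2  # the text describes two PPRC relationships per primary volume


class Mode(Enum):
    SYNC = "synchronous"
    ASYNC = "asynchronous"


@dataclass(frozen=True)
class Relationship:
    target_site: str
    mode: Mode


@dataclass
class PrimaryVolume:
    name: str
    site: str
    relationships: list = field(default_factory=list)

    def add_target(self, target_site, mode):
        """Attach a replication target, enforcing the two-target limit."""
        if target_site == self.site:
            raise ValueError("target must be at a different site")
        if any(r.target_site == target_site for r in self.relationships):
            raise ValueError(f"{target_site} is already a target")
        if len(self.relationships) >= MAX_TARGETS:
            raise ValueError("a primary volume has at most two targets in this model")
        self.relationships.append(Relationship(target_site, mode))

    def describe(self):
        """Summarize the replication modes in use."""
        modes = sorted(r.mode.value for r in self.relationships)
        return " + ".join(modes) if modes else "unreplicated"


primary = PrimaryVolume("vol_0001", "site_1")
primary.add_target("site_2", Mode.SYNC)    # nearby site, synchronous copy
primary.add_target("site_3", Mode.ASYNC)   # distant site, asynchronous copy
summary = primary.describe()
```

The same structure covers the other combinations mentioned above: two synchronous targets or two asynchronous targets are created by changing the `mode` arguments.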

4. IBM High-End Enterprise Storage: DS8880

IBM DS8880 all-flash machine types offer the exceptional performance and economic benefits of flash storage while maintaining enterprise-class reliability.

IBM DS8880 offers three different all-flash machine types to support all the advanced functions and security capabilities that are characteristic of IBM enterprise storage:

  • Business class
  • Enterprise class
  • Analytic class

Figure 2: IBM FlashSystem Portfolio

The IBM DS8880 family offers three all-flash array models, DS8884F, DS8886F, and DS8888F, to meet the demand for higher speed storage. All-flash arrays deliver higher IOPS and bandwidth with lower power consumption to reduce the total cost of ownership as compared to hybrid or HDD-based solutions.

All models are based on the second generation of the “High-Performance Flash Enclosure” (HPFE Gen-2).

HPFE Gen-2

HPFE is a dedicated flash architecture built entirely on Custom Flash Hardware (CFH), designed specifically for the high IOPS that flash generates. HPFE Gen-2 comes as a pair of enclosures holding up to 48 flash cards, with special hardware in the rear known as “Microbays”.

Each Microbay contains, among other components:

  • a Flash RAID adapter, which is dual-core PowerPC-based and can therefore process RAID parity and sustain I/O rates and volumes far beyond what a usual Device Adapter could handle, and
  • a PCIe switch card. The switch card carries the signal from the Flash RAID adapters over PCIe Gen-3 directly into both processor complexes of a DS8880F.
    The Flash RAID adapters connect through 8 SAS connections per Microbay pair to a pair of flash enclosures, which hold SAS expanders and the flash cards.

Figure 3: Flash Architecture: DS8880 Internal Topology with HPFE

5. Putting All the Pieces Together:

With AIX and DS8000, AIX HyperSwap delivers both high availability and disaster recovery in a three-site MGM solution. PowerHA manages HyperSwap. Copy Services Manager (CSM) manages the initial multi-target MGM setup, failback after a HyperSwap, and Site Swap to the disaster recovery site. Together, PowerHA, DS8000, and CSM deliver a seamless, end-to-end solution for critical applications that must be available all the time on a fault-tolerant system.

Figure 4: IBM Storage Implementation for AIX HyperSwap, Brocade SAN and DS8880 Storage

To provide three-site support for AIX PowerHA HyperSwap customers, Copy Services Manager, AIX PowerHA, and DS8000 work together so that customers can set up PowerHA HyperSwap on the Storage1-Storage2 pairs managed by Copy Services Manager. Using a Copy Services Manager Multi-target MM-GM session, a customer can set up the replication with Copy Services Manager’s ease of use and then provide high availability for those volumes by using PowerHA HyperSwap. When a HyperSwap is triggered on the H1-H2 pairs in the session, Copy Services Manager recognizes the event and moves the relationship to Target Available, which allows the customer to issue the Start Storage2->Storage1 Storage2->Storage3 command to restart replication across all three sites. Combining Copy Services Manager’s ease of use in managing a three-site environment with the high availability capabilities of PowerHA HyperSwap gives the customer a strong solution for managing replication.
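
The sequence above can be read as a small state machine. The sketch below is a toy model, not the Copy Services Manager API: the `MultiTargetSession` class, its method names, and the "Replicating" state label are hypothetical, while "Target Available" and the Storage1, Storage2, and Storage3 roles follow the description. Treating replication as fully stopped while the session is in Target Available is a simplification.

```python
class MultiTargetSession:
    """Toy model of the three-site flow described above (not the CSM API)."""

    def __init__(self):
        self.active = "Storage1"
        self.state = "Replicating"   # hypothetical label for normal operation
        self.links = {("Storage1", "Storage2"), ("Storage1", "Storage3")}

    def on_hyperswap(self):
        """PowerHA swaps application I/O from Storage1 to Storage2."""
        if self.state != "Replicating" or self.active != "Storage1":
            raise RuntimeError("HyperSwap is only modeled from Storage1 while replicating")
        self.active = "Storage2"
        self.links = set()           # simplification: no replication until restarted
        self.state = "Target Available"

    def start_from_storage2(self):
        """Models the Start Storage2->Storage1 Storage2->Storage3 command."""
        if self.state != "Target Available":
            raise RuntimeError("Start is only modeled from the Target Available state")
        self.links = {("Storage2", "Storage1"), ("Storage2", "Storage3")}
        self.state = "Replicating"


session = MultiTargetSession()
session.on_hyperswap()                 # storage outage or planned swap at Storage1
state_after_swap = session.state       # "Target Available"
session.start_from_storage2()          # operator restarts replication from Storage2
links_after_restart = sorted(session.links)
```

After the restart, Storage2 is the source for both remaining relationships, so all three sites are protected again with the roles of Storage1 and Storage2 reversed.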

6. Detailed reference architecture:

Need more information about this reference architecture? Send us an email: refarchitect@powerm.ma

7. References:

  • IBM DS8870 Multiple Target Peer-to-Peer Remote Copy, IBM Redpaper REDP-5151-00
  • Introducing and Implementing IBM FlashSystem V9000, IBM Redbook SG24-8273-02
  • System Storage DS8884 all-flash array storage system, Product ID: 5331-984
  • Peter Kimmel, New High-End Flash Storage System - DS8880F, IBM blog, Jan 17 2017