Level set

  • Requirements

  • Concepts

  • Configurations

  • Sample Scenarios

  • Use Cases

  • Summary





Identify clearing and settlement activities in support of critical financial markets

  • Determine appropriate recovery and resumption objectives for clearing and settlement activities in support of critical markets

    • Core clearing and settlement organizations should develop the capacity to recover and resume clearing and settlement activities within the business day on which the disruption occurs, with the overall goal of achieving recovery and resumption within two hours after an event
  • Maintain sufficient geographically dispersed resources to meet recovery and resumption objectives

    • Back-up arrangements should be as far away from the primary site as necessary to avoid being subject to the same set of risks as the primary location
    • The effectiveness of back-up arrangements in recovering from a wide-scale disruption should be confirmed through testing
  • Routinely use or test recovery and resumption arrangements

    • One of the lessons learned from September 11 is that testing of business recovery arrangements should be expanded


Ensuring Business Continuity:

  • Disaster Recovery

    • Restore business after an unplanned outage
  • High Availability

    • Meet service availability objectives, e.g., 99.9% availability, which allows roughly 8.8 hours of downtime a year (see the conversion sketch below)
  • Continuous Availability

    • No downtime (planned or unplanned)
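
As a quick check on these objectives, here is a minimal Python sketch (illustrative only) that converts an availability percentage into the downtime budget it allows per year:

```python
# Convert a service-availability objective into allowable downtime.

HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year

def downtime_hours_per_year(availability_pct: float) -> float:
    """Allowable downtime, in hours per year, for a given availability %."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct:>7}% -> {downtime_hours_per_year(pct):8.2f} hours/year")

# 99.9% allows 8.76 hours/year, i.e., the "8.8 hours" figure quoted above.
```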




Shift focus from failover model to near-continuous availability model (RTO near zero)

  • Access data from any site (unlimited distance between sites)

  • Multi-sysplex, multi-platform solution

    • “Recover my business rather than my platform technology”
  • Ensure successful recovery via automated processes (similar to GDPS technology today)

    • Can be handled by less-skilled operators
  • Provide workload distribution between sites (route around failed sites, dynamically select sites based on ability of site to handle additional workload)

  • Provide application level granularity

    • Some workloads may require immediate access from every site, other workloads may only need to update other sites every 24 hours (less critical data)
    • Current solutions employ an all-or-nothing approach (complete disk mirroring, requiring extra network capacity)








Configurations

    • Active/Standby – GA date 30th June 2011
    • Active/Query – GA date 31st October 2013
  • A configuration is specified on a workload basis

  • A workload is the aggregation of these components

    • Software: user-written applications (e.g., COBOL programs) and the middleware runtime environment (e.g., CICS regions, InfoSphere Replication Server instances and DB2 subsystems)
    • Data: a related set of objects that must preserve transactional consistency and, optionally, referential integrity constraints (such as DB2 tables, IMS databases and VSAM files)
    • Network connectivity: one or more TCP/IP addresses and ports (e.g., 10.10.10.1:80)
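
A minimal sketch of this aggregation as a data model, in hypothetical Python (the class and field names are illustrative, not a GDPS API):

```python
# Hypothetical model of a workload as the aggregation of software,
# data, and network connectivity.
from dataclasses import dataclass, field

@dataclass
class Workload:
    name: str
    software: list[str] = field(default_factory=list)   # apps + middleware
    data: list[str] = field(default_factory=list)       # replicated objects
    endpoints: list[str] = field(default_factory=list)  # TCP/IP ip:port pairs

# All names below are made up for illustration.
atm = Workload(
    name="SELF_SERVICE_ATM",
    software=["CICS region CICSA", "DB2 subsystem DB2A"],
    data=["DB2 table BANK.ACCOUNTS", "DB2 table BANK.LEDGER"],
    endpoints=["10.10.10.1:80"],
)
print(atm.name, atm.endpoints)
```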






Two Production Sysplex environments (also referred to as sites) in different locations

    • One active, one standby for each defined update workload; a query workload can potentially be active in both sites
    • Software-based replication between the two sysplexes/sites
      • DB2, IMS and VSAM data is supported
  • Two Controller Systems

    • Primary/Backup
    • Typically one in each of the production locations, but there is no requirement that they are co-located in this way
  • Workload balancing/routing switches

    • Must be Server/Application State Protocol (SASP) compliant
      • RFC 4678 describes SASP
    • Which switches/routers are SASP-compliant? The following are those we know about (a toy sketch of the routing idea follows this list):
      • Cisco Catalyst 6500 Series Switch Content Switching Module
      • F5 Big IP Switch
      • Citrix NetScaler Appliance
      • Radware Alteon Application Switch (Radware acquired Nortel's appliance line)
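
The sketch below is a toy Python illustration of the idea behind SASP-style routing, not the actual RFC 4678 wire protocol: an advisor publishes per-site weights derived from health, and the balancer sends new connections to the best-weighted site, routing around sites with weight zero:

```python
# Toy weight-based site selection, loosely in the spirit of SASP
# (RFC 4678); this is not the actual protocol.

def pick_site(weights: dict[str, int]) -> str | None:
    """Choose the best-weighted site; weight 0 means failed or quiesced."""
    live = {site: w for site, w in weights.items() if w > 0}
    if not live:
        return None  # no site can accept new connections
    return max(live, key=live.get)

# Advisor-published weights: SITE2 is down, so traffic routes around it.
print(pick_site({"SITE1": 80, "SITE2": 0}))  # -> SITE1
```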




GDPS/Active-Active

  • IBM Tivoli NetView Monitoring for GDPS, which has the following prerequisites:

    • IBM Tivoli NetView for z/OS
      • IBM Tivoli NetView for z/OS Enterprise Management Agent (NetView agent) – separate orderable
  • System Automation for z/OS

  • IBM Multi-site Workload Lifeline for z/OS

  • Middleware – DB2, IMS, CICS…

  • Replication Software

    • IBM InfoSphere Data Replication for DB2 for z/OS (IIDR for DB2)
    • IBM InfoSphere Data Replication for IMS for z/OS (IIDR for IMS)
    • IBM InfoSphere Data Replication for VSAM for z/OS (IIDR for VSAM)
  • Optionally the Tivoli OMEGAMON XE monitoring products

    • Individually or part of a suite


All components of a Workload should be defined in SA* as

    • One or more Application Groups (APG)
    • Individual Applications (APL)
  • The Workload itself is defined as an Application Group

  • SA z/OS keeps track of the individual members of the Workload's APG and reports a “compound” status to the A/A Controller
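
A minimal sketch of how such a "compound" status might be derived from member statuses (the aggregation rule is illustrative, not SA z/OS internals):

```python
# Illustrative aggregation of individual member statuses into one
# "compound" workload status for reporting to the A/A Controller.

def compound_status(members: dict[str, str]) -> str:
    states = set(members.values())
    if states == {"AVAILABLE"}:
        return "AVAILABLE"    # every member of the workload's APG is up
    if "AVAILABLE" in states:
        return "DEGRADED"     # some members up, some down
    return "UNAVAILABLE"      # nothing in the workload is running

# Hypothetical member names and statuses.
members = {"CICSA": "AVAILABLE", "DB2A": "AVAILABLE", "QCAP1": "STOPPED"}
print(compound_status(members))  # -> DEGRADED
```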



Certain components of a Workload, for instance DB2, could be also viewed as “infrastructure”

  • Relationship(s) from the Workload ensure that the supporting “infrastructure” resources are available when needed

  • Infrastructure is typically started at IPL time



Shared members

  • Other components of a Workload, for instance capture and apply engines, can also be shared

  • However, GDPS requires that they be members of the Workload

  • Rationale

  • The A/A Controller needs to know the capture and apply engines that belong to a Workload in order to

    • Quiesce work properly, including replication
    • Send commands to them



Provide DR for the whole production sysplex (A/A workloads and non-A/A workloads)

  • Restore A/A Sites capability for A/A Sites workloads after a planned or unplanned region switch

  • Restart batch workloads after the primary site is restarted and re-synced

  • The disk replication integration is optional







Option 1 – create new sysplex environments for active/active workloads

    • Simplifies operations, as the scope of the Active/Active environment is confined to just these specific workloads and the Active/Active managed data
  • Option 2 – Active/Active workload and traditional workload co-exist within the same sysplex

    • A new active sysplex is still needed for the second site
    • Increased complexity: the Active/Active workload must be recovered to one place, and the remaining systems to a different environment, from within the same sysplex
    • Existing GDPS/PPRC customers will have to implement GDPS co-operation support between GDPS/PPRC and GDPS/Active-Active


Active/Query configuration

    • Fulfills the Statement of Direction (SoD) made when the Active/Standby configuration was announced
  • VSAM Replication support

    • Adds VSAM to IMS and DB2 as supported data types
    • Requires either CICS TS V5 for CICS/VSAM applications or CICS VR V5 for logging of non-CICS workloads
  • Support for IIDR for DB2 (Qrep) Multiple Consistency Groups

    • Enables support for massive replication scalability
  • Workload switch automation

    • Avoids manually checking that replication updates have drained as part of the switch process
  • GDPS/PPRC Co-operation support

    • Enables GDPS/PPRC and GDPS/A-A to coexist without issues over who manages the systems
  • Disk replication integration

    • Provides tight integration with GDPS/MGM for GDPS/A-A to be able to manage disaster recovery for the entire sysplex




Large Chinese financial institution

  • Several critical workloads

    • Self-service (ATMs)
    • Internet banking
    • Internet banking (query-only)
  • Workloads access data from DB2 tables through CICS

  • Planned outages

    • Minor application upgrades (as needed)
      • Often included DB2 table schema changes
    • Quarterly application version upgrades
      • Other planned maintenance activities, such as software infrastructure upgrades


Critical workloads were down for three to four hours

    • Scheduled for third shift local time on weekends to limit the impact on banking customers
      • Still affected customers accessing accounts from other worldwide locations
    • Site taken down for application upgrades, possible database schema changes, scheduled maintenance
      • All business stopped
      • Required manual coordination across geographic locations to block and resume routing of connections into data center
      • Reloads of DB2 data elongated the outage period
  • Goal was to reduce planned outage time for these workloads down to minutes



Solution provides

    • A transactionally consistent copy of DB2 data at a remote site
      • IBM InfoSphere Data Replication for DB2 for z/OS (IIDR) - provides a high-performance replication solution for DB2
    • A method to easily switch selected workloads to a remote site without any application changes
      • IBM Multi-site Workload Lifeline (Lifeline) - facilitates planned outages by rerouting workloads from one site to another without disruption to users
    • A centralized point of control to manage the graceful switch
      • GDPS Active/Active Sites - coordinates interactions between IIDR and Lifeline to enable a non-disruptive switch of workloads without loss of data
  • Reduced impact to their banking customers!

    • Total outage time for update workloads was reduced from 3-4 hours down to about 2 minutes
    • Total outage time for the query workload was reduced from 3-4 hours down to under 2 minutes



IBM Multi-site Workload Lifeline v2.0

    • Advisor – runs on the Controllers and provides information to the external load balancers on where to send connections, and information to GDPS on the health of the environment
      • There is one primary and one secondary Advisor
    • Agent – runs on all production images with active/active workloads defined and provides information to the Lifeline Advisor on the health of that system (a toy health-summary sketch follows this list)
  • IBM Tivoli NetView Monitoring for GDPS v6.2 or higher

    • Runs on all systems and provides automation and monitoring functions. This new product has IBM Tivoli NetView for z/OS at the same version/release as a prerequisite. The NetView Enterprise Master runs on the Primary Controller
  • IBM Tivoli Monitoring v6.3 FP1

    • Can run on zLinux or distributed servers – provides the monitoring infrastructure and portal, plus alerting/situation management, via Tivoli Enterprise Portal, Tivoli Enterprise Portal Server and Tivoli Enterprise Monitoring Server
    • If running NetView Monitoring for GDPS v6.2.1 and NetView for z/OS v6.2.1, ITM v6.3 FP3 is required.
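
A toy Python sketch of the Agent-to-Advisor health flow described above (illustrative only; the names and report format are made up):

```python
# Illustrative only: Lifeline Agents report per-system health to the
# Advisor, which summarizes site health for load balancers and GDPS.

agent_reports = {
    "SYS1@SITE1": {"workload": "SELF_SERVICE_ATM", "healthy": True},
    "SYS2@SITE2": {"workload": "SELF_SERVICE_ATM", "healthy": False},
}

def site_health(reports: dict[str, dict]) -> dict[str, bool]:
    """A site can accept a workload if any system there reports healthy."""
    health: dict[str, bool] = {}
    for system, report in reports.items():
        site = system.split("@")[1]
        health[site] = health.get(site, False) or report["healthy"]
    return health

print(site_health(agent_reports))  # -> {'SITE1': True, 'SITE2': False}
```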


IBM InfoSphere Data Replication for DB2 for z/OS v10.2

    • Runs on production images where required to capture (active) and apply (standby) data updates for DB2 data. Relies on MQ as the data transport mechanism (QREP)
  • IBM InfoSphere Data Replication for IMS for z/OS v11.1

    • Runs on production images where required to capture (active) and apply (standby) data updates for IMS data. Relies on TCP/IP as the data transport mechanism
  • IBM InfoSphere Data Replication for VSAM for z/OS v11.1

    • Runs on production images where required to capture (active) and apply (standby) data updates for VSAM data. Relies on TCP/IP as the data transport mechanism. Requires CICS TS or CICS VR
  • System Automation for z/OS v3.4 or higher

    • Runs on all images. Provides a number of critical functions:
      • BCPii for GDPS
      • Remote communications capability to enable GDPS to manage sysplexes from outside the sysplex
      • System Automation infrastructure for workload and server management
  • Optionally the OMEGAMON XE products can provide additional insight to underlying components for Active/Active Sites, such as z/OS, DB2, IMS, the network, and storage

    • There are two “suite” offerings that include the OMEGAMON XE products (OMEGAMON Performance Management Suite and Service Management Suite for z/OS).


Active/Active Sites

    • This is the overall concept of the shift from a failover model to a continuous availability model.
    • Often used to describe the overall solution, rather than any specific product within the solution.
  • GDPS/Active-Active

    • The name of the GDPS product which provides, along with the other products that make up the solution, the capabilities mentioned in this presentation such as workload, replication and routing management and so on.


Update Workloads

    • Currently run only in what is defined as an active/standby configuration
    • Perform updates to the data associated with the workload
    • Have a relationship with the data replication component
    • Not all transactions within an update workload will necessarily be update transactions
  • Query Workloads

    • Run in what is defined as an active/query configuration
    • Must not perform any updates to the data associated with the workload
    • This allows the query workload to run, or be active, in both sites at the same time
    • A query workload must be associated with an update workload


A Consistency Group (CG) corresponds to a set of DB2 tables for which the replication apply process maintains transactional consistency by applying data-dependent transactions serially and other transactions in parallel

  • Multiple Consistency Groups (MCGs) are primarily used to provide scalability

    • used if and when a single CG cannot keep up with all transactions for one workload
    • query workloads can tolerate data replicated with eventual consistency
  • Q Replication (V10.2.1) can coordinate the Apply programs across CGs to guarantee that a time-consistent point across all CGs can be established at the standby site, following a disaster or outage, before switching workloads to that site

  • GDPS operations on a workload control and coordinate replication for all CGs that belong to that workload

    • For example, 'STOP REPLICATION' for a workload stops replication in a coordinated manner for all CGs (all queues and Capture/Apply programs)
    • GDPS supports up to 20 consistency groups for each workload
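
The following toy Python sketch illustrates the serial-versus-parallel apply rule defined at the top of this section (illustrative only, not the Q Replication apply engine): transactions that touch the same rows stay in source commit order across successive "waves", while independent transactions can share a wave and apply in parallel:

```python
# Toy apply scheduler: data-dependent transactions apply serially in
# commit order; independent transactions can apply in parallel.

def schedule(transactions: list[tuple[str, set[str]]]) -> list[list[str]]:
    """Group transactions (given in source commit order) into waves."""
    waves: list[list[str]] = []
    last_wave: dict[str, int] = {}  # row -> index of last wave touching it
    for txid, rows in transactions:
        # Must run after every wave that touched any of the same rows.
        earliest = max((last_wave.get(r, -1) for r in rows), default=-1) + 1
        if earliest == len(waves):
            waves.append([])
        waves[earliest].append(txid)
        for r in rows:
            last_wave[r] = earliest
    return waves

txns = [("T1", {"acct:7"}), ("T2", {"acct:9"}), ("T3", {"acct:7"})]
print(schedule(txns))  # [['T1', 'T2'], ['T3']] -- T3 must wait for T1
```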


A workload is the aggregation of these components

    • Software: user-written applications (e.g., COBOL programs) and the middleware runtime environment (e.g., CICS regions, InfoSphere Replication Server instances and DB2 subsystems)
    • Data: a related set of objects that must preserve transactional consistency and, optionally, referential integrity constraints (e.g., DB2 tables, IMS databases, VSAM files)
    • Network connectivity: one or more TCP/IP addresses and ports (e.g., 10.10.10.1:80)


In DB2 Replication, the mapping between a table at the source and a table at the target is called a subscription

    • The example shows two subscriptions, for tables T1 and T2
  • A subscription belongs to a QMap, which defines the send queue (sendq) used to send data for that subscription

    • Example shows that both subscriptions are using the same QMap (SQ1)
  • In IMS Replication, a subscription is a combination of a source server and a target server

    • The subscription is the object that is started/stopped by GDPS/A-A.
    • This corresponds to the QMap in Q Replication
  • Each IMS Replication subscription contains a list of replication mappings

    • There is one replication mapping for each IMS database being replicated
    • This corresponds to a subscription in Q Replication
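
A hypothetical Python sketch of how these terms line up (class and field names are illustrative, not a replication API):

```python
# Illustrative model: a Q Replication subscription maps one source
# table to one target table and belongs to a QMap (its send queue);
# an IMS Replication subscription plays the QMap's role and holds one
# replication mapping per replicated IMS database.
from dataclasses import dataclass

@dataclass
class QSubscription:            # Q Replication: table-to-table mapping
    source_table: str
    target_table: str
    qmap: str                   # send queue carrying this subscription

@dataclass
class IMSSubscription:          # IMS Replication: server pair (like a QMap)
    source_server: str
    target_server: str
    mappings: list[str]         # one per IMS database (like a subscription)

# Made-up names, mirroring the T1/T2-on-SQ1 example above.
subs = [
    QSubscription("SITE1.T1", "SITE2.T1", qmap="SQ1"),
    QSubscription("SITE1.T2", "SITE2.T2", qmap="SQ1"),  # same QMap, SQ1
]
ims = IMSSubscription("IMSA", "IMSB", mappings=["DBCUST", "DBACCT"])
print(len(subs), "Q subscriptions on", subs[0].qmap, "|", ims.mappings)
```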




Automation code extends many of the techniques tried and tested in other GDPS products, and in many client environments, for management of their mainframe CA and DR requirements

  • Control code only runs on Controller systems

  • Workload management - start/stop components of a workload in a given Sysplex

  • Software Replication management - start/stop replication for a given workload between sites

  • Disk Replication management – ability to manipulate GDPS/MGM from GDPS/A-A

  • Routing management - start/stop routing of connections to a site

  • System and Server management - STOP (graceful shutdown) of a system, LOAD, RESET, ACTIVATE, DEACTIVATE the LPAR for a system, and capacity on demand actions such as CBU/OOCoD

  • Monitoring the environment and alerting for unexpected situations

  • Planned/Unplanned situation management and control - planned or unplanned site or workload switches; automatic actions such as automatic workload switch (policy dependent)

  • Powerful scripting capability for complex/compound scenario automation
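
As a rough illustration of the kind of sequence such a script automates, here is a hedged Python sketch of a planned workload switch (the function names are illustrative stand-ins, not GDPS script statements):

```python
# Hypothetical planned-switch sequence; the step order mirrors the
# capabilities listed above, with print() standing in for real actions.

def stop_routing(workload: str, site: str) -> None:
    print(f"stop routing {workload} connections to {site}")

def wait_until_drained(workload: str) -> None:
    print(f"wait for {workload} replication updates to drain")

def reverse_replication(workload: str, new_source: str) -> None:
    print(f"restart {workload} replication with {new_source} as source")

def start_routing(workload: str, site: str) -> None:
    print(f"start routing {workload} connections to {site}")

def planned_switch(workload: str, from_site: str, to_site: str) -> None:
    stop_routing(workload, from_site)    # quiesce new work at the old site
    wait_until_drained(workload)         # avoid data loss: apply catches up
    reverse_replication(workload, to_site)
    start_routing(workload, to_site)     # resume service at the new site

planned_switch("SELF_SERVICE_ATM", "SITE1", "SITE2")
```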


