HACMP – High Availability Cluster Multiprocessing
Understand what High Availability is
Understand why you might need High Availability
Outline the various options for implementing High Availability
Compare and contrast the High Availability options
State the benefits of using High Availability clusters
So, What is High Availability?
The masking or elimination of both planned and unplanned downtime
The elimination of single points of failure (SPOFs).
Fault resilience (may go down, but will come back up), not fault tolerance (no downtime)
IBM’s HA Solution for AIX
High Availability Cluster Multiprocessing
Based on cluster technology
Consists of two components:
– High Availability, the process of ensuring an application remains available through duplicated and/or shared resources
– Cluster Multiprocessing, concurrent access to shared data.
A Highly Available Cluster
Clusters based upon HACMP can contain between 2 and 8 nodes, or up to 32 nodes when using HACMP/ES. A cluster is comprised of physical components (topology) and logical components (resources).
The Causes of Downtime
1. Planned Downtime
– Hardware upgrades
– Software updates
2. Unplanned Downtime
– User error
– Application failure
– Hardware faults
– Environmental disasters
1% – Hardware Failure
14% – Unplanned Downtime
85% – Planned Downtime
High Availability Solutions should reduce both planned and unplanned downtime.
Just What Does HACMP Do?
HACMP will detect and act upon three types of failure by design; others can be added through customization:
– A node failure
– A network adapter failure
– A network failure
HACMP/ES can also monitor applications, processor load, and available disk capacity.
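As a rough illustration of application monitoring, the sketch below shows the shape of a custom monitor method: a script that is run periodically and whose nonzero exit status signals an application failure. The PID-file convention and paths here are assumptions for illustration, not HACMP defaults.

```shell
# app_monitor: hypothetical sketch of a custom application monitor.
# Exit status 0 means the application is healthy; nonzero would
# trigger recovery action by the cluster manager.
app_monitor() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1        # no PID file: application is down
    pid=$(cat "$pidfile")
    kill -0 "$pid" 2>/dev/null           # probe the process without signalling it
}

# Example: monitor the current shell itself
echo $$ > /tmp/demo_app.pid
if app_monitor /tmp/demo_app.pid; then
    echo "application healthy"
fi
```

The `kill -0` probe checks process existence without delivering a signal, which keeps the monitor side-effect free.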
When a Node Fails, What Happens?
HACMP has three modes of configuration, known as:
– Cascading – the resource group fails over to the highest-priority surviving node, and falls back when the failed node rejoins
– Rotating – the resource group fails over to a standby node and stays there; there is no fallback
– Concurrent (Not as common)
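The cascading mode above can be illustrated with a small sketch (not HACMP code): the resource group always runs on the highest-priority node that is up, which is why it falls back when a higher-priority node rejoins. Node names and the priority order are assumptions for illustration.

```shell
# cascading_owner: given the list of nodes currently up, print the node
# that should own the resource group under cascading takeover.
cascading_owner() {
    up=" $* "                            # nodes currently up
    for node in nodeA nodeB nodeC; do    # fixed priority: nodeA > nodeB > nodeC
        case "$up" in
            *" $node "*) echo "$node"; return 0;;
        esac
    done
    return 1                             # no surviving node
}

cascading_owner nodeB nodeC              # nodeA has failed: nodeB takes over
cascading_owner nodeA nodeB nodeC        # nodeA rejoined: fallback to nodeA
```

Under rotating mode the second call would keep the group on the takeover node instead of falling back.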
Benefits of High Availability Solutions:
High Availability Solutions offer the following benefits:
– Standard Components (no specialized hardware)
– Can be built from existing hardware (no need to invest in new kit)
– Works with just about any application
– Works with wide range of disk and network types
– No specialized operating system or microcode
– Excellent availability at low cost
Other Considerations for High Availability
High Availability solutions require the following:
– Thorough design and detailed planning
– Selection of appropriate hardware
– Disciplined system administration practices
– Documented operational procedures
– Comprehensive testing
AIX’s Contribution to High Availability
– Object Data Manager (ODM)
– System Resource Controller (SRC)
– Logical Volume Manager (LVM)
– Journaled File System (JFS)
– Online JFS Backup (splitlvcopy)
– Work Load Manager (WLM)
– Quality of Service (QoS)
– External Boot
– Software Installation Management (installp)
– Reliable Scalable Cluster Technology (RSCT)
All pSeries systems will work with HACMP, in any combination of nodes within a cluster; however, a minimum of four free adapter slots is recommended.
Supported Storage Environments
Most HACMP clusters require shared disk. Disk technologies that support multi-host connections include: SCSI, SSA and FC-AL (with or without RAID).
Supported IP networks include: Ethernet (10 Mb, 100 Mb, 1 Gb), Token-Ring, FDDI, ATM, Fibre Channel, SLIP and the SP switches. Supported non-IP networks include: RS232/422, target mode SSA and target mode SCSI.
Some Assembly Required
HACMP is not an “out of the box” solution. HACMP’s flexibility allows for complex customization in order to meet availability goals.
Customized pre-event scripts
HACMP core events
Customized post-event scripts
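A pre-event script might look like the sketch below: HACMP runs any registered pre-event script before the corresponding core event, passing it the event name and arguments. The log path and the event arguments shown are assumptions for illustration.

```shell
# Hypothetical pre-event script sketch: record what is about to happen
# before the core event runs.
LOG=/tmp/hacmp_preevent.log

pre_event() {
    event=$1; shift
    echo "PRE-EVENT $event args: $*" >> "$LOG"
}

# Example: a node_down event for nodeA, about to be processed
pre_event node_down nodeA graceful
tail -1 "$LOG"
```

A post-event script has the same shape, invoked after the core event completes.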
IBM’s High Availability Product Family
HACMP is not an orphan but a member of the High Availability family that includes: HAGEO, HACWS and HACMP/ES.
HACWS – Highly Available Control Workstation
HACMP is a mature product, evolving to meet customers’ needs. Some of the key features/changes have been:
1. HACMP version 4.2.2
– Fast failover, DARE emulation, DARE migration, clverify methods, enhancements to HAView, AIX version 4.3 support
2. HACMP version 4.3.0
– 32-node support for HACMP/ES, C-SPOC enhancements, ATM support, TaskGuides, multiple pre- and post-event scripts, FDDI MAC takeover.
3. HACMP version 4.3.1
– Monitoring and administration support enhancements, node-by-node migration, AIX Fast Connect support.
4. HACMP version 4.4.0
– Integration with Tivoli, application monitoring, cascading without fallback, C-SPOC enhancements, improved TaskGuides, improved migration support, integration of HANFS functionality, soft-copy documentation (HTML and PDF).
What is Shared Storage?
In order to pass control of data between two or more systems we must have the ability to connect a storage subsystem to more than one computer at a time. This is commonly called “twin tailing”. There are several methods of twin tailing using SCSI, SSA or Fibre Channel storage units.
SCSI Technology and HACMP
HACMP-related issues with SCSI disk architecture:
– SCSI buses require termination at each end. In HACMP environments the terminators have to be external to facilitate removing a SCSI cable without losing termination on the bus.
– SCSI buses are ID-based; every device on the bus must have a unique ID number. The default for all SCSI adapters is ID 7, so on a shared bus at least one adapter must be changed from the default.
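The ID-uniqueness rule can be checked with a small sketch. On AIX the real adapter IDs would come from a command such as `lsattr -El scsi0 -a id`; here the IDs are passed as arguments so the check can be shown standalone.

```shell
# check_scsi_ids: verify that the SCSI adapter IDs on a shared bus are
# unique and that no adapter is still at the default ID 7.
check_scsi_ids() {
    for id in "$@"; do
        if [ "$id" = "7" ]; then
            echo "conflict: an adapter still uses default ID 7"
            return 1
        fi
    done
    dups=$(printf '%s\n' "$@" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "conflict: duplicate SCSI ID(s): $dups"
        return 1
    fi
    echo "SCSI IDs OK"
}

check_scsi_ids 5 6      # two host adapters on one shared bus
```

Changing one adapter to ID 5 and the other to ID 6, for example, leaves IDs 0-4 free for the shared disks themselves.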
SSA Technology and HACMP
SSA loops may be configured for maximum availability:
An Example: One loop and four adapters
Tools to Help You Plan Your Cluster
– Builds hardware and software configuration files
– Only available to IBMers and IBM BPs
– Downloaded from ehone.ibm.com
– This tool will not tell you if the hardware and software components you have chosen will address the business problem at hand.
Use the Online Planning Worksheets
– A PC-based tool that runs on Windows 95/98/NT under Microsoft Internet Explorer
– Creates a cluster snapshot which can be applied to HACMP
HACMP for AIX Planning Guide (SC23-4277-02 for version 4.4)
– Use the planning worksheets in Appendix A.
Draw a diagram of your cluster
– Create a pictorial representation of your cluster
– Show how the clients connect (including routers).
– Label all cluster components
– Keep the diagram up to date if your cluster changes