1. Understand the role of problem determination
2. Provide methods for describing a problem and collecting the necessary information about the problem in order to take the best corrective course of action
Role of Problem Determination:
Providing methods for describing a problem and collecting the necessary information about the problem in order to take the best corrective course of action.
Before Problems Occur:
1. Effective problem determination starts with a good understanding of the system and its components.
2. The more information you have about the normal operation of a system, the better.
– System configuration
– Operating system level – oslevel command
– Applications installed
– Baseline performance
– Installation, configuration, and service manuals
Before Problems Occur: (A Few Good Commands):
lspv – lists physical volumes, PVID, VG membership
lscfg – provides information about system components
prtconf – displays system configuration information
lsvg – lists the volume groups
lsps – displays information about paging spaces
lsfs – gives file system information
lsdev – provides device information
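The commands above can be captured into a baseline before anything goes wrong. The following is a minimal sketch, not a supported tool: the BASELINE_DIR location and the snapshot helper are hypothetical names chosen for illustration, and any command not found on the system is recorded as skipped.

```shell
#!/bin/sh
# Sketch: save a baseline of the system configuration for later comparison.
# BASELINE_DIR is a hypothetical location; adjust to local conventions.
BASELINE_DIR="${BASELINE_DIR:-/tmp/baseline.$(date +%Y%m%d)}"

snapshot() {
    # $1 = label for the output file; remaining arguments = command to run
    label="$1"; shift
    out="$BASELINE_DIR/$label.out"
    if command -v "$1" >/dev/null 2>&1; then
        # Keep the output even on failure, and note the exit status
        "$@" > "$out" 2>&1 || echo "exit status $?" >> "$out"
    else
        echo "skipped: $1 not found" > "$out"
    fi
}

mkdir -p "$BASELINE_DIR" || exit 1
snapshot oslevel oslevel
snapshot prtconf prtconf
snapshot lscfg   lscfg -v
snapshot lspv    lspv
snapshot lsvg    lsvg
snapshot lsps    lsps -a
snapshot lsfs    lsfs
snapshot lsdev   lsdev -CH
echo "Baseline written to $BASELINE_DIR"
```

Running it again after a change and comparing the two directories with diff shows what is different from the known-good state.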
Useful Commands and Options:
lspv hdisk1 – list details of physical volume hdisk1 (volume group, PVID, PP size)
lspv -l hdisk1 – list logical volumes on physical volume
lspv -M hdisk1 – list a mapping of physical partitions to logical partitions
lscfg -v – list configuration, devices, manufacturers, part numbers
lsvg rootvg – list details of volume group rootvg (PP size, VGID)
lsvg -l rootvg – list logical volumes configured in volume group
lsvg -p rootvg – list physical volumes belonging to the volume group
lsvg -o – list volume groups that are active (varied on)
lsvg -o | lsvg -i -l – list active volume groups and their logical volumes
lsps -a – monitor paging space
lsfs -q – list file system and query each for information
lsdev -CH – list configured devices with headers
lsdev -Cc tape – list configured devices with class “tape”
lsdev -PH – list Predefined or supported devices that could be added
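As one example of putting these options to work, the output of lsps -a can be filtered to flag paging spaces that are filling up. This is a sketch under assumptions: it takes the fifth column of lsps -a to be %Used (check this on your AIX level), and THRESHOLD and check_paging are hypothetical names.

```shell
#!/bin/sh
# Sketch: flag paging spaces whose %Used exceeds a threshold.
# Assumes the `lsps -a` column layout: name, PV, VG, size, %Used, ...
THRESHOLD="${THRESHOLD:-70}"   # hypothetical alert level, in percent

check_paging() {
    # Reads `lsps -a` output on stdin; $1 = limit in percent.
    # Skips the header line and prints one line per paging space over the limit.
    awk -v limit="$1" 'NR > 1 && $5 + 0 > limit { print $1 " is " $5 "% used" }'
}

# Only run against the live system when lsps is available (AIX).
if command -v lsps >/dev/null 2>&1; then
    lsps -a | check_paging "$THRESHOLD"
fi
```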
Problem Determination Techniques:
1. Identify the problem
2. Talk to users to define the problem
3. Collect system data
4. Resolve the problem
Identify the Problem:
A clear definition of the problem:
1. Gives clues as to the cause of the problem
2. Aids in the choice of troubleshooting methods to apply
Define the Problem (1 of 2):
Understand what the users of the system perceive the problem to be.
Users = data entry staff, programmers, system administrators, technical support personnel, management, application developers, operations staff, network users, etc.
Define the Problem (2 of 2):
– What is the problem?
– What is the system doing (or NOT doing)?
– How did you first notice the problem?
– When did it happen?
– Have any changes been made recently?
“Keep ’em talking until the picture is clear.”
Collect System Data:
1. How is the machine configured?
2. What errors are being produced? – Use the errpt command
3. What is the state of the OS?
4. Is there a system dump?
5. What log files exist?
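For the error-log question, a quick count of entries per class helps decide where to look first. The sketch below assumes the errpt summary columns are IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION, with C as the class (H hardware, S software, O errlogger messages, U undetermined); count_by_class is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count error log entries per class from `errpt` summary output.
# Assumes the class letter is the fourth column of each summary line.
count_by_class() {
    # Reads errpt summary output on stdin; skips the header line.
    awk 'NR > 1 { n[$4]++ } END { for (c in n) print c, n[c] }' | sort
}

# Only run against the live system when errpt is available (AIX).
if command -v errpt >/dev/null 2>&1; then
    errpt | count_by_class
    # Detailed report for hardware-class errors, first 40 lines only
    errpt -a -d H | head -n 40
fi
```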
Problem Determination Tools:
1. Error logs
2. LVM commands – chpv, chvg, chlv …
3. Diagnostics – CD or boot into diagnostics
4. LED codes
5. Bootable media – System not booting
6. Backups – Always have a good backup
7. System dump
Resolve the Problem:
1. Use the information gathered.
2. Use the tools available – command documentation, downloadable fixes and updates.
3. Contact IBM Support, if necessary…
4. Keep a log of actions taken to correct the problem.
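Keeping the action log can be as simple as appending timestamped notes to a file. This is one possible sketch; the ACTION_LOG path, the log_action function, and the sample entries are hypothetical.

```shell
#!/bin/sh
# Sketch: append timestamped notes to an action log while working a problem.
# ACTION_LOG is a hypothetical path; pick a location that is backed up.
ACTION_LOG="${ACTION_LOG:-/tmp/problem_actions.log}"

log_action() {
    # One line per action: timestamp, user name, free-form note
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(id -un)" "$*" >> "$ACTION_LOG"
}

# Illustrative entries only
log_action "Ran errpt -a; disk errors reported on hdisk1"
log_action "Opened call with IBM Support"
```

A plain-text log like this can be attached to a support call or reviewed later to see which step actually fixed the problem.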
Obtaining Software Fixes and Microcode Updates:
Software fixes for AIX and hardware microcode updates are available on the Internet from the following URL:
Note: Access the Web site and register as a user.
1. AIX Operating System Publications – (contains LED meanings)
2. pSeries and RS/6000 System Installation and Service Guides
3. IBM Redbooks
IBM eServer pSeries Product Family:
150,000+ Customers
1,500,000+ Systems
Mid-range – p650
AIX 5L 5.2 Logical Partition Support (LPAR):
Improved throughput and resource utilization through increased workload management flexibility
1. Dynamic LPAR – Add or remove processors, adapters and memory without requiring a reboot.
2. Dynamic Reconfiguration APIs – Applications and middleware can automatically adjust to changes in hardware resources.
3. Dynamic Capacity Upgrade On Demand – Customers can activate additional processors without having to reboot.
4. Hot Sparing – Dynamic substitution of failed processors with spare, unlicensed processors.
lsvg – list volume groups
lspv – list physical volumes
lsvg -l rootvg – list logical volumes in rootvg
lsps -a – list activity on the paging space (hd6)
bootinfo -r – list total real memory (in KB)
bootinfo -p – hardware platform
bootinfo -z – multi-processor environment
lspv -l hdisk0 – list logical volumes on physical volume hdisk0