AIX 5L SysAdmin II: (Unit 10) – The AIX System Dump Facility

Unit Objectives:
———————–
1. Explain the meaning of the system dump
2. Determine and change the primary and secondary dump devices
3. Create a system dump
4. Execute the snap command
5. Use the kdb command to check a system dump

How a System Dump Is Invoked:
————————————————
A system dump copies kernel data structures to a dump device. It can be invoked:
1. Via keyboard initiation
2. Via reset button
3. At unexpected system halt
4. Via command
5. Via SMIT

Note: By default, methods 1 and 2 work only when the system key is in the Service position.

When A Dump Occurs:
———————————-
1. The AIX kernel crashes.
2. The dump is written to the primary dump device (/dev/hd6).
3. At the next boot, the dump is copied into the copy directory as /var/adm/ras/vmcore.0.

The sysdumpdev Command:
—————————————–
# sysdumpdev -l – List dump values
primary: /dev/hd6
secondary: /dev/sysdumpnull
copy directory: /var/adm/ras
forced copy flag: TRUE
always allow dump: FALSE
dump compression: ON

# sysdumpdev -p /dev/sysdumpnull – Deactivate primary dump device (temporary)

# sysdumpdev -P -s /dev/rmt0 – Change secondary dump device (Permanent)

# sysdumpdev -L – Display information about last dump
Device name: /dev/hd6
Major device number: 10
Minor device number: 2
Size: 9507840 bytes
Date/Time: Tue Jun 5 20:41:56 PDT 2001
Dump status: 0 – Successful
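
The status line in the sysdumpdev -L listing above is what a monitoring script would usually check. The following is a minimal sketch, assuming the output keeps the "Dump status: N" layout shown above; the function name last_dump_status is hypothetical, and the sample text is fed in by hand rather than read from a live system.

```shell
# Print the numeric dump status found in "sysdumpdev -L" style text on stdin.
# Assumes a line of the form "Dump status: 0" (any trailing annotation is ignored).
last_dump_status() {
    awk -F': *' '/^Dump status/ { split($2, parts, " "); print parts[1] }'
}

# Example with hand-written sample text (on AIX: sysdumpdev -L | last_dump_status)
printf 'Device name: /dev/hd6\nDump status: 0\n' | last_dump_status
```

A status of 0 corresponds to a successful dump, as in the listing above.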

Dedicated Dump Device (1 of 2):
———————————————–
Servers with 4 GB or more of real memory have a dedicated dump device created at installation time.

System Memory Size                       Dump Device Size
---------------------------------------  ----------------
4 GB up to, but not including, 12 GB     1 GB
12 GB up to, but not including, 24 GB    2 GB
24 GB up to, but not including, 48 GB    3 GB
48 GB and up                             4 GB
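
The sizing rule for the dedicated dump device can be written as a small lookup. This is only an illustrative sketch of the thresholds listed here; the function name dump_device_size_gb is hypothetical, memory is assumed to be given in whole GB, and 0 is used to mean that no dedicated device is created.

```shell
# Map real memory (whole GB) to the default dedicated dump device size (GB).
# Prints 0 for systems below 4 GB, where no dedicated device is created.
dump_device_size_gb() {
    mem_gb=$1
    if [ "$mem_gb" -lt 4 ]; then
        echo 0
    elif [ "$mem_gb" -lt 12 ]; then
        echo 1
    elif [ "$mem_gb" -lt 24 ]; then
        echo 2
    elif [ "$mem_gb" -lt 48 ]; then
        echo 3
    else
        echo 4
    fi
}

dump_device_size_gb 16   # a 16 GB system falls in the 12 to 24 GB band
```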

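Standard rootvg logical volumes, for reference:
hd1 = /home file system
hd2 = /usr file system
hd3 = /tmp file system
hd4 = / (root) file system
hd5 = boot logical volume
hd6 = paging space, also the default dump device ***
hd8 = JFS log
hd9var = /var – variable file system
hd10opt = /opt – optional file system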

Dedicated Dump Device (2 of 2):
———————————————–
/bosinst.data
.
.
large_dumplv:
DUMPDEVICE=/dev/lg_dumplv
SIZE_GB = 1
Note: An alternate dump device with adequate space can be specified.

The sysdumpdev Command:
—————————————–
# sysdumpdev -e – Estimate dump size
0453-041 estimated dump size in bytes: 52428800

# sysdumpdev -C – Turn on dump compression

# sysdumpdev -e
0453-041 estimated dump size in bytes: 10485760

Note: Use this information to size the /var file system.
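
To show how the estimate can drive a sizing decision, here is a minimal sketch that pulls the byte count out of a sysdumpdev -e message and compares it with a free-space figure in KB. The function names are hypothetical, and the free-space value in the example is made up; on a real system it would come from something like df -k on the copy directory.

```shell
# Extract the byte count from a "sysdumpdev -e" message line.
# The number is the last whitespace-separated field.
estimated_dump_bytes() {
    echo "$1" | awk '{ print $NF }'
}

# Succeed (exit status 0) if a dump of $1 bytes fits in $2 KB of free space.
dump_fits() {
    dump_bytes=$1
    free_kb=$2
    needed_kb=$(( (dump_bytes + 1023) / 1024 ))   # round up to whole KB
    [ "$needed_kb" -le "$free_kb" ]
}

# Example: the uncompressed estimate above against a hypothetical 40960 KB free
line='0453-041 estimated dump size in bytes: 52428800'
bytes=$(estimated_dump_bytes "$line")
if dump_fits "$bytes" 40960; then
    echo "copy directory is large enough"
else
    echo "copy directory is too small"
fi
```

With compression turned on, the same check would be run against the smaller estimate.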

dumpcheck Utility:
—————————-
The dumpcheck utility will do the following when enabled:
– Estimate the dump or compressed dump size using sysdumpdev -e
– Find the dump logical volumes and copy directory using sysdumpdev -l
– Determine the sizes of the primary and secondary dump devices
– Determine the free space in the copy directory
– Log any problems it finds to the error log

Methods of Starting a Dump:
——————————————
1. Any Terminal Accepting Input
   sysdumpstart {-p | -s}   (-p = primary, -s = secondary)
   or SMIT
2. If Keyboard Attached
   always allow dump = TRUE
   Press <Ctrl-Alt-NumPad1> to dump to the primary device, or
   Press <Ctrl-Alt-NumPad2> to dump to the secondary device
3. No Terminal or Keyboard
   always allow dump = TRUE
   Press the RESET button once

Start a Dump from a TTY:
————————————-
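When adding the TTY (Add a TTY in SMIT), set:
REMOTE reboot enable: dump
REMOTE reboot string: #dump#

Then, at the login prompt on that TTY, type the reboot string and answer the > prompt with 1 to confirm:
login: #dump#>1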

Generating Dumps with smit:
——————————————
# smit dump

Always ALLOW System Dump: corresponds to sysdumpdev -K (TRUE) or sysdumpdev -k (FALSE)

Dump-related LED Codes:
————————————–
0c0 – Dump completed successfully
0c1 – An I/O error occurred during the dump
0c2 – Dump started by user
0c4 – Dump completed unsuccessfully. Not enough space on dump device. Partial dump available
0c5 – Dump failed to start. Unexpected error occurred when attempting to write to dump device – e.g. tape not loaded
0c6 – Secondary dump started by user
0c8 – Dump disabled. No dump device configured
0c9 – System-initiated panic dump started
0cc – Failure writing to primary dump device. Switched over to secondary
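
For quick reference at the console, the codes above can be wrapped in a lookup function. This is a convenience sketch only; the function name dump_led_meaning is hypothetical, and the messages are simply the descriptions from the list.

```shell
# Print the meaning of a dump-related LED code; return 1 for unknown codes.
dump_led_meaning() {
    case "$1" in
        0c0) echo "Dump completed successfully" ;;
        0c1) echo "An I/O error occurred during the dump" ;;
        0c2) echo "Dump started by user" ;;
        0c4) echo "Dump completed unsuccessfully (partial dump available)" ;;
        0c5) echo "Dump failed to start" ;;
        0c6) echo "Secondary dump started by user" ;;
        0c8) echo "Dump disabled (no dump device configured)" ;;
        0c9) echo "System-initiated panic dump started" ;;
        0cc) echo "Primary dump device failed, switched to secondary" ;;
        *)   echo "Unknown dump LED code: $1"; return 1 ;;
    esac
}

dump_led_meaning 0c4
```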

Copying System Dump:
———————————-
1. A dump occurs.
2. At the next boot, rc.boot phase 2 checks whether /var has sufficient space to hold the dump.
   Yes – The dump is copied to /var/adm/ras.
   No – The copy-dump-to-tape menu is displayed (only if the forced copy flag is TRUE).
3. Boot continues.
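
The boot-time decision can be summarized as a small function. This is a sketch of the flow described here, not the actual rc.boot logic; the function name and the returned strings are hypothetical, and the "continue without copying" branch for a FALSE forced copy flag is an assumption drawn from the note that the tape menu only appears when the flag is TRUE.

```shell
# Decide what happens to a dump at boot.
# $1 = dump size in KB, $2 = free space in /var in KB, $3 = forced copy flag
dump_copy_action() {
    if [ "$1" -le "$2" ]; then
        echo "copy dump to /var/adm/ras"
    elif [ "$3" = "TRUE" ]; then
        echo "display copy-dump-to-tape menu"
    else
        # Assumption: with the flag FALSE, boot continues without copying
        echo "continue boot without copying"
    fi
}

dump_copy_action 51200 40960 TRUE
```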

Automatically Reboot After a Crash:
—————————————————-
# smit chgsys
Change/Show Characteristics of Operating System

Option: Automatically REBOOT system after crash = TRUE/FALSE
Must be set to TRUE if you want the system to automatically reboot after a dump.

Sending a Dump to IBM:
————————————
1. Copy a dump onto tape:
/usr/sbin/snap -a -o /dev/rmt0

2. Label tape with:
– Problem Management Record (PMR) number
– Command used to create tape
– Block size of the tape

3. Support Center uses kdb to examine the dump

Note: Can mail or FTP the information to IBM

Use kdb to Analyze a Dump:
——————————————
/var/adm/ras/vmcore.x (Dump file)
/unix (Kernel)

# uncompress /var/adm/ras/vmcore.x.Z
# kdb /var/adm/ras/vmcore.x /unix
>status
>stat
(further sub-commands for analyzing)
>quit

Note: The /unix kernel file must be the same one that was running on the failing machine.

Unit Summary:
————————
1. When a dump occurs, kernel and system data are copied to the primary dump device.
2. By default, the system has a primary dump device (/dev/hd6) and a secondary dump device (/dev/sysdumpnull).
3. During reboot, the dump is copied to the copy directory (/var/adm/ras).
4. A system dump should be retrieved from the system using the snap command
5. The support center uses the kdb debugger to examine the dump

FAQs:
————
Q:
Why is a system dump required?
A:
To capture the state of the kernel at the time of a failure so that the cause of the crash can be analyzed.

LAB:
———
sysdumpdev -e (Estimate the dump size)
sysdumpdev -l (List dump devices)

df /var (Check available space on /var, which resides on /dev/hd9var)
chfs -a size=+1 /var (Increase the size of /var)
lslv hd9var

# smit dump
Start a Dump to the Primary Dump Device
(The system crashes immediately; LED 0c2 is displayed only briefly)
Restart the server

sysdumpdev -L (Display status of the last dump)

bootinfo -r (Check the size of memory)
lsattr -El sys0 -a realmem (Usable physical memory)

uncompress /var/adm/ras/vmcore.0.Z
kdb /var/adm/ras/vmcore.0
(0)> stat
(0)> status
(0)> quit

chfs -a size=+1 /tmp (increase /tmp by 1 partition)
snap -a (Checking space requirement…)
Gathering scanout information…done.
cd /tmp/ibmsupt (files created by snap command to send to IBM)
ls
