AIX 5L SysAdmin II: (Unit 08) – Error Log and syslogd

Unit Overview:
———————-
1. Analyze error log entries
2. Identify and maintain the error log components
3. Provide different error notification methods
4. Log system messages using the syslogd daemon

Error Logging Components:
—————————————-
1. An error occurs in:
a.) an application, reported with errlog()
b.) a kernel module, reported with errsave()
2. Errors are sent to the file /dev/error (timestamp)
3. The error daemon /usr/lib/errdemon runs every minute to check the /dev/error file for errors that have occurred.
Note: The errstop command stops the error daemon
4. Check for error record template in /var/adm/ras/errtmplt
5. Get device information from CuDv, CuAt, CuVPD
6. Errors are logged in /var/adm/ras/errlog file
errlog, errclear, errlogger – commands
7. Use errpt or smit to format the error log for output
7A. Error notification can be turned on; it can also run diagnostics and send the results to the console.
console, errnotify – commands

Generating an Error Report via smit:
—————————————————–
# smit errpt
or
# errpt – Generate a summary report
# errpt -d H – Summary report of all hardware errors
# errpt -A – Intermediate report
# errpt -a – Detailed report
# errpt -a -d S – Detailed report of all software errors
# errpt -c > /dev/console – Concurrent error logging (“Real-time” error logging)
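
The reports above can also be limited to recent entries with errpt's -s flag, which takes a start date in mmddhhmmyy form. The sketch below is one way to build that value for midnight of the current day. The function name errpt_today_start is made up for this example, and it assumes date(1) supports the standard %m, %d and %y conversions.

```shell
# Print midnight of the current day in errpt's mmddhhmmyy format.
# errpt_today_start is a hypothetical helper name, not an AIX command.
errpt_today_start() {
    # %m = month, %d = day, %y = two-digit year; hour and minute fixed at 0000
    date +%m%d0000%y
}

# Possible use on AIX (hardware errors logged since midnight):
#   errpt -d H -s "$(errpt_today_start)"
```

The errpt call is left as a comment so that the function can be tried on any system that has date(1).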

A Summary Report (errpt)
————————————–
# errpt
Timestamp format: mmddhhmmyy (month, day, hour, minute, year)
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
94537C2E 04300333899 P H tok0 Wire Fault

Error Type:
P – Permanent, Performance or Pending
T – Temporary
I – Informational
U – Unknown

Error Class:
H – Hardware
S – Software
O – Operator
U – Undetermined
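
To make the T and C columns and the timestamp easier to read, the summary report can be passed through a small filter. This is a minimal sketch, assuming the six-column layout shown above and a 10-digit mmddhhmmyy timestamp. The function name decode_errpt and the sample line in the comments are invented for illustration.

```shell
# Expand the timestamp, type and class codes of errpt summary lines on stdin.
# decode_errpt is a hypothetical helper name.
decode_errpt() {
    awk '
    BEGIN {
        t["P"] = "Perm/Perf/Pend"; t["T"] = "Temporary"
        t["I"] = "Informational";  t["U"] = "Unknown"
        c["H"] = "Hardware"; c["S"] = "Software"
        c["O"] = "Operator"; c["U"] = "Undetermined"
    }
    $1 == "IDENTIFIER" { next }              # skip the header line
    length($2) == 10 {                       # only well-formed timestamps
        ts = $2
        printf "%s %s/%s/%s %s:%s %s %s %s\n", $1,
               substr(ts, 1, 2), substr(ts, 3, 2), substr(ts, 9, 2),
               substr(ts, 5, 2), substr(ts, 7, 2), t[$3], c[$4], $5
    }'
}

# Example with an invented entry:
#   echo "1A2B3C4D 0615142307 T H hdisk1 EXAMPLE ENTRY" | decode_errpt
#   prints: 1A2B3C4D 06/15/07 14:23 Temporary Hardware hdisk1
```

On AIX the filter would be used as: errpt | decode_errpt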

A Detailed Error Report (errpt -a)
————————————————
Displays a page of information for every error in the error log.

Types of Disk Errors:
——————————-
DISK_ERR1 – P – Failure of physical volume media
Action: Replace device as soon as possible
DISK_ERR2, DISK_ERR3 – P – Device does not respond
Action: Check power supply
DISK_ERR4 – T – Error caused by a bad block, or a recovered error
SCSI_ERR*, SCSI_ERR10 – P – SCSI Communication Problem
Action: Check cable, SCSI addresses, terminator

P = Permanent hardware error
T = Temporary hardware error

Rule of thumb: Replace a disk if it produces more than one DISK_ERR4 per week
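
The rule of thumb can be checked by counting DISK_ERR4 entries per disk. The following is a rough sketch: it reads errpt summary lines on stdin and reports every resource that exceeds a limit. The function name flag_disk_err4 is hypothetical, and the one-week window has to be supplied by the caller through errpt's -s start date.

```shell
# Count summary lines per resource and report resources above a limit.
# $1 = number of entries tolerated (default 1). Hypothetical helper name.
flag_disk_err4() {
    awk -v limit="${1:-1}" '
    $1 == "IDENTIFIER" { next }          # skip the header line
    NF >= 5 { count[$5]++ }              # column 5 is RESOURCE_NAME
    END {
        for (r in count)
            if (count[r] > limit)
                printf "%s: %d entries, above the limit of %d\n", r, count[r], limit
    }'
}

# Possible use on AIX, with the start date (mmddhhmmyy) set to one week ago:
#   errpt -J DISK_ERR4 -s <start date> | flag_disk_err4 1
```

The function only counts lines, so the result is meaningful only if the input is already restricted to DISK_ERR4 (errpt -J DISK_ERR4) and to the period of interest.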

LVM Error Log Entries:
———————————-
1. LVM_BBEPOOL, LVM_BBERELMAX, LVM_HWFAIL – (S, P)
No more bad block relocation
Action: Replace disk as soon as possible

2. LVM_SA_STALEPP – (S, P)
Stale physical partition.
Action: check disk, synchronize data (syncvg)

3. LVM_SA_QUORCLOSE – (H, P)
Quorum lost, volume group closing
Action: Check disk, consider working without quorum
varyonvg -f datavg
chpv -v a hdisk#
varyonvg vgname

H = Hardware, S = Software, P = Permanent, T = Temporary

Maintaining the Error Log:
————————————–
# smit errdemon – Change/Show Characteristics of the Error Log
# smit errclear – Clean the Error Log
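# crontab -l – List root's cron jobs; by default cron runs errclear daily to remove hardware entries older than 90 days and software and operator entries older than 30 days

Note: Do not clear all of the errors, because you need to keep a history of which errors are occurring on the system.

Note: Use the errlogger command to add reminders to the error log.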
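
As a sketch of how these maintenance steps might be wrapped, the two functions below call errclear with a class list and an age in days, and errlogger with a reminder text. The function names prune_errlog and note_errlog are made up for this example. The classes and day counts are whatever the administrator chooses.

```shell
# Remove error log entries of the given classes that are older than N days.
# $1 = class list for errclear -d (for example H or S,O), $2 = age in days
prune_errlog() {
    case $2 in
        ''|*[!0-9]*)
            echo "usage: prune_errlog CLASSES DAYS" >&2
            return 2 ;;
    esac
    errclear -d "$1" "$2"
}

# Write a reminder into the error log, for example after maintenance work.
note_errlog() {
    errlogger "$*"
}

# Possible use on AIX:
#   prune_errlog H 90
#   note_errlog "Replaced hdisk1 after repeated DISK_ERR4 entries"
```

Validating the day count first avoids passing an empty or mistyped value to errclear.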

Error Notification Methods:
—————————————
1. ODM-Based: – /etc/objrepos/errnotify
2. Periodic Diagnostics: – Check the error log (hardware errors)
diag
diagela – monitors the performance of the hardware but does take a lot of resources to run
3. Concurrent Error Logging: – errpt -c > /dev/console
4. Self Made Error Notification (Script)
#!/usr/bin/ksh
# Take a first snapshot of the summary report
errpt > /tmp/errlog.1
while true
do
    sleep 60 # Let's sleep one minute
    errpt > /tmp/errlog.2

    # Compare both files.
    # If no difference, let's sleep again
    cmp -s /tmp/errlog.1 /tmp/errlog.2 && continue

    # Files are different: Let's inform the operator:
    print "Operator: Check error log" > /dev/console

    # Refresh the snapshot so the same change is reported only once
    errpt > /tmp/errlog.1
done

ODM-based Error Notification: errnotify
———————————————————
odmget -q "en_name=sample" errnotify > file

errnotify:
en_pid = 0
en_name = "sample"
en_persistenceflg = 1
en_label = ""
en_crcid = 0
en_class = "H"
en_type = "PERM"
en_alertflg = ""
en_resource = ""
en_rtype = ""
en_rclass = "disk"
en_method = "errpt -a -l $1 | mail -s 'Disk Error' root"

odmchange -o errnotify -q "en_name=sample" file
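
A stanza like the one above can also be generated from a script and then loaded with odmadd. The function below is only a sketch: errnotify_stanza is a hypothetical name, it prints just the descriptors it is given, and it assumes none of the values contains a double quote.

```shell
# Print an errnotify stanza on stdout, suitable for redirecting to a file.
# $1 = en_name, $2 = en_class, $3 = en_type, $4 = en_rclass, $5 = en_method
errnotify_stanza() {
    printf 'errnotify:\n'
    printf '\ten_name = "%s"\n' "$1"
    printf '\ten_persistenceflg = 1\n'
    printf '\ten_class = "%s"\n' "$2"
    printf '\ten_type = "%s"\n' "$3"
    printf '\ten_rclass = "%s"\n' "$4"
    printf '\ten_method = "%s"\n' "$5"
}

# Possible use on AIX (note the escaped $1, which must reach the ODM literally):
#   errnotify_stanza sample H PERM disk \
#       "errpt -a -l \$1 | mail -s 'Disk Error' root" > notify.add
#   odmadd notify.add
```

Keeping the stanza in a generated file makes it easy to review before it is added to the ODM.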

syslogd Daemon
————————-
/etc/syslog.conf
daemon.debug /tmp/syslog.debug
Note: syslogd reads this file every time it starts, for example when the system is rebooted

syslogd then writes messages such as these to /tmp/syslog.debug:
inetd[16634]: A connection requires tn service
inetd[16634]: Child process 17212 has ended

# stopsrc -s inetd
# startsrc -s inetd -a "-d" --> Provide debug information

syslogd Configuration Examples
———————————————–
/etc/syslog.conf

auth.debug /dev/console
– All security messages to the system console
mail.debug /tmp/mail.debug
– Collect all mail messages in /tmp/mail.debug
daemon.debug /tmp/daemon.debug
– Collect all daemon messages in /tmp/daemon.debug
*.debug;mail.none @server
– Send all other messages, except mail messages, to host server

After changing /etc/syslog.conf:
– refresh -s syslogd – forces the daemon to reread the file; otherwise it will be reread during the next reboot
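
Because the log files named in /etc/syslog.conf have to exist before syslogd can append to them (the lab creates one with touch), it can help to list missing targets before refreshing the daemon. This is a minimal sketch. The function name missing_syslog_targets is hypothetical, and it assumes the destination is the second whitespace-separated field of each line.

```shell
# Print file destinations from a syslog.conf-style file that do not exist yet.
# $1 = configuration file to check (default /etc/syslog.conf)
missing_syslog_targets() {
    # Keep non-comment lines whose second field is an absolute path
    awk '$1 !~ /^#/ && $2 ~ /^\// { print $2 }' "${1:-/etc/syslog.conf}" |
    while read -r target
    do
        [ -e "$target" ] || echo "$target"
    done
}

# Possible use:
#   for f in $(missing_syslog_targets); do touch "$f"; done
#   refresh -s syslogd
```

Destinations such as @server or errlog are skipped because they do not start with a slash.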

Redirecting syslog Messages to Error Log
————————————————————-
/etc/syslog.conf
*.debug errlog --> Redirect all syslog messages to error log

# errpt

Directing Error Log Messages to syslogd
————————————————————-
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "logger Error Log: `errpt -l $1 | grep -v TIMESTAMP`"
Direct the last error entry (-l $1) to syslogd.
Do not show the error log header (grep -v).

errclear – clears entries from the error log
errlogger – allows a user to add entries to the error log
*.debug errlog – messages from any facility at priority debug or higher are written to the error log
en_method in errnotify – specifies the command or script that error notification runs when a matching error is logged
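
The en_method for forwarding entries to syslogd can also live in a small script, which is easier to test than an inline command. The sketch below makes that assumption. The function name errlog_to_syslog and the tag "errlog" passed to logger -t are choices made for this example, not part of the course material.

```shell
# Forward one error log entry to syslogd.
# $1 = sequence number of the entry, as errnotify passes it to en_method
errlog_to_syslog() {
    # Drop the header line, then hand each remaining line to logger
    errpt -l "$1" | grep -v TIMESTAMP |
    while read -r line
    do
        logger -t errlog "Error Log: $line"
    done
}

# If saved as an executable script that ends with: errlog_to_syslog "$1"
# the stanza could then use, with a hypothetical path:
#   en_method = "/usr/local/bin/errlog_to_syslog $1"
```

Reading line by line keeps each log message on a single line even if errpt prints more than one entry.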

Lab:
——–
vi /etc/syslog.conf

Add following line to the above file:
daemon.debug /tmp/syslog.debug

touch /tmp/syslog.debug (must create zero length file for appending)

refresh -s syslogd
stopsrc -s inetd (Stop inetd Subsystem)
startsrc -s inetd -a "-d" (Start inetd Subsystem)
Note: "-d" activates the debugging feature

telnet localhost (telnet to our own machine)
login
exit

stopsrc -s inetd
startsrc -s inetd
pg /tmp/syslog.debug
vi /etc/syslog.conf
Add new entry at end of file
*.debug errlog
refresh -s syslogd

try to login with a bogus user id
# errpt | more

vi notify.add
errnotify:
en_name="sample"
en_persistenceflg=0
en_class="O"
en_method="errpt -a -l $1 | mail -s 'ERRLOG' root"
:wq
odmadd notify.add (add information to the ODM)
errlogger Test-entry in the log (log errors with the errlogger)
errlogger Test2-entry in the log
mail (check mail for message sent for errors)

errdemon – starts the error daemon
