Diagnosing a hang

The first step is to determine how severe the hang or crash is. Levels of hang:

  1. Display is frozen or garbage. Everything else works, including typing commands blindly. If X was not running, starting X may restore the display.
  2. Display is frozen in X, but mouse cursor moves.
  3. Display is frozen, no reaction at all to key presses or mouse. Keyboard lights such as NumLock do not react.
  4. SSH connection dies.
  5. Serial terminal connection dies.
  6. Machine does not respond to ping.
  7. SysRq-keys cannot sync disks. Logs are not written to disk.
  8. SysRq-keys (SysRq-reboot) do not work at all.
  9. Netconsole fails to deliver kernel messages.
  10. Serial terminal fails to deliver messages.
  11. Firescope fails.
  12. Completely dead: display, keyboard and other input devices, network, serial port, IEEE1394. Have to press reset button.
  13. Reset button does not help, machine will not boot. Have to disconnect power (batteries) for a few minutes.
  14. It's dead, Jim.

This is just a rough list of various things that may work or not. A level of hang usually includes all the symptoms above it. Some criteria need another computer or special hardware (serial console), or special software on an external computer (firescope).
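
Levels 4 and 6 can be checked from a second computer on the same network. The following is a minimal Python sketch, not part of the original instructions. It assumes a Linux ping (iputils flags -c and -W) and an SSH server on port 22 of the hanging machine; the helper names responds_to_ping, ssh_banner_ok and rough_level are hypothetical. It waits for the SSH banner instead of a bare TCP connect, on the assumption that the kernel may still complete TCP handshakes while user space is hung. A failed probe can also mean a network problem, so treat the result as a hint only.

```python
import socket
import subprocess
import sys


def responds_to_ping(host, timeout_s=2):
    """Send one ICMP echo request with the system ping (Linux iputils flags)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def ssh_banner_ok(host, port=22, timeout_s=3.0):
    """Return True if the server sends an SSH identification string."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.settimeout(timeout_s)
            return conn.recv(64).startswith(b"SSH-")
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def rough_level(ping_ok, ssh_ok):
    """Map the two probe results onto the levels in the list above."""
    if ssh_ok:
        return "level 3 or lower: ssh still answers"
    if ping_ok:
        return "level 4 or 5: ping answers, ssh does not"
    return "level 6 or higher: no reply to ping"


if __name__ == "__main__" and len(sys.argv) == 2:
    target = sys.argv[1]  # address of the hanging machine
    print(rough_level(responds_to_ping(target), ssh_banner_ok(target)))
```

Run it from the other computer with the address of the hanging machine as the only argument.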

How to try and get kernel messages

  • mounting the partition where system logs are written with the sync option, so that log writes go to disk immediately.
  • ssh, requires network.
  • netconsole, requires another local computer and a network.
  • serial console, requires a serial port, and either an actual serial console, or preferably another computer with a serial port and a null-modem cable. USB-serial dongles usually fail on the crashing machine.
  • firescope, requires a special program to read the kernel log buffer via the IEEE1394 interface (firewire). Both computers need an IEEE1394 port, and a cable.
  • kdump: on crash, kexec an emergency kernel to save the kernel log buffer. Instructions?
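
For the netconsole option, the receiving computer only has to collect UDP datagrams. A minimal Python sketch of such a receiver follows; it is an illustration, not an official tool. It assumes the crashing machine's netconsole is configured to send to this computer on UDP port 6666 (believed to be netconsole's default target port; check the kernel's netconsole documentation for the sender-side syntax). The names open_listener, receive_once and log_forever are hypothetical.

```python
import socket
import sys

NETCONSOLE_PORT = 6666  # assumed target port; must match the sender's setting


def open_listener(port=NETCONSOLE_PORT, bind_addr="0.0.0.0"):
    """Create a UDP socket that receives netconsole datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def receive_once(sock, bufsize=65535):
    """Block until one datagram arrives; return (sender address, text)."""
    data, sender = sock.recvfrom(bufsize)
    return sender[0], data.decode("utf-8", errors="replace")


def log_forever(sock, path):
    """Append every received message to a file and echo it to stdout."""
    with open(path, "a", encoding="utf-8") as log:
        while True:
            _sender, text = receive_once(sock)
            log.write(text)
            log.flush()  # keep the file current if this side is interrupted
            sys.stdout.write(text)
            sys.stdout.flush()


if __name__ == "__main__" and len(sys.argv) == 2:
    # The only argument is the log file to append to.
    log_forever(open_listener(), sys.argv[1])
```

Start the receiver before reproducing the hang, so that the last messages sent by the crashing kernel end up in the log file on the healthy machine.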