
On Mon, Jan 11, 2010 at 7:36 PM, Ujjval Karihaloo <ujjval at simplesignal.com> wrote:

NMI watchdog panics can be caused by a number of things, primarily kernel deadlocks or machine check exceptions. The log output of the panic you got _should_ be more detailed than just the 'NMI watchdog' message. 'NMI watchdog' is just the name of the error detection method that produced the panic message.
Yes, it is 4.x, my bad.

Ujjval Karihaloo
Then you might want to gather some details (especially how to reproduce it) and create a support incident with Red Hat to request technical assistance or a workaround for that ancient kernel, or find a way to migrate to a newer major release.

For example, to gather info, set up RH4 'netdump' and 'netconsole'. That way, in case of another lockup, more detailed logs and a kernel crash file are dumped to another server, over syslog and the ssh netdump user.

In early 2.6 kernels, such as the 2.6.9 kernels used by RH4, there were some deadlock issues that could cause this.

RH 4.x is now in the last phase of its production support life cycle and, according to Red Hat support policy, has just a little more than 2 years left. No new minor releases or new hardware enablement are expected, only certain security errata and mission-critical bugfixes. The issue may need to be reported to Red Hat by a customer before they will look into a bugfix...

--
-J