[MLUG] My CentOS server reboots unexpectedly

Nick Sklav sklav at teksavvy.com
Thu Mar 26 19:43:16 EDT 2009


On Thu, 2009-03-26 at 18:29 +0200, Georgi Stoynev wrote:
> Hi, guys,
> today I ran into a strange situation with my CentOS 5.1 based server.
> 
> The server reboots by itself:
>         [root at srv1 ~]# last -x | grep down
>         shutdown system down  2.6.18-92.1.6.el Thu Mar 26 12:05 - 17:55  (05:49)
>         root     tty1                          Mon Mar 23 10:30 - down   (3+01:33)
>         root     tty2                          Mon Mar 23 10:28 - down   (3+01:35)
>         shutdown system down  2.6.18-92.1.6.el Sat Jan 24 14:52 - 12:03 (60+21:11)
> 
> /var/log/messages:
> Mar 25 15:39:42 srv1 apcupsd[26869]: Power failure.
> Mar 25 15:39:48 srv1 apcupsd[26869]: Power is back. UPS running on mains.
> Mar 26 12:03:58 srv1 shutdown[15032]: shutting down for system reboot
> Mar 26 12:03:58 srv1 init: Switching to runlevel: 6
> Mar 26 12:04:45 srv1 kernel: xenbr0: port 5(vif6.0) entering disabled state
> Mar 26 12:04:45 srv1 kernel: device vif6.0 left promiscuous mode
> Mar 26 12:04:45 srv1 kernel: xenbr0: port 5(vif6.0) entering disabled state
> Mar 26 12:04:45 srv1 auditd[2603]: Audit daemon has no space left on logging partition
> Mar 26 12:04:45 srv1 auditd[2603]: Audit daemon is suspending logging due to no space left on logging partition.
> Mar 26 12:04:47 srv1 kernel: xenbr0: port 4(vif5.0) entering disabled state
> 
> I can't understand how this happened. The auditd lines also bother
> me a lot: there is nothing earlier in the logs from that daemon. I
> also checked boot.log, audit.log and dmesg and couldn't find anything
> suspicious. At that time the switch connected to this server was out
> of service, but I still don't think it has anything to do with this
> reboot. Another odd point is Xen Domain-0. It is supposed to bring
> up all virtual machines at reboot. Instead only one came online, and it
> didn't work properly, so I did xm destroy on it and ran fsck after
> that.
> 
> Main question is: why did this happen and how can I prevent it from
> happening again?
> 
> Thank you in advance!


OK, I would check a few things that might not seem directly related, but
that I have actually seen in the past.

1.) The UPS is misbehaving and sending a shutdown event to the server.

2.) Check the capacitors on the system board and see whether any are
bulging.

3.) A weak power supply. I have seen those have this effect.

Also, in some of the above cases I have seen shutdown messages in the
logs as if someone had initiated the shutdown, similar to what you have
seen, and the cause was a faulty power supply. I cannot explain how the
system is able to log the event when it loses power.
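
To test the UPS theory from point 1, one could line up the apcupsd
messages against the shutdown messages in /var/log/messages. Below is a
rough sketch in Python, assuming the classic syslog line format seen in
the quoted log. The function names, the 10-minute window and the year
default are my own choices for illustration (syslog lines carry no year);
none of this is part of apcupsd itself.

```python
import re
from datetime import datetime, timedelta

# Classic syslog line, e.g. "Mar 26 12:03:58 srv1 shutdown[15032]: message"
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<prog>[^\s\[:]+)(?:\[\d+\])?: (?P<msg>.*)$"
)


def parse_line(line, year=2009):
    """Return (timestamp, program, message), or None if the line does not match.

    The year is an assumption supplied by the caller, since syslog omits it.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    return ts, m.group("prog"), m.group("msg")


def ups_events_before_shutdown(lines, window=timedelta(minutes=10), year=2009):
    """For each shutdown message, list apcupsd messages logged within
    `window` before it. Returns a list of (time, message, nearby_ups_events).
    """
    ups_events = []
    report = []
    for line in lines:
        parsed = parse_line(line.strip(), year)
        if parsed is None:
            continue
        ts, prog, msg = parsed
        if prog == "apcupsd":
            ups_events.append((ts, msg))
        elif prog == "shutdown":
            nearby = [
                (t, m) for t, m in ups_events
                if timedelta(0) <= ts - t <= window
            ]
            report.append((ts, msg, nearby))
    return report
```

Feeding it the lines of /var/log/messages (for example via
open("/var/log/messages")) gives one entry per shutdown message. For the
excerpt quoted above, the only apcupsd messages are from the previous day,
so nothing falls inside a 10-minute window before the 12:03:58 shutdown.
That would suggest looking at the hardware causes in points 2 and 3 as
well, though an excerpt is not the full log.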


