[MLUG] My CentOS server reboots unexpectedly
Nick Sklav
sklav at teksavvy.com
Thu Mar 26 19:43:16 EDT 2009
On Thu, 2009-03-26 at 18:29 +0200, Georgi Stoynev wrote:
> Hi, guys,
> Today I ran into a strange situation with my CentOS 5.1-based server.
>
> The server reboots by itself:
> [root@srv1 ~]# last -x | grep down
> shutdown system down 2.6.18-92.1.6.el Thu Mar 26 12:05 - 17:55 (05:49)
> root tty1 Mon Mar 23 10:30 - down (3+01:33)
> root tty2 Mon Mar 23 10:28 - down (3+01:35)
> shutdown system down 2.6.18-92.1.6.el Sat Jan 24 14:52 - 12:03 (60+21:11)
>
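`last` reads its records from wtmp. You may want to line the `shutdown` records up against the syslog timestamps, or pull them out of a longer history. The following is a minimal Python sketch that extracts them from saved `last -x` output. It assumes the single-line layout shown above. The regex and the function name `shutdown_records` are my own, and you may need to adjust them for other versions of `last`.

```python
import re

# Matches a "shutdown" record from `last -x`, assuming the layout shown above:
# "shutdown system down 2.6.18-92.1.6.el Thu Mar 26 12:05 - 17:55 (05:49)"
SHUTDOWN_RE = re.compile(
    r"^shutdown\s+system down\s+(?P<kernel>\S+)\s+"
    r"(?P<down>\w{3} \w{3}\s+\d+ \d\d:\d\d) - (?P<up>\S+)\s+"
    r"\((?P<duration>[\d+:]+)\)"
)


def shutdown_records(last_output):
    """Return a dict (kernel, down, up, duration) per "shutdown" record."""
    records = []
    for line in last_output.splitlines():
        match = SHUTDOWN_RE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records
```

Save the output with `last -x > last.txt` and pass the file's contents in. Session lines such as `root tty1 ...` are skipped.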
> /var/log/messages:
> Mar 25 15:39:42 srv1 apcupsd[26869]: Power failure.
> Mar 25 15:39:48 srv1 apcupsd[26869]: Power is back. UPS running on mains.
> Mar 26 12:03:58 srv1 shutdown[15032]: shutting down for system reboot
> Mar 26 12:03:58 srv1 init: Switching to runlevel: 6
> Mar 26 12:04:45 srv1 kernel: xenbr0: port 5(vif6.0) entering disabled state
> Mar 26 12:04:45 srv1 kernel: device vif6.0 left promiscuous mode
> Mar 26 12:04:45 srv1 kernel: xenbr0: port 5(vif6.0) entering disabled state
> Mar 26 12:04:45 srv1 auditd[2603]: Audit daemon has no space left on logging partition
> Mar 26 12:04:45 srv1 auditd[2603]: Audit daemon is suspending logging due to no space left on logging partition.
> Mar 26 12:04:47 srv1 kernel: xenbr0: port 4(vif5.0) entering disabled state
>
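When a reboot seems to come out of nowhere, it can help to pull out only the messages from daemons that can request or react to one. The following is a minimal Python sketch. It assumes the classic `Mon DD HH:MM:SS host prog[pid]: message` syslog layout seen above. The set of program names in `INTERESTING` is my own choice for this case, not a standard list.

```python
import re

# Classic syslog line, e.g. "Mar 26 12:03:58 srv1 shutdown[15032]: shutting down ..."
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<prog>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)

# Daemons whose messages may explain who or what asked for a shutdown.
# This selection is an assumption; extend it for your own setup.
INTERESTING = {"apcupsd", "shutdown", "init", "auditd"}


def shutdown_timeline(lines):
    """Yield (timestamp, program, message) for lines from INTERESTING programs."""
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match and match.group("prog") in INTERESTING:
            yield match.group("ts"), match.group("prog"), match.group("msg")
```

You can pass it an open file such as `open("/var/log/messages")`, which is usually readable only by root, and print each tuple. Kernel lines like the `xenbr0` ones are filtered out.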
> I can't understand how this happened. The auditd lines also bother me
> a lot; there is nothing earlier in the logs from that daemon. I also
> checked boot.log, audit.log and dmesg and can't find anything
> suspicious. At that time the switch connected to this server was out
> of service, but I still don't think that has anything to do with this
> reboot. Another odd point is Xen Domain-0: it is supposed to bring up
> all virtual machines at reboot. Instead, only one was online and it
> didn't work properly, so I ran xm destroy on it and then fsck.
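The two auditd messages are worth following up regardless of the reboot. The following is a minimal Python sketch that does two things. It reports free space on the filesystem that holds the audit log, and it shows which actions `/etc/audit/auditd.conf` sets for low-space and full-disk conditions. It assumes the default log directory `/var/log/audit`. The option names are the ones documented in auditd.conf(5). Flagging only `HALT` (shuts the machine down) and `SINGLE` (drops it to single-user mode) is my own choice.

```python
import shutil

# Options from auditd.conf(5) that decide what auditd does as the disk fills.
ACTION_KEYS = (
    "space_left_action",
    "admin_space_left_action",
    "disk_full_action",
    "disk_error_action",
)
# Actions that change the state of the whole machine (my own selection).
DISRUPTIVE = {"HALT", "SINGLE"}


def parse_auditd_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def disruptive_actions(options):
    """Return {key: ACTION} for space/disk actions set to HALT or SINGLE."""
    return {
        key: options[key].upper()
        for key in ACTION_KEYS
        if options.get(key, "").upper() in DISRUPTIVE
    }


def free_megabytes(path="/var/log/audit"):
    """Free space, in MiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free // (1024 * 1024)
```

Run it as root and feed `parse_auditd_conf` the contents of `/etc/audit/auditd.conf`. An empty result from `disruptive_actions` rules out auditd itself as the one taking the machine down.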
>
> The main question is: why did this happen, and how can I prevent it
> from happening again?
>
> Thank you in advance!
OK, I would check a few things that might not be directly related, but
that I have actually seen in the past:
1.) The UPS is misbehaving and sending a shutdown event to the server.
2.) Check the capacitors on the system board and look for any that are
bulging.
3.) I have seen weak power supplies cause this.
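For point 1, apcupsd ships the `apcaccess` tool. Its `apcaccess status` output is a list of `KEY : value` lines, so you can check the UPS status and how often it has switched to battery. The following is a minimal Python sketch that parses that output. The field names used here (`STATUS`, `NUMXFERS`, `LASTXFER`) are ones I recall from apcupsd's status report, so treat them as assumptions and compare them with your own output. The function names are hypothetical.

```python
def parse_apcaccess(text):
    """Turn `apcaccess status` output ("KEY : value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only; values such as dates contain colons.
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def ups_summary(fields):
    """Pick out fields relevant to unexpected shutdowns (field names assumed)."""
    return {
        "status": fields.get("STATUS", "unknown"),
        "transfers": int(fields.get("NUMXFERS", "0")),
        "last_transfer": fields.get("LASTXFER", "unknown"),
    }
```

Feed it the text from `apcaccess status`, for example `subprocess.run(["apcaccess", "status"], capture_output=True, text=True).stdout`. Then compare the transfer count and reason with the 15:39 "Power failure" entry in the log.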
Also, in some of the above cases I have seen shutdown messages in the
logs, as if someone had initiated the shutdown, similar to what you are
seeing, and the cause was a faulty power supply. I cannot explain,
though, how the system is able to log the event when it loses power.