Downtime

Sorry.

Freaking hardware. We now have all new everything for this server, except the hard drive, thank god.

Chet

what exactly happened? am curious to know

happened to oesm.org recently (as of a few days ago)…hard drive corruption. bad news. heh (nothing was lost, but you should have seen the rant the sysop made in his apology speech)

I was just guessing it was a DoS attack from one of our “gently touched” members.

(For extra credit, try to construct a sentence using the phrase “gently touched member” that’s acceptable in polite company.)

You may have noticed that sometimes around 4-5pm this server would not be quite as peppy as normal. This was the busy time for the server. So we thought, let's add some RAM to smooth it out. The server is hardly packed and runs fine except for these periods - and even during these periods, most days you can’t notice it.

Scheduled a RAM installation. Was told it would happen late in the evening and take less than 5 minutes.

Instead they chose to do it in the middle of the day and pooched the PC. So they replaced parts and servers, moving the hard drive around. It was still having issues.

This was compounded by the fact that they were lightly staffed because of the holidays, so each step was delayed. They would do something, start the reboot and go on to another PC, coming back to check a little later instead of just sitting at that PC like they normally do.

The PC was still not seeing the NIC card. Then we remembered that this PC had been upgraded the same way as another PC, which had an issue with the IPs not being bound to the NIC after the upgrade. It was just odd with this PC, since we had rebooted it and the issue never surfaced; it seems a hard power-off and on did bring up the problem.

So we alerted them to our guess at the issue, and they proceeded to fix it.

We have 10 servers, and this was the longest straight downtime we have had in over a year of hosting. All but one of the other servers have only had the downtime of patches being installed. The other server with issues was replaced within 12 hours of the problem and only had about 2 hours of downtime. So this officially falls under shit happens - and it sucks when it happens without a full staff around.

Chet

I think the clock has been further screwed. It has gone from being 20-30 mins. off to several hours off. For me at least.

Me too.

Yes, we noticed that as well. In our checklist of things to follow, we didn’t have the time listed, because we have never had a complete box replacement. And since the list was only used once previously, it was pretty hypothetical.

The time will be adjusted later this evening/morning, when traffic is lightest and it will have the least impact.

Thanks

Chet

Whew! I thought I was going to have to quit posting here.