Good tool to measure system uptime/availability?

I’m trying to come up with a way to set a goal of X% availability for my network/server environment, and I want to be able to run a report against my target every month or so.

Is there a good tool or piece of software out there that can measure how available a server has been over the last x days, for example, and maybe even generate a spreadsheet or something that I could use to create a nice looking report/pie chart type deal?

Munin, OpenNMS, Nagios will all do it.

Perfect! Thanks dermot!

Munin is for monitoring. Nagios is for alerting. The difference is that Munin will let you see what was happening 3 weeks ago, while Nagios will wake you up if something breaks. It sounds like you’re after the former, so Munin is my recommendation too. For your first time setting up this sort of tool it’s definitely the sweet spot between simplicity and functionality.

(At this point I’ll note that Munin has alerting functionality, but it really sucks and I wouldn’t recommend using it.)

Something else: monitoring availability is good, but it’s important to realise that availability != uptime. Availability measures how much of the time the services your users use are working. If you’re running a mail server, don’t measure the system uptime, measure the % of time that there was something listening on port 25 to the outside world, and on the POP3/IMAP ports for your internal network (if applicable). Now you’re measuring the availability of Mail (incoming) and Mail (mailbox access), which the company cares a lot more about than the uptime of ny-prod-mail.internal.company.net.
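To make the "measure the port, not the box" idea concrete, here is a minimal Python sketch of a service probe. It is not how Munin or Nagios do it internally; the function names `port_is_open` and `availability_percent` are made up for illustration, and it assumes that a successful TCP connection is a good enough stand-in for "the service is working" (a real check would probably also speak a little SMTP or IMAP).

```python
import socket
from typing import Sequence


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: "something is listening" is treated as "service available".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def availability_percent(results: Sequence[bool]) -> float:
    """Percentage of probe results that succeeded."""
    if not results:
        raise ValueError("no probe results recorded")
    return 100.0 * sum(results) / len(results)


# Hypothetical usage, run from cron every few minutes and logged somewhere:
#   port_is_open("ny-prod-mail.internal.company.net", 25)    # SMTP
#   port_is_open("ny-prod-mail.internal.company.net", 143)   # IMAP
```

Run the SMTP probe from outside your network and the POP3/IMAP probes from inside it, so each one sees the service the way its users do.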

This is all true, and it’s why I didn’t go with my original suggestion, which was to use SNMP or WMI or whatever to monitor uptime. I haven’t used Munin so I can’t really comment on that, but we’ve used Nagios for monitoring and alerting for years and it will do exactly what you’ve identified in your third paragraph. Nagios will give you system availability for your hosts (servers) and the services that run on them. So you can, for example, see that the server has had 100% availability in the past 3 months but the IMAP daemon has just 99.75%.
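If it helps when picking a target, a percentage like that converts straight into a downtime budget. A rough sketch of the arithmetic in Python, treating "3 months" as 90 days (my assumption) and using hypothetical helper names:

```python
from datetime import timedelta


def allowed_downtime(target_percent: float, period: timedelta) -> timedelta:
    """Downtime you can afford in `period` and still meet `target_percent`."""
    if not 0.0 <= target_percent <= 100.0:
        raise ValueError("target_percent must be between 0 and 100")
    return period * (1.0 - target_percent / 100.0)


def availability_from_downtime(downtime: timedelta, period: timedelta) -> float:
    """Availability percentage given the total downtime observed in `period`."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    return 100.0 * (1.0 - downtime / period)


# Assumption: "3 months" is taken as 90 days.
quarter = timedelta(days=90)
budget = allowed_downtime(99.75, quarter)
print(f"99.75% over 90 days allows about {budget.total_seconds() / 3600:.1f} hours down")
```

So 99.75% over a quarter works out to roughly five and a half hours of the IMAP daemon being unavailable.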

Depending on your requirements, of course, you could/should run both Munin and Nagios, since they serve different needs with some small overlap (Munin is probably a bit closer in functionality to MRTG, Cricket or Cacti).

Oh, one last thing - regardless of which combination you go for, the likelihood is that you’re going to have a bit of work to do each month or each quarter when it comes to taking the data in Munin or Nagios and presenting it in a report of some sort. Again, I’m not overly familiar with Munin, but Nagios is ancient and it shows, both in the presentation of its Web UI and in the backend that it uses for storing data.
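For the reporting step, one low-effort option is to dump per-service numbers to CSV and let a spreadsheet draw the charts. A minimal sketch follows; it assumes you have already pulled check counts out of whichever tool you use (how you get them is not shown), and the function name, column names and sample figures are all hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_availability_report(rows: Iterable[Tuple[str, int, int]], out: TextIO) -> None:
    """Write one CSV line per service: name, total checks, OK checks, availability %."""
    writer = csv.writer(out)
    writer.writerow(["service", "checks_total", "checks_ok", "availability_percent"])
    for service, total, ok in rows:
        # Leave the percentage blank when there were no checks in the period.
        pct = f"{100.0 * ok / total:.2f}" if total else ""
        writer.writerow([service, total, ok, pct])


# Hypothetical sample data, not real measurements.
sample = [("Mail (incoming)", 400, 400), ("IMAP", 400, 399)]
buf = io.StringIO()
write_availability_report(sample, buf)
print(buf.getvalue())
```

To write a real file instead of a string buffer, open it with `open("report.csv", "w", newline="")` and pass that as `out`.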