Smokeping+lighttpd+TCPPing on Debian/Ubuntu

Oliver Gorwits — Thu, 11 Aug 2011 12:45:12 +0000

Some notes on getting Smokeping to work on Debian/Ubuntu using the lighttpd web server, and the TCPPing check.

Install the lighttpd package first, as then the subsequentÂ smokeping package installation will notice that it doesn’t require the Apache web server. However, Smokeping doesn’t auto-configure for lighttpd so a couple of commands are necessary:

# lighttpd-enable-mod cgi
# /etc/init.d/lighttpd force-reload
# ln -s /usr/share/smokeping/www /var/www/smokeping

Visiting your web server’s base url should show a lighttpd help page, and visiting the /cgi-bin/smokeping.cgi path should show the Smokeping home page with logo images working.

Install the TCPPing script by downloading from http://www.vdberg.org/~richard/tcpping and saving to somewhere like /usr/local/bin/tcpping (setting execute bit, also). Obviously, use this path in your Smokeping Probe configuration:

+ TCPPing

binary = /usr/local/bin/tcpping
forks = 10
offset = random
# can be overridden in Targets
pings = 5
port = 21

For the TCPPing check, make sure you have the standalone tcptraceroute package installed. You might find an existing /usr/sbin/tcptraceroute command is available, but this is from the traceroute package and won’t work with the TCPPing script.

The Limoncelli Test

Oliver Gorwits — Thu, 28 Jul 2011 13:40:48 +0000

Over at the excellent Everything Sysadmin blog is a simple test which can be applied to your Sysadmin team to assess its productivity and quality of service. It’s quite straightforward – just 32 things a good quality team ought to be doing, with a few identified as must-have items.

Of course I’m not going to say anything about my current workplace but thought it would be interesting to assess my previous team as of October 2010 when I left. I’m incredibly proud of the work we did and both our efficiency and effectiveness in delivering services with limited resources. That’s reflected in the score of (drumroll…) 31 out of 32!

If you have a Sysadmin team, or work in one, why not quickly run through the test for yourself?

A Strategy for Opsview Keywords

Oliver Gorwits — Fri, 20 May 2011 15:04:54 +0000

At my previous employer, and recently at my current one, I’ve been responsible for migration to an Opsview based monitoring system. Opsview is an evolution of Nagios which brings a multitude of benefits. I encourage you to check it out.

Since the 3.11.3 release, keywords have been put front and centre of the system’s administration, so I want to present here what I’ve been working on as a strategy for their configuration. Keywords can support three core parts of the Opsview system:

The Viewport (a traffic-lights status overview)
User access controls (what services/hosts can beÂ acknowledged, etc)
Notifications (what you receive emails about)

Most important…

My first bit of advice is do not ever set the keywords when provisioning a new Host or Service Check. This is because on these screens you can’t see the complete context of keywords, and it’s far too easy to create useless duplication. You should instead associate keywords with hosts and services from the Configuration/Keywords screen.

Naming Convention

Okay, let’s go to that screen now, and talk about our naming convention. Yes, there needs to be one, so that you can look at a keyword in another part of Opsview and have a rough idea what it might be associated with. Here’s the template I use, and some examples:

-[-]

device-ups
server-linux
service-smtpmsa
service-nss-ntpstratum3

Let’s say you have a Linux server running an SMTP message submission service and an NTP Stratum 3 service. I would create one keyword for the underlying operating system (CPU, memory, disk, etc), named “server-linux“. I’d create another for the SMTP service as “service-smtpmsa” and another for the NTP as “service-ntpstratum3“. If your Opsview is shared between a number of teams, it might also be useful to insert the managing team for that service in the name, as I’ve done with NSS, above. The type “device” tends to be reserved for appliances which fulfil one function, so you don’t need to separate out their server/service nature.

With this in place, if the UNIX Systems team manages the server and OS, and another team manages the applications stack on the box, we’ve got keywords for each, allowing easy and fine grained visibility controls. When creating the keywords, you should go into the Objects tab and associate it with the appropriate hosts and service checks. I find this much more straightforward than using the Keywords field on the actual host and service check configuration pages.

Viewport

Let’s look at each of the three cornerstone uses I mentioned above, in turn. First is the Viewport. Well, that’s easy enough to enable for a keyword by toggling the radio button and assigning a sensible description (such as ï»¿ï»¿”Email Message Submission Service” for “service-smtpmsa“). Which users can see which items in their own viewport is configured in the role (Advanced/Roles) associated to that user. I’d clone off one new role per user, and go to the Objects tab, remove all Host Groups or Service Groups and select only some Keywords. Job done – the user now sees those items in their viewport.

Actions

Next up is the ability for a user to acknowledge, or mark as down, an item. In fact it’s done in the same way as the viewport, that is, through a role. That’s because roles contain, on the Access tab, the VIEWPORTACCESS item for viewports and the ACTIONSOME/NOTIFYSOME items for actioning alerts. Because it’s currently only possible for a user to have one role, you cannot easily separate these rights for different keywords – a real pity. But I have no doubt multiple roles will come along, just like multiple notification profiles.

Notifications

Which brings us to the final item. Again I’d create a new notification profile for each user, so that it’s possible to opt them in or out of any service notifications. Using keywords makes things simple – are you just managing the underlying OS? Then you can have notifications about that, and not the application stack. It doesn’t stop you seeing the app stack status in your viewport, though. Because the notification profile is associated with a user, you’ll only be offered keywords that have been authorized in their role, which is a nice touch.

And finally…

In each of these steps the naming convention has really helped, because when looking at keywords the meaning “these hosts” or “this service” will (hopefully) jump out.Â If I were scaling this up, I’d have it all provisioned via the Opsview API from a configuration management or inventory database, and updated nightly. This is another way naming conventions help – they are friendly to automation.

Cats and Code » monitoring