2.2 release

Gabès Jean

Hi :)

After a few weeks of testing the RC1 version, it's time to release 2.2 ^^

Thanks to everyone who helped us get this version out, by coding or reporting bugs :)

As always, you can have a look at the full changelog, but I'll try to explain the main new features you can expect in this version :)

Snapshots

The main new feature of this version is snapshots.

Snapshots are quite simple in fact. I've had plenty of cases at work where my boss asked me "why was there such a big load at 3AM? It crashed the XXX batch". And the problem is that I had the load average and other performance values, but no other clues for the analysis.

What did I need at that time? Just a dumb "ps aux" launched at 3AM during the load average warning/critical phase. And that's what snapshots are about: they allow you to launch commands based on criteria like "went into warning state", with a large output (like a full ps output). This output is exported to the broker, which can save it somewhere (file, mongo, ...). Then it's quite easy for a UI to query it and display it alongside your logs when you try to find out what caused so much load :)

A snapshot uses a classic command definition:

define command {
  command_name    dump_processes
  command_line    /usr/bin/ssh $HOSTADDRESS$ "ps ax -o user,vsz,rss,pcpu,command"
}

Then you can call it from a service definition, here a load average one:

define service{
   service_description    Load Average
   use                    linux-ssh-service
   register               0
   host_name              linux-ssh
   check_command          check_ssh_linux_load_average

   snapshot_command    dump_processes
   snapshot_enabled    1
   snapshot_criteria   c
   snapshot_period     night
   snapshot_interval   15
}

The parameters are quite simple:

  • snapshot_command: what to launch. Beware: it will be launched by a reactionner daemon.
  • snapshot_enabled: 0 by default. A command must be set and this must be 1 for the snapshot to be launched.
  • snapshot_criteria: on which element states to launch the command (w=warning, c=critical, etc.)
  • snapshot_period: only launch during this time period. Mainly at night when admins are not around :) (see the timeperiod example after this list)
  • snapshot_interval: do not hammer the host with such commands, so only launch the snapshot every 15 minutes :)
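
The snapshot_period above refers to a timeperiod object named night, which has to exist somewhere in your configuration. A minimal sketch, with hours that are just an example to adapt to your own off-hours:

define timeperiod {
   timeperiod_name    night
   alias              Night, when admins are asleep
   monday             00:00-07:00,21:00-24:00
   tuesday            00:00-07:00,21:00-24:00
   wednesday          00:00-07:00,21:00-24:00
   thursday           00:00-07:00,21:00-24:00
   friday             00:00-07:00,21:00-24:00
   saturday           00:00-24:00
   sunday             00:00-24:00
}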

The scheduler will schedule the snapshot, which will be launched by the reactionner daemon. Then the information is picked up by the broker, which can send it to a module. I wrote a module to manage this, feel free to write your own if you don't like mongodb :)

  shinken install snapshot-mongodb

Enable this module as usual:

define module {
   module_name     snapshot-mongodb
   module_type     snapshot_mongodb
   uri             mongodb://localhost/?safe=false
   database        shinken
}

And in your broker objects:

define broker {
   broker_name     broker-master
   [...]
   modules  webui, snapshot-mongodb
}

It's now up to the UIs or your reporting tools to use these historical data and show them to your users :)
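
For example, you can have a quick look at what was saved directly from the mongo shell. The collection and field names below are assumptions, so check what the module actually stores in your setup:

  mongo shinken --eval 'printjson(db.snapshots.findOne({host_name: "linux-ssh"}))'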

Exporting internal performance data

One key point about Shinken is its speed. But how do you see it and monitor it? In the previous version (2.0) there was a first attempt with an HTTP API to get some performance data. But now it's even easier, as the daemons can automatically export this data to metrology tools :)

To statsd

The first metrology tool to be managed is statsd. It's really easy to set one up. When it's up, all you have to do is enable it in your shinken.cfg file:

statsd_host=my-statsd-server
statsd_port=8125
statsd_prefix=shinken
statsd_enabled=1

You only need to set it in your shinken.cfg file (so on the arbiter daemon). It will be automatically sent to all the other daemons :)
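
If you want to check that the metrics really leave the daemons before plugging in a real statsd, you can simply listen on the UDP port with netcat on the machine you declared as statsd_host. This is just a debugging trick, not a Shinken feature, and the flags below are for the OpenBSD flavour of netcat:

  nc -klu 8125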

To kernel.shinken.io

If you don't want to start a statsd daemon, you can use the kernel.shinken.io service. It's totally free, it can accept metrology data from your daemons, and it will show you the last 7 days of metrics for your daemons. It's currently in a beta phase and should be stable quite soon :)

Enabling it is very easy. All you need is a shinken.io account; look at your profile page for your api_key and your secret. Then add them to the shinken.cfg file, like for statsd:

api_key=ABCDEFGHIJ
secret=KLMOPQRST

If you need a proxy, just add it too:

http_proxy=http://monproxy:3128

No password, hostname, or contact name will be sent to this API. Only metrics. And of course the data is encrypted before being sent ;)

About old webui and livestatus versions

If you are using the webui or livestatus module, please update the module too when you are upgrading the shinken framework. I'm updating the shinken.io versions, but to be sure you may need to take the version directly from github for these two modules:

  • https://github.com/shinken-monitoring/mod-livestatus
  • https://github.com/shinken-monitoring/mod-webui
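
If you prefer to fetch them by hand rather than through shinken.io, it can look like this. The repository layout (a module subdirectory) and the target directory are assumptions, so adjust them to your installation:

  # grab the latest module code from github
  git clone https://github.com/shinken-monitoring/mod-livestatus.git
  git clone https://github.com/shinken-monitoring/mod-webui.git

  # copy the module code into the Shinken modules directory (path may differ on your system)
  cp -r mod-livestatus/module /var/lib/shinken/modules/livestatus
  cp -r mod-webui/module /var/lib/shinken/modules/webui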

We tried not to break compatibility with the old module versions, but for these two it was not possible without dropping some bug fixes. So we did break compatibility with the previous module versions.

What next?

I already presented the new release cycle in my last post. We will focus on the elastic IT parts, like, for my part, adding hosts without restarting the arbiter daemon ^^

I'll send news about this more regularly, and also about my new side project about service discovery, named kunai, because it will be a perfect match with the hot host insertion feature :)

See you soon and have a good monitoring :)

jean gabès

Posted in: Talk

Tagged as: shinken
