Saturday, January 28, 2012

Find Postgres tables which have been locked by over 3 minutes

SELECT
pd.datname AS DATABASE
,pc.relname AS relation
,pl.GRANTED
,pl.mode
,psa.current_query
,now() - psa.query_start AS execution_time
FROM
pg_locks pl
LEFT OUTER JOIN pg_class pc ON pc.oid = pl.relation
LEFT OUTER JOIN pg_database pd ON pd.oid = database
LEFT OUTER JOIN pg_stat_activity psa ON psa.procpid = pl.pid
INNER JOIN pg_tables pt ON pt.tablename = pc.relname
WHERE
1 = 1
AND pd.datname = 'cso'
AND now() - psa.query_start >= '00:03:00'
ORDER BY
execution_time DESC;

Friday, January 27, 2012

Cacti IO iostat monitoring - Fixing Invalid field error

I was setting up Cacti to include I/O stats monitoring (you know, those given by iostat command of sysstats linux package). The guys at http://www.markround.com/archives/48-Linux,-Solaris-and-FreeBSD-iostat-monitoring-with-Cacti.html made a really nice job of creating the required scripts and Cacti templates, so I was hoping I'd just install them and use the results. I thought that it was so easy...

For some reason, using their Cacti templates under RedHat EL 5.7with Cacti 0.8.7i gave the following error, when running the query from inside Cacti in debug mode:

Invalid field ioDescr:ioName:ioIndex

Some Googling around took me back to Markround.com, where one of the user comments was about the same issue. The solution turned out to be editing the iostat.xml, looking for that ... line and removing the ioName portion of it. That is:

ioDescr:ioIndex

That solved the issue.

Wednesday, January 25, 2012

SNMP monitoring with Cacti

Yesterday I installed Cacti to monitor a DBMS server. The box has 56 GB RAM and two Quad Core processors. I set up Cacti as usual, that is (ubuntu boxes, taken from http://www.debuntu.org/how-to-monitor-your-servers-with-snmp-and-cacti).

On the box that you want to monitor (which will run SNMPD, but not Cacti itself):
  • sudo apt-get install snmpd.
  • Configure SNMPD so that it listens on all interfaces. Edit 7etc/default/snmpd, look for a line saying SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -I -smux -p /var/run/snmpd.pid 127.0.0.1' and remove the 127.0.0.1 part. Also, make sure that SNMPDRUN is set to "yes" (no quotes).
  • Set up SNMP security. Edit /etc/snmp/snmpd.conf and leave only a com2sec line like this one:
com2sec readonly authorized_host_or_net communityname
  • Make sure to replace authorized_host_or_net with the host or network IP address of the box running cacti (might be the same one you are going to monitor and are installing SNMP into). Also, replace communityname by some other string of your choice. This string is required to query the SNMP daemon.
  • Restart SNMPD.
On the box running Cacti:
  • Test everything is working OK by running a command like this (the command must be run from an authorized IP as defined on the com2sec entry of snmpd.conf):
snmpwalk -Os -c communityname -v 1 snmp_host system
  • This should return a bunch of results. If instead you get a timeout, check your com2sec line is fine, the communityname given as argument to snmpwalk is the same as the one defined on the com2sec entry, and that no firewall is dropping your traffic between you and the SNMP server.
OK, so far, you have only installed SNMP on the box you want to monitor. Now you want to install cacti, which is your GUI to SNMP and allows you to see long term trends of resource usage. Do this:

On the box running Cacti:
  • sudo apt-get install mysql-server apache2 libapache2-mod-php5 php5-mysql php5-cli php5-snmp cacti
  • It will ask you a few questions about passwords for MySQL root user as well as for Cacti's DB user. Input as desired, but make sure you remember them.
  • Fire up your browser to http://cacti_host/cacti/. You will be presented with a configuration wizard. hit Next. Then select New Install and hit Next. Make sure that all dependencies are FOUND and hit Next/Finish.
At this point, Cacti is Up and Running. you will be redirected to it's login screen. Log in with username "admin" password "admin" (no quotes). It will ask you to change the password. Do it as requested. Then click on the graphs tab, and chose Localhost on the left tree. You will be able to see the graphs for the host running Cacti. They might be empty, or missing at all for a few minutes, that's OK. Give it some 10 minutes and you should start seeing data.

OK, you are ready! You are monitoring the box running Cacti... wait... no. You wanted to monitor a different box, right? Yeah, that's why we did all that SNMP stuff on the other box. OK, read on.
  • Go back to the Console tab. Thenk click on Devices under Management title on the left menu. Then click on Add.
  • Enter the required data: The description can be anything you want. Hostname must be either a resolvable hostname or the IP address of the target box, that is, the box running SNMPD.
  • Under host template, choose ucd/net snmp host.
  • Under SNMP Oprions make sure you choose SNMP Version 1. Under SNMP Community enter your your communityname as set up on snmpd.conf, com2sec entry.
  • You can leave the default values for the rest. Click on Create.
  • Cacti will tell you that the host has been created and present you with an option to Create graphs for the host. Click it. It will show you a list of queries and inputs that can be graphed. you can choose which ones you want. I suggest you choose them all and, if you find out some are useless, you can remove them later.
  • Click on Create. Graphs should be ready, but not visible yet.
  • TO make graphs visible, go to Graph Trees under Management, and click on the default tree. You should see an entry for localhost. You can remove it if you are not interested on monitoring the box running cacti. Or you can leave it. I don't care.
  • Click Add. Change Tree Itel Type to Host. Choose your newly created host. Hit Create.
Now you should be able to go back to Graphs tab. The tree on the left should show you the new host. Click on it. and Graphs should be visible. It might take up to 10 minutes again, while Cacti collects data and creates the graphs.

Troubleshooting

The reason I wrote this post is because on yesterday's installation, things did not go as smooth as I just said. Cache Memory graph was always NaN. So was CPU User % usage most of the time. Everything else was working just fine.

The reason for this is that RRDTool (Cacti's backend for data storage and graph creation) has some limit for collected values. In my case, the server had A LOT of RAM as Cache Memory (some 40 GB), and both RRDTool and Cacti expected this value to be between 0 and 10 GB. As for the CPU, They expected the % to be between 0 and, well 100%, but since the box has 8 cores, CPU usage can go up to 800%.

It took me a while to diagnose and find a way to solve this. Here's the short version for you:
  • First, tell Cacti that the values can be larger. Go to Data Templates under Templates, and choose the data source that is giving you trouble. In my case, it was "ucd/net - CPU Usage - User" and "ucd/net - Memory - Cache". Adjust the Maximum value as required. Do NOT check the box. Hit Save.
  • Easy enough? Yes? Well, no. you need to fix your RRDTool database file. Open a shell on your Cacti box and go to rra/ directory under your Cacti installation: There, locate the .rrd file that corresponds to your host and metric (cpu usage or whatever) and run something like this:
rrdtool tune cronos_application_server_cpu_user_10.rrd --maximum cpu_user:800%

Here, replace cronos_application_server_cpu_user_10 with your RRD file name, and cpu_user with the data series you want to adjust. You can find the RRD file in Cacti, under Devices -> you host -> Data source list -> your broken graph name (for me it was Cronos Application Server - CPU Usage - User). As for the data series name, go to Graph Templates -> your troublesome template (for me it was ucd/net - CPU Usage) and check the text in parenthesis on the "Data source" column.

I hope this helps you.