Sorry to rant right away, but these peaks are really getting annoying and ruin the whole chart. I know that you can set a max value for the monitor and then these values are not drawn at all (are they saved?), but I do not want to do that because then I lose information. I mean, there are these "known" peaks like counter rollovers on reboot, but the problem with filtering out every high value is that you also lose information on the real "emergencies" and abnormalities.
To solve this dilemma I think it would be great if NetMRG would just record ALL data and not filter anything out, but consider the max value given in the monitor when drawing the graph (like you can do now in the advanced options in graph drawing). This would give a much better overview, as the peaks would still be drawn but "cut off". It has to be configured at the monitor level (or similar), as these values can differ per chart/host combination.
The problem is of course how to draw averages. I would suggest taking min(realValue, maxValue) when calculating averages for graphing purposes, so that each value is capped at the maximum before it is averaged.
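To make the averaging idea concrete, here is a minimal Python sketch. It is only an illustration of capping each sample at the monitor maximum before averaging; the function name and sample data are made up and nothing here is NetMRG code.

```python
def clipped_average(samples, max_value):
    """Average the samples after capping each one at max_value.

    Hypothetical helper for illustration; not part of NetMRG or RRDtool.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Cap every sample so that a single spike cannot dominate the average.
    capped = [min(value, max_value) for value in samples]
    return sum(capped) / len(capped)


# Four ordinary readings and one spike (made-up data).
readings = [10, 12, 11, 5000, 9]
raw_avg = sum(readings) / len(readings)
capped_avg = clipped_average(readings, max_value=100)
print(raw_avg, capped_avg)
```

The spike still counts toward the average, but only up to the configured maximum, so the recorded data stays complete while the graph stays readable.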
I think this would really help to make this really cool tool even cooler! :)
I think what you're looking for is what's commonly referred to as '95th percentile'.
People in the bandwidth industry typically use this to cut off the spikes (the top 5%) of their graph to smooth everything out.
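For anyone unfamiliar with the term, here is a rough Python sketch of one common way to compute a 95th percentile (the "nearest rank" method). It is an assumption about how such a feature could work, not how NetMRG does or will implement it, and the names are made up.

```python
def percentile_95(samples):
    """Return the 95th percentile of samples using the nearest-rank method.

    Hypothetical helper for illustration only.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest rank: ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Nineteen ordinary samples and one spike (made-up data).
traffic = [10] * 19 + [1000]
print(percentile_95(traffic))
```

With twenty samples the single spike falls in the top 5% and is ignored, which is the "cut off the spikes" effect described above.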
We don't have support for this yet, but it is something we're looking to support.
Yup, that 95% thingie would be really cool! Please send me an email when you have that :) In the meantime I'll just keep on using the existing product, because I still love it!
Hmm... it seems I don't get it... Why is it that every time my 'counter' resets (e.g. the service/host is restarted/rebooted), I get a spike? One would assume that if the type is a 'counter' and it is not an overflow, the averages would reset and be calculated again? Or not? Or am I missing something? And is there a way of preventing this?
What am I doing wrong???
The reason you get spikes is the wrapping of the counter. Take, for example, your CPU usage. If you had been at 17382% and you rebooted (so the counter was reset to 0), RRD would assume that the counter wrapped and that you went 2^32 - 17382 = 4294949914% in that last gathering period. That's quite a big spike.
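The arithmetic above can be sketched in a few lines of Python. This is a simplified model of how a decrease in a 32-bit counter gets read as a wrap; the function name is made up and it is not RRDtool's actual code.

```python
COUNTER_MODULUS = 2**32  # a 32-bit counter wraps after 2^32 - 1


def counter_delta(previous, current, modulus=COUNTER_MODULUS):
    """Return the increase between two counter readings.

    A reading lower than the previous one is treated as a wrap,
    which is exactly what goes wrong after a reboot.
    Hypothetical helper for illustration only.
    """
    if current >= previous:
        return current - previous
    # Counter went "backwards": assume it ran up to the modulus and wrapped.
    return modulus - previous + current


normal = counter_delta(17000, 17382)   # ordinary increase
spike = counter_delta(17382, 0)        # reset to 0 after a reboot
print(normal, spike)
```

The reset is indistinguishable from a wrap when all you have is the two readings, so the difference comes out as 4294949914 instead of something small.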
The only way around this is to put sane maximums on your monitors, so that if a value exceeds the maximum, 'Unknown' is put in the database instead. For a single-processor computer, I would put a max of '105'; for dual-processor, '205', etc.
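As a rough illustration of what a maximum does to the stored data, here is a small Python sketch. It uses None to stand in for 'Unknown'; the function name is made up and this is not the actual NetMRG or RRDtool logic.

```python
def apply_maximum(value, max_value):
    """Return value unchanged, or None ('Unknown') if it exceeds max_value.

    Hypothetical helper for illustration only.
    """
    return value if value <= max_value else None


# Single-processor example with a max of 105 (made-up readings).
readings = [42, 98, 4294949914, 37]
stored = [apply_maximum(r, 105) for r in readings]
print(stored)
```

The bogus post-reboot value is dropped, and the normal readings on either side are stored untouched.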
It's much more difficult to do this for things like 100M Ethernet, since the max is so high; a counter rollover is much more likely (and happens quite often with normal usage), so the spikes you get are almost unavoidable.
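To get a feel for how often a rollover can happen, here is a back-of-the-envelope Python sketch. It assumes a 32-bit counter that counts bytes (as typical interface octet counters do) on a link running flat out; the function name is made up.

```python
def seconds_until_wrap(link_bits_per_second, counter_bits=32):
    """Return how long a byte counter takes to wrap at full line rate.

    Assumes the counter counts bytes and the link is fully saturated.
    Hypothetical helper for illustration only.
    """
    bytes_per_second = link_bits_per_second / 8
    return 2**counter_bits / bytes_per_second


wrap_100m = seconds_until_wrap(100_000_000)  # 100 Mbit/s link
print(round(wrap_100m / 60, 1), "minutes")
```

Under those assumptions the counter can wrap in under six minutes, so genuine rollovers are routine and a maximum close to line rate cannot filter much.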
Oh, and don't worry about one 'Unknown' value - these will not be shown in your graphs since we have RRDtool's heartbeat value set to two intervals (so it would take two Unknowns to not show any data - it smooths out anything less).
Hopefully that helps a bit.
OK, thanks a lot for the explanation, I think I got it now!
It would be great if RRDtool had a "counter" type defined as "cannot roll over, but can be reset to 0 (zero) once in a while".
Cheerio and thanx again for all your patience and helpfulness!!!!