lockfile under /netmrg/var/log/netmrg


09-08-2004 05:07:00


I'm running NetMRG 0.16 on a Linux box [Fedora Core release 2 (Tettnang)].
A lock file gets created under /netmrg/var/log/netmrg, which prevents NetMRG
from running for more than 8 hours consecutively.

Can you explain how this lock file is created, so I can check what's going on
on that server?




09-08-2004 13:23:01

The lockfile is created using fopen() from the run_netmrg() function in netmrg.cpp.

What are the specific circumstances of your problem? The only time the lockfile should prevent NetMRG from running is when either another instance of NetMRG is still running, or a previous instance crashed. The former is the result of NetMRG "stepping on itself", which can occur when monitoring lots of devices on an overloaded box. The latter would be the result of a bug in NetMRG that we need to track down. If you can find any errors in your NetMRG output (such as in the lastrun.err file that the wrapper script can create), they would be helpful.
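
To make the two cases above concrete, here is a minimal Python sketch of the general lockfile pattern. It is not NetMRG's actual code (that is C++ and uses fopen() in run_netmrg()); the function names, the temporary path, and the choice to write the PID into the file are assumptions made purely for illustration.

```python
import os
import tempfile


def acquire_lock(path):
    """Create the lockfile; return False if one is already present."""
    if os.path.exists(path):
        # Either another run is still active, or an earlier run died
        # without cleaning up. The file alone cannot tell us which.
        return False
    with open(path, "w") as f:
        # Assumption: store our PID. What NetMRG writes is not stated here.
        f.write(str(os.getpid()))
    return True


def release_lock(path):
    """Remove the lockfile at the end of a normal run."""
    os.remove(path)


def demo():
    """Walk through a normal run, an overlapping run, and a crashed run."""
    results = []
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "lockfile")  # hypothetical location
        results.append(acquire_lock(lock))  # first run starts
        results.append(acquire_lock(lock))  # overlapping run is refused
        release_lock(lock)                  # first run finishes cleanly
        results.append(acquire_lock(lock))  # next run starts
        # Simulated crash: this run never calls release_lock().
        results.append(acquire_lock(lock))  # every later run is refused
    return results


if __name__ == "__main__":
    print(demo())
```

In this sketch a leftover lockfile blocks every later run until someone removes it, which is the "previous instance crashed" case. The check-then-create step is not atomic; a stricter variant could use os.open() with os.O_CREAT | os.O_EXCL.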


12-08-2004 04:10:59


First, thanks for your response. Here is the state of the system the last time
the lock file was created:

ps -edf |grep netmrg

netmrg 25662 25661 0 Aug11 00:00:00 /netmrg/bin/netmrg_cron.sh
netmrg 25666 25662 0 Aug11 00:00:00 /netmrg/bin/netmrg-gatherer
netmrg 25667 25666 0 Aug11 00:00:00 sh -c /rrdtool/bin/rrdtool - >/dev/null
netmrg 25668 25667 0 Aug11 00:00:00 /rrdtool/bin/rrdtool

ls -ltr /netmrg/var/log/netmrg

-rwxrwxrwx 1 nobody nobody 1 Aug 11 15:30 runtime
-rw-r--r-- 1 netmrg netmrg 17 Aug 11 15:35 lockfile
-rwxrwxrwx 1 nobody nobody 0 Aug 11 15:35 lastrun.err
-rwxrwxrwx 1 nobody nobody 510 Aug 11 15:35 lastrun.log

As you can see the lastrun.err file is empty.

The server is currently monitoring 5 boxes and has never been overloaded.



13-08-2004 11:31:33

The ownership in that last directory seems a bit odd; some files are owned by 'nobody', while others are owned by 'netmrg'. Is it possible that some cron job is chown-ing the files in this directory to nobody at some interval?
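
One way to spot mixed ownership like this is to list the owner and group of every file in the directory. The sketch below is a hypothetical helper (Unix only) and not part of NetMRG; all function names are made up for illustration.

```python
import grp
import os
import pwd
import sys


def _user_name(uid):
    """Map a numeric UID to a user name, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def _group_name(gid):
    """Map a numeric GID to a group name, falling back to the number."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def list_owners(directory):
    """Return (file name, owner, group) for each entry, sorted by name."""
    rows = []
    for name in sorted(os.listdir(directory)):
        st = os.stat(os.path.join(directory, name))
        rows.append((name, _user_name(st.st_uid), _group_name(st.st_gid)))
    return rows


def distinct_owners(directory):
    """Return the set of owners; more than one means mixed ownership."""
    return {owner for _, owner, _ in list_owners(directory)}


if __name__ == "__main__":
    # Directory to inspect: first argument, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, owner, group in list_owners(target):
        print(f"{owner}:{group}  {name}")
```

Saved under a name of your choosing and run against /netmrg/var/log/netmrg before and after a NetMRG cycle, it would show whether the owner of runtime, lastrun.err, and lastrun.log changes between runs.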



17-08-2004 04:03:50


I changed the ownership of the files, but I still get a lock file after about
8 hours of running:

ls -ltr
total 12
-rwxrwxrwx 1 netmrg netmrg 1 Aug 17 01:15 runtime
-rw-r--r-- 1 netmrg netmrg 17 Aug 17 01:20 lockfile
-rwxrwxrwx 1 netmrg netmrg 0 Aug 17 01:20 lastrun.err
-rwxrwxrwx 1 netmrg netmrg 462 Aug 17 01:20 lastrun.log

I still have no clue.