Intermittent Core Dumps

validan

19-10-2004 21:24:10

Is there a known issue with core dumps in the latest release? I get intermittent core dumps and have been unable to find anything they have in common. If you wish, I can send you the core file for evaluation.


FreeBSD 5.2 and NetMRG 0.17, compiled from source.

Runs great, apart from the core dumps, which I did not have before 0.17.

Thanks Guys.

balleman

19-10-2004 21:53:23

There are no known issues with core dumps on 0.17. The only exception is a problem on 64-bit platforms where we had some size_type issues, but if this were your problem, it should be consistent, not intermittent.

Please post a backtrace if you are able. If not, we'll have to make arrangements for receiving the cores.

Thanks for your help!

-Brady

keb

19-10-2004 23:48:39

[quote="balleman"]Please post a backtrace if you are able. If not, we'll have to make arrangements for receiving the cores.[/quote]

Just in case you haven't done this before (or not in a long time), here's how you can do it. Log in to the computer that the error occurred on and run

[code]gdb /path/to/netmrg-gatherer /path/to/core_file[/code]

Once at the gdb debugger prompt (it looks like "(gdb)", without the quotes), type the command 'bt'. This will produce a backtrace similar to

[code]#0 run_netmrg () at netmrg.cpp:90
#1 0x080706a7 in main (argc=1, argv=0xfeec3ee4) at netmrg.cpp:478[/code]

Sending a core file is usually not enough to pinpoint the problem. A person debugging the cause of the core dump also needs the libraries and software binaries that were used when the core dump occurred. This can be a pain because these types of files aren't always portable, so doing a backtrace on the machine that the error occurred on is the easiest way to gather debugging information.

- Kevin

validan

20-10-2004 09:05:39

Here ya go

[code]
(gdb) bt
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::Rep::clone (this=0x8142520)
at /usr/include/g++/stl_alloc.h:422
#1 0x809c043 in stripnl (input=0xbfaca6d4) at /usr/include/g++/std/bastring.h:75
#2 0x8090942 in update_rrd (info=0xbfaca768, rrd=0xbfaca7e8) at rrd.cpp:127
#3 0x8090da4 in update_monitor_rrd (info=0xbfacaa70, rrd=0xbfaca9cc) at rrd.cpp:150
#4 0x808100f in process_monitor (info=0xbfacb080, mysql=0xbfacbd8c, rrd=0xbfacb100) at monitors.cpp:333
#5 0x80527c8 in process_sub_device (info=0xbfacb220, mysql=0xbfacbd8c) at devices.cpp:144
#6 0x8053380 in process_sub_devices (info=0xbfacba60, mysql=0xbfacbd8c) at devices.cpp:175
#7 0x805bf64 in process_device (dev_id=30) at devices.cpp:353
#8 0x8082c45 in child (arg=0x811a56c) at netmrg.cpp:67
#9 0x283e0240 in _thread_start () from /usr/lib/libc_r.so.4
#10 0x0 in ?? ()
[/code]

keb

21-10-2004 01:28:40

[quote="validan"]Here ya go

[code]
(gdb) bt
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::Rep::clone (this=0x8142520)
at /usr/include/g++/stl_alloc.h:422
#1 0x809c043 in stripnl (input=0xbfaca6d4) at /usr/include/g++/std/bastring.h:75
#2 0x8090942 in update_rrd (info=0xbfaca768, rrd=0xbfaca7e8) at rrd.cpp:127
[/code][/quote]

Looks like there is a name conflict with the NetMRG stripnl function in utils.cpp and the stripnl definition in the FreeBSD g++ header files. I've created bug#317 to track this issue. There may be more to this issue than just a name conflict, but that's the first thing that should be fixed.

Searching through Google, I see a lot of other projects define their own stripnl function, so they'd probably have similar issues as well. Did you recently update any libraries/headers or upgrade to a newer FreeBSD version?

- Kevin

validan

21-10-2004 08:39:24

No updates have been installed on this box as far as I know. I will look through the logs and access records to check.

validan

09-11-2004 08:18:02

Getting another core dump on FreeBSD. The bt is below.

[code17a9a0cdad6]
Core was generated by `netmrg-gatherer'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/libnetsnmp.so.6...done.
Reading symbols from /usr/lib/libcrypto.so.3...done.
Reading symbols from /usr/local/lib/mysql/libmysqlclient.so.10...done.
Reading symbols from /usr/local/lib/libxml2.so.5...done.
Reading symbols from /usr/lib/libstdc++.so.3...done.
Reading symbols from /usr/lib/libz.so.2...done.
Reading symbols from /usr/lib/libm.so.2...done.
Reading symbols from /usr/lib/libc_r.so.4...done.
Reading symbols from /usr/lib/libcrypt.so.2...done.
Reading symbols from /usr/local/lib/libiconv.so.3...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::Rep::clone (this=0x815de00)
at /usr/include/g++/stl_alloc.h:422
422 *__my_free_list = __result -> _M_free_list_link;
(gdb) bt
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::Rep::clone (this=0x815de00)
at /usr/include/g++/stl_alloc.h:422
#1 0x80a0615 in token_replace (source=@0xbfafd220, token={static npos = 4294967295, static nilRep = {len = 0, res = 0, ref = 29,
selfish = false},
dat = 0xbfafd148 " \f\bүү\030\004\b үԯ;\t\b\034ү ү\234ѯ@ү; \025\b= \025\b? \025\b0A(\234ѯ"},
value=0xbfafd144) at /usr/include/g++/std/bastring.h:75
#2 0x809a6c1 in snmp_value (input={static npos = 4294967295, static nilRep = {len = 0, res = 0, ref = 29, selfish = false},
dat = 0xbfafd220 "\020\025\b(ү\020\023\b@\f\b0\f\b"}) at snmp.cpp:73
#3 0x809c13b in snmp_get (info=0xbfafd5d0, oidstring=0xbfafd65c) at snmp.cpp:215
#4 0x8080a65 in process_snmp_monitor (info=0xbfafda00, mysql=0xbfafed8c) at monitors.cpp:270
#5 0x8082624 in process_monitor (info=0xbfafe010, mysql=0xbfafed8c, rrd=0xbfafe090) at monitors.cpp:329
#6 0x8052878 in process_sub_device (info=0xbfafe1b0, mysql=0xbfafed8c) at devices.cpp:144
#7 0x8053430 in process_sub_devices (info=0xbfafea60, mysql=0xbfafed8c) at devices.cpp:175
#8 0x805c5e0 in process_device (dev_id=30) at devices.cpp:355
#9 0x8086e25 in child (arg=0x811f56c) at netmrg.cpp:67
#10 0x283e5240 in _thread_start () from /usr/lib/libc_r.so.4
#11 0x0 in ?? ()
(gdb)
[/code17a9a0cdad6]

-JW

balleman

09-11-2004 08:54:04

This looks like another string::size_type issue. Try pulling a new utils.cpp from CVS, or use tonight's CVS snapshot. It should be fixed now.

validan

09-11-2004 09:41:17

OK, updated it. Will let ya know! :)


Btw, thanks a TON for this wonderful tool.

validan

09-11-2004 09:45:36

[code1e957045097]
Core was generated by `netmrg-gatherer'.
Program terminated with signal 6, Abort trap.
Reading symbols from /usr/local/lib/libnetsnmp.so.6...done.
Reading symbols from /usr/lib/libcrypto.so.3...done.
Reading symbols from /usr/local/lib/mysql/libmysqlclient.so.10...done.
Reading symbols from /usr/local/lib/libxml2.so.5...done.
Reading symbols from /usr/lib/libstdc++.so.3...done.
Reading symbols from /usr/lib/libz.so.2...done.
Reading symbols from /usr/lib/libm.so.2...done.
Reading symbols from /usr/lib/libc_r.so.4...done.
Reading symbols from /usr/lib/libcrypt.so.2...done.
Reading symbols from /usr/local/lib/libiconv.so.3...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0 0x2840ad4c in kill () from /usr/lib/libc_r.so.4
(gdb) bt
#0 0x2840ad4c in kill () from /usr/lib/libc_r.so.4
#1 0x28459366 in abort () from /usr/lib/libc_r.so.4
#2 0x2839e25f in __default_terminate () from /usr/lib/libstdc++.so.3
#3 0x2839e26d in __terminate () from /usr/lib/libstdc++.so.3
#4 0x2839e54b in __sjthrow () from /usr/lib/libstdc++.so.3
#5 0x28385743 in __out_of_range () from /usr/lib/libstdc++.so.3
#6 0x804cb89 in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xbfacae70,
pos=4294967295, n1=50, s=0x80cd970 "SNMP Query ({'", n2=0) at /usr/include/g++/std/bastring.cc:156
#7 0x804cf0d in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xbfacae70,
pos1=4294967295, n1=50, str=@0xbfacad94, pos2=0, n2=4294967295) at /usr/include/g++/std/bastring.cc:131
#8 0x80a05e4 in token_replace (source=@0xbfacae70, token={static npos = 4294967295, static nilRep = {len = 0, res = 0, ref = 28,
selfish = false}, dat = 0xbfacad98 ""}, value={static npos = 4294967295, static nilRep = {len = 0, res = 0, ref = 28,
selfish = false}, dat = 0xbfacad94 "p\f\b"}) at utils.cpp:69
#9 0x809a6c1 in snmp_value (input={static npos = 4294967295, static nilRep = {len = 0, res = 0, ref = 28, selfish = false},
dat = 0xbfacae70 "`\f\bx\020\f\b`\f\b"}) at snmp.cpp:73
#10 0x809c13b in snmp_get (info=0xbfacb188, oidstring=0xbfacb184) at snmp.cpp:215
#11 0x809fe34 in get_snmp_uptime (info=0xbfacba60) at snmp.cpp:379
#12 0x805907b in process_device (dev_id=4) at devices.cpp:249
#13 0x8086e25 in child (arg=0x811f508) at netmrg.cpp:67
#14 0x283e5240 in _thread_start () from /usr/lib/libc_r.so.4
#15 0x0 in ?? ()
[/code1e957045097]

balleman

09-11-2004 10:06:48

Ok, try the next new utils.cpp. Hopefully this will work...

validan

09-11-2004 10:18:48

OK, that seems to have done it. Thanks.

Will let you know of any other issues!

balleman

09-11-2004 10:38:33

Excellent. Thanks for helping us to track that down!

validan

15-11-2004 10:57:27

[code1944753ea14]
Core was generated by `netmrg-gatherer'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/libnetsnmp.so.6...done.
Reading symbols from /usr/lib/libcrypto.so.3...done.
Reading symbols from /usr/local/lib/mysql/libmysqlclient.so.10...done.
Reading symbols from /usr/local/lib/libxml2.so.5...done.
Reading symbols from /usr/lib/libstdc++.so.3...done.
Reading symbols from /usr/lib/libz.so.2...done.
Reading symbols from /usr/lib/libm.so.2...done.
Reading symbols from /usr/lib/libc_r.so.4...done.
Reading symbols from /usr/lib/libcrypt.so.2...done.
Reading symbols from /usr/local/lib/libiconv.so.3...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xbfadc118, pos=0,
n1=4294967295, s=0x80a95d3 "monitors.last_val, ", n2=19) at /usr/include/g++/stl_alloc.h:422
422 *__my_free_list = __result -> _M_free_list_link;
(gdb) bt
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xbfadc118, pos=0,
n1=4294967295, s=0x80a95d3 "monitors.last_val, ", n2=19) at /usr/include/g++/stl_alloc.h:422
#1 0x8050657 in process_sub_device (info=0xbfadc1b0, mysql=0xbfadcd8c) at /usr/include/g++/std/bastring.h:218
#2 0x8053430 in process_sub_devices (info=0xbfadca60, mysql=0xbfadcd8c) at devices.cpp:175
#3 0x805c5e0 in process_device (dev_id=22) at devices.cpp:355
#4 0x8086e5d in child (arg=0x811f54c) at netmrg.cpp:67
#5 0x283e5240 in _thread_start () from /usr/lib/libc_r.so.4
#6 0x0 in ?? ()
[/code1944753ea14]


Again, these are infrequent.

balleman

15-11-2004 18:36:05

Hmm... I'm not seeing anything that looks problematic at the location this backtrace indicates. I'm slightly concerned that we might be dealing with a rare concurrency issue with the string class. If you get any more backtraces that differ from this one, please send them along.

validan

23-11-2004 14:39:13

[code1f6104aab64]Core was generated by `netmrg-gatherer'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/libnetsnmp.so.6...done.
Reading symbols from /usr/lib/libcrypto.so.3...done.
Reading symbols from /usr/local/lib/mysql/libmysqlclient.so.10...done.
Reading symbols from /usr/local/lib/libxml2.so.5...done.
Reading symbols from /usr/lib/libstdc++.so.3...done.
Reading symbols from /usr/lib/libz.so.2...done.
Reading symbols from /usr/lib/libm.so.2...done.
Reading symbols from /usr/lib/libc_r.so.4...done.
Reading symbols from /usr/lib/libcrypt.so.2...done.
Reading symbols from /usr/local/lib/libiconv.so.3...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xbfaa8228, pos=38, n1=0,
s=0x8155130 "snmp#rAf02", n2=10) at /usr/include/g++/stl_alloc.h:422
422 *__my_free_list = __result -> _M_free_list_link;
(gdb) dt
Undefined command: "dt". Try "help".
(gdb) bt
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xbfaa8228, pos=38, n1=0,
s=0x8155130 "snmp#rAf02", n2=10) at /usr/include/g++/stl_alloc.h:422
#1 0x804cf0d in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xbfaa8228,
pos1=38, n1=0, str=@0xbfaa863c, pos2=0, n2=4294967295) at /usr/include/g++/std/bastring.cc:131
#2 0x809b723 in snmp_get (info=0xbfaa85d0, oidstring=0xbfaa865c) at /usr/include/g++/std/bastring.h:196
#3 0x8080a9d in process_snmp_monitor (info=0xbfaa8a00, mysql=0xbfaa9d8c) at monitors.cpp:270
#4 0x808265c in process_monitor (info=0xbfaa9010, mysql=0xbfaa9d8c, rrd=0xbfaa9090) at monitors.cpp:329
#5 0x8052878 in process_sub_device (info=0xbfaa91b0, mysql=0xbfaa9d8c) at devices.cpp:144
#6 0x8053430 in process_sub_devices (info=0xbfaa9a60, mysql=0xbfaa9d8c) at devices.cpp:175
#7 0x805c5e0 in process_device (dev_id=30) at devices.cpp:355
#8 0x8086e5d in child (arg=0x811f56c) at netmrg.cpp:67
#9 0x283e5240 in _thread_start () from /usr/lib/libc_r.so.4
#10 0x0 in ?? ()[/code1f6104aab64]

validan

01-12-2004 08:50:45

Another one~!

[code1ef24f3ce2f]Core was generated by `netmrg-gatherer'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/libnetsnmp.so.6...done.
Reading symbols from /usr/lib/libcrypto.so.3...done.
Reading symbols from /usr/local/lib/mysql/libmysqlclient.so.10...done.
Reading symbols from /usr/local/lib/libxml2.so.5...done.
Reading symbols from /usr/lib/libstdc++.so.3...done.
Reading symbols from /usr/lib/libz.so.2...done.
Reading symbols from /usr/lib/libm.so.2...done.
Reading symbols from /usr/lib/libc_r.so.4...done.
Reading symbols from /usr/lib/libcrypt.so.2...done.
Reading symbols from /usr/local/lib/libiconv.so.3...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xbfab9228, pos=48, n1=0,
s=0x80ac169 "'}, '", n2=5) at /usr/include/g++/stl_alloc.h:422
422 *__my_free_list = __result -> _M_free_list_link;
(gdb) bt
#0 basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xbfab9228, pos=48, n1=0,
s=0x80ac169 "'}, '", n2=5) at /usr/include/g++/stl_alloc.h:422
#1 0x809b866 in snmp_get (info=0xbfab95d0, oidstring=0xbfab965c) at /usr/include/g++/std/bastring.h:198
#2 0x8080a9d in process_snmp_monitor (info=0xbfab9a00, mysql=0xbfabad8c) at monitors.cpp:270
#3 0x808265c in process_monitor (info=0xbfaba010, mysql=0xbfabad8c, rrd=0xbfaba090) at monitors.cpp:329
#4 0x8052878 in process_sub_device (info=0xbfaba1b0, mysql=0xbfabad8c) at devices.cpp:144
#5 0x8053430 in process_sub_devices (info=0xbfabaa60, mysql=0xbfabad8c) at devices.cpp:175
#6 0x805c5e0 in process_device (dev_id=30) at devices.cpp:355
#7 0x8086e5d in child (arg=0x811f56c) at netmrg.cpp:67
#8 0x283e5240 in _thread_start () from /usr/lib/libc_r.so.4[/code1ef24f3ce2f]

balleman

01-12-2004 19:33:14

It really does look like string class problems. Could you give us your glibc, libstdc++, and g++/gcc versions?

validan

02-12-2004 14:54:53

gcc version 2.95.4 20020320 [FreeBSD]

glib-1.2.10_11

Not sure about libstdc++; it should be the standard one that ships with FreeBSD 4.10-STABLE.

balleman

02-12-2004 18:00:13

I think libstdc++ is often tied to the compiler you're using. Could you try using a 3.x series g++? We have had this solve similar issues in the past.