Jan 3, 2012

Finding subtle malloc bugs

Last year I talked with a couple of engineers who were hunting down some strange bug reports involving memory corruption that upstream developers could not directly reproduce. The bugs included accessing data after a closedir call and buffer overflows. These kinds of bugs normally lead to random behaviour at some later point in time and are really difficult to debug. In the case of openSUSE Factory, the openSUSE developers could trigger them reproducibly because glibc comes with some helpers.
If "MALLOC_PERTURB_" is set in the environment, then every free call overwrites the released memory - thus destroying its content - and every malloc call initializes the newly allocated memory with a fixed byte pattern (with the exception of calloc, which still returns zeroed memory).

Ulrich Drepper, who implemented this, said:
"The reason for this exercise is, of course, to find code which uses memory returned by malloc without initializing it and code which uses code after it is freed. valgrind can do this but it's costly to run. The MALLOC_PERTURB_ exchanges the ability to detect problems in 100% of the cases with speed."
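In openSUSE, we set MALLOC_PERTURB_ to 69. So, if you find your memory location suddenly filled with "E"s (ASCII 69 is 'E') or with "\272" (the bitwise inverse, 0xBA), you have found a bug in the handling of malloc'ed memory: "E"s point to memory that was used after it was freed, while "\272" points to memory that was used without being initialized.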
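
To make the byte patterns concrete, here is a minimal sketch in C. It assumes glibc and uses mallopt(M_PERTURB, ...), which as far as I know is the in-process counterpart of the MALLOC_PERTURB_ environment variable. The helper names all_bytes_equal and fresh_block_is_perturbed are made up for this illustration and are not part of glibc.

```c
#include <assert.h>
#include <malloc.h>   /* mallopt, M_PERTURB (glibc-specific) */
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if all n bytes at p equal `expected`.
   The volatile qualifier forces the compiler to really read the memory. */
static int all_bytes_equal(const volatile unsigned char *p, size_t n,
                           unsigned char expected)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != expected)
            return 0;
    return 1;
}

/* Hypothetical helper: switch on perturbation with the given byte value
   (openSUSE Factory uses 69) and check that a freshly malloc'ed block is
   filled with the bitwise inverse of that byte (0xBA, octal \272, for 69).
   Returns 1 if the pattern is found, 0 if not, -1 on error. */
int fresh_block_is_perturbed(int value, size_t n)
{
    assert(value > 0 && value <= 255);      /* 0 would disable the feature */

    if (mallopt(M_PERTURB, value) != 1)     /* mallopt returns 1 on success */
        return -1;

    unsigned char *p = malloc(n);
    if (p == NULL)
        return -1;

    /* Fresh memory carries the complement of the perturb byte. */
    int ok = all_bytes_equal(p, n, (unsigned char)(~value & 0xff));

    /* free() would now overwrite the block with `value` itself ('E' for 69).
       Reading it afterwards is exactly the bug this feature exposes, so the
       sketch deliberately does not do that. */
    free(p);
    return ok;
}
```

On a glibc system, calling fresh_block_is_perturbed(69, 4096) should return 1; the 4096-byte size is an arbitrary choice for the sketch. For an existing program no code change is needed, since starting it as `MALLOC_PERTURB_=69 ./yourprogram` (with a placeholder program name) enables the same behaviour.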


For more information, read Ulrich's blog or Jakub's blog.


This is only enabled for openSUSE Factory, but if you're developing software yourself, I advise you to set it in your own environment and use it to find bugs early.