Hacker News
Ask HN: Got any tips for profiling and tuning C?
2 points by ghotli on July 17, 2010 | hide | past | favorite | 5 comments
I'm a casual C developer. Up until recently it's been kind of a love/hate relationship. I mostly attribute that to being ignorant of the breadth of debugging, tuning, and profiling toolsets that are out there.

We use an open source tile rendering engine that's entirely written in C. My goal is to identify its data access patterns. This will help me determine which functions need to be optimized, or how to reorder the data on disk to optimize for those access patterns.

I'm going to eventually end up reading the whole codebase, but I'm certain that there are best practices for determining this kind of information that I am just unaware of. I'm vaguely familiar with gdb and valgrind, but I feel like I'm only scratching the surface of their capabilities.

What kind of tooling is everyone else using these days? My specific use case is on linux, but I'd appreciate tips across the board.

I'm also interested to see if recompiling with llvm and clang would give me any performance increase. I see there are malloc replacements like tcmalloc and hoard. Does anyone have experience with these?



For profiling: usually as a first pass I turn to PG. :) Seriously though, add '-pg' to your CFLAGS and LDFLAGS, recompile, run, and look at the output in gprof. It's a pretty good way to easily identify bottlenecks. You can pipe the output to dot and graph the call graphs, etc. -- but I've found that less useful than running the code that I'm trying to make more performant and studying the first 20-30 lines of the gprof output.

http://sourceware.org/binutils/docs/gprof/Compiling.html#Com...

There are better alternatives as well. But adding -pg first is just so easy, and usually (I've found for my stuff) is enough...

For code discovery: I've experimented with strace as others mention (and ltrace). And there are awesome things like Fenris in theory:

http://lcamtuf.coredump.cx/fenris/devel.shtml

But I could never really get them to work in practice. Still, I learned a lot about what good terminal-level integration could look like by browsing them. At some point I hacked together vim and gdb integration pretty well for my purposes (or, I should say, improved on the clewn project; I'm pretty happy with it). I wonder if others have done similar things. Anyway, I'm curious what others say as well.


AMD CodeAnalyst is free and pretty good if you have an AMD CPU. If you have an Intel processor it still works; it just does less.

The two main data access performance tips are:

1) Make sure your loops work right to left: for an array declared [10][9][8], the innermost loop should iterate over the 8-element dimension, the next over the 9, and the outermost over the 10 (that is the cache-optimal ordering, since C arrays are row-major).

2) Prefer SoA (structure of arrays) to AoS (array of structures). Say you have an array of structures and need to loop over the array updating one field of each structure: you increase cache hits if you instead make a single structure that holds one array per field.
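Both tips can be sketched in C (the array sizes and the particle-style record below are just illustrative):

```c
#include <stddef.h>

enum { NI = 10, NJ = 9, NK = 8, N = 64 };

/* Tip 1: for a[NI][NJ][NK], the rightmost index is contiguous in
 * memory, so it belongs in the innermost loop. */
double sum3(double a[NI][NJ][NK]) {
    double s = 0.0;
    for (size_t i = 0; i < NI; i++)          /* slowest-varying index outermost */
        for (size_t j = 0; j < NJ; j++)
            for (size_t k = 0; k < NK; k++)  /* unit stride: cache-friendly */
                s += a[i][j][k];
    return s;
}

/* Tip 2: AoS interleaves fields; SoA keeps each field contiguous. */
struct aos { float x, y, mass; };            /* array of structures */
struct soa { float x[N], y[N], mass[N]; };   /* structure of arrays */

void scale_mass_aos(struct aos *p, size_t n, float f) {
    for (size_t i = 0; i < n; i++)
        p[i].mass *= f;   /* stride = sizeof(struct aos): drags x and y
                             through the cache even though only mass is used */
}

void scale_mass_soa(struct soa *p, size_t n, float f) {
    for (size_t i = 0; i < n; i++)
        p->mass[i] *= f;  /* stride = sizeof(float): every byte fetched is used */
}
```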


Start with Drepper's "What every programmer should know about memory":

http://people.redhat.com/drepper/cpumemory.pdf


strace might help you, perhaps with the -e trace=file option. Depends on how mapserver is implemented.

It's not clear to me that this is a C-specific problem.


It's not, really; anything needs to be profiled in high-load situations. I'm mostly looking for an overview of the tooling to see if I'm just unaware of a vital tool. Thanks for the strace info, I'll check it out.



