06-23-11 - Map File Graphviz

What I want :

Something that parses the .map file and the obj's and creates a graph of the size of the executable. Graph nodes should be scaled to the size they take in the exe, and connections should be dependencies.

I have all this code for generating graphviz/dotty because I do it for my allocation grapher, but I don't know a good way to get the dependencies in the exe. Getting the sizes of things in the MAP is relatively easy.
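Spitting out the dot file really is the easy part. A minimal Python sketch that sizes graph nodes by their byte count in the exe; the symbols, sizes, and edges here are made-up placeholders (the real ones would come from the .map file and the dependency info this post is asking for):

```python
# Sketch: emit a Graphviz dot file with nodes scaled by bytes in the exe.
# The symbol sizes and edges are hypothetical; in practice they come from
# the .map file plus per-function dependency info.

sizes = {"Log()": 220, "s_log_buf": 1024, "main": 96}
deps = [("main", "Log()"), ("Log()", "s_log_buf")]

def to_dot(sizes, deps):
    lines = ["digraph exe {"]
    for sym, nbytes in sizes.items():
        # scale node width roughly with the square root of the byte count
        w = max(0.5, (nbytes / 1024.0) ** 0.5)
        lines.append('  "%s" [label="%s\\n%d bytes", width=%.2f];'
                     % (sym, sym, nbytes, w))
    for src, dst in deps:
        lines.append('  "%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines)

print(to_dot(sizes, deps))
```

Pipe that into dotty or "dot -Tpng" and you get the picture.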

To be clear, what you want to see is something like :

s_log_buf is 1024 bytes
s_log_buf is used by Log() in Log.cpp
Log() is called from X and Y and Z
...
Just looking at the .map file is not enough; you want to know why a certain symbol got dragged in. (This happens a lot: some CRT function like strcpy suddenly shows up in your map and you're like "where the fuck did that come from?")

Basically I need the static call-graph or link tables, ie. a list of the dependencies of each function. The main place I need this is on the SPU, because it's a constant battle to keep your image size minimal to fit in the 256k.

I guess I can get it from "objdump" pretty easily, but that only provides dependency info at the obj level, not at the function level, which is what I really want.
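For what it's worth, the obj-level graph can be scraped by matching undefined symbols ("U") in each .o against definitions ("T"/"D") in the others, using "nm" output. A rough Python sketch; the nm output below is canned sample text standing in for actually running nm on each object file:

```python
# Sketch: build obj-level dependency edges from `nm` output.
# NM_OUTPUT is canned, hypothetical data; normally you'd run `nm foo.o`
# for each object file and capture its stdout.

NM_OUTPUT = {
    "log.o":  "00000000 T Log\n00000400 D s_log_buf\n         U strcpy\n",
    "main.o": "00000000 T main\n         U Log\n",
}

def parse_nm(text):
    defined, undefined = set(), set()
    for line in text.splitlines():
        parts = line.split()
        sym, kind = parts[-1], parts[-2]
        (undefined if kind == "U" else defined).add(sym)
    return defined, undefined

def obj_edges(nm_outputs):
    defs, undefs = {}, {}
    for obj, text in nm_outputs.items():
        d, u = parse_nm(text)
        undefs[obj] = u
        for sym in d:
            defs[sym] = obj
    edges = set()
    for obj, wanted in undefs.items():
        for sym in wanted:
            if sym in defs:  # externals like strcpy just drop out
                edges.add((obj, defs[sym]))
    return edges

print(obj_edges(NM_OUTPUT))
```

As the post says, though, this only gets you obj-to-obj edges, not function-to-function.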

Any better solutions?

5 comments:

jfb said...

The problem you'll find is that compilers, when optimizing, don't respect the boundaries of functions. This information does not meaningfully exist if you use whole-program optimization for example.

What you might do is use -O0 to determine the graph and -O3 to determine the sizes after inlining.

Anyway, gcc or Clang's -ffunction-sections will place each function in its own section in an ELF file. This means from the compiler's perspective that each function could be relocated separately in memory.

The result is that the relocation table for each function shows all functions it needs to call that have not been inlined. This will give you exactly what you want -- either use readelf (careful if capturing its output -- it clips function names to fit a table width) or write an ELF parser. It's not that bad of a format to parse - I did it for a (C#) project recently.
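Concretely: with -ffunction-sections each function gets its own .text.&lt;name&gt; section, so readelf -r groups relocations per function, and the caller-callee edges fall right out. A rough Python sketch; the readelf output here is canned sample text (ARM Thumb 2 style, hypothetical symbols), not the exact formatting of any one binutils version:

```python
# Sketch: scrape a function-level call graph from `readelf -r` output on an
# object built with -ffunction-sections. READELF is canned, hypothetical text.
import re

READELF = """\
Relocation section '.rel.text.main' at offset 0x1a0 contains 1 entry:
 Offset     Info    Type            Sym.Value  Sym. Name
00000004  0000002a R_ARM_THM_CALL    00000000   Log

Relocation section '.rel.text.Log' at offset 0x1b0 contains 1 entry:
 Offset     Info    Type            Sym.Value  Sym. Name
00000008  0000002a R_ARM_THM_CALL    00000000   strcpy
"""

def call_edges(text):
    edges, caller = [], None
    for line in text.splitlines():
        m = re.match(r"Relocation section '\.rela?\.text\.(\w+)'", line)
        if m:
            caller = m.group(1)          # section name encodes the caller
        elif caller and re.match(r"[0-9a-f]{8}\s", line):
            edges.append((caller, line.split()[-1]))  # last column: symbol
    return edges

print(call_edges(READELF))
```

(Per the caveat above, beware readelf clipping long names; a real ELF parser avoids that.)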

Looking at some test code of mine, -ffunction-sections doesn't appear to affect the code quality much, but that may depend on platform. I'm testing on ARM Thumb 2, and it outputs R_ARM_THM_CALL relocations for the linker (relative jump).

Alternatively, see if libclang has an API for dependencies. :)

ryg said...

"This information does not meaningfully exist if you use whole-program optimization for example."
Now that's just wrong.

The compiler will choose to inline some functions completely, yes, but as long as you have symbolic debug information you can infer the actual call graph directly from the final binary, function-level linking or not. That may not correspond 1:1 to the source code but it's still useful.

It would be much better if linkers actually stored the extents for everything in the MAP file (or, better in terms of getting information out, the PDB for VC++). As it is, *some* stuff has both position and size, some is just labels without size information, and things like jump tables for switches, compiler-generated immediate constants, and multiple-inheritance this-pointer-adjustment thunks don't appear at all. If the symbol table were complete, just the start positions would be fine, but with the holes you sometimes have to guess size from the start address of the next symbol, and that can be off badly.
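The size-guessing this forces on you looks something like the following Python sketch; the symbol table is made up, and note how the estimate silently absorbs any hole after the label:

```python
# Sketch of inferring missing symbol sizes from the next symbol's start
# address. The (address, name, size) entries are hypothetical; a real tool
# would parse them out of the .map file. None means the map recorded only
# a label, with no size.

symbols = [
    (0x1000, "main", 0x60),
    (0x1060, "Log", None),        # label only: size must be guessed
    (0x1140, "s_log_buf", 0x400),
]

def fill_sizes(symbols):
    ordered = sorted(symbols)     # by address
    out = []
    for i, (addr, name, size) in enumerate(ordered):
        if size is None and i + 1 < len(ordered):
            # guess: extends to the next symbol -- overestimates across
            # holes (jump tables, literal pools, thunks, padding)
            size = ordered[i + 1][0] - addr
        out.append((addr, name, size))
    return out

print(fill_sizes(symbols))
```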

Function-level linking is definitely the way to go for SPU projects (you want the full monty: "-ffunction-sections -fdata-sections -Wl,--gc-sections"). It shouldn't affect code quality at all; it does increase GNU ld link times substantially (since ld is slow), but if you're linking projects with <120k code+data that's a non-issue :).

jfb said...

Hardly wrong. "It may not correspond to the source code"... yes... that's true. Here's an example from what I'm doing on ARM Thumb 2 with Clang.

It inlines most of my functions. I start with about 30 (4KB of code space :) and end up with about 3.

It identifies functions with no side effects. It moves them out of loops all the way up into main().

It identifies cases where I am using int32s as booleans. It replaces these with bytes. Even if it's 0 and 123, not 0 and 1.

I suggest reading http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html ... believe me, compilers work wonders these days. GCC probably does as well. For me, whole-program cuts the code size to a third of the original.

There's not much of the original structure left, so unless C++ inhibits a lot of optimizations I am getting in C, I expect you'll find it the same -- -ffunction-sections and ELF parsing will get you what you need; whole-program optimization changes too much to keep it.

By the way, the most important take-away from the article linked above: signed int overflow being 'undefined', if you want to bounds check, you've got to cast to unsigned first. I was a bit surprised they went that far, seeing how many platforms are two's complement with rollover these days.

cbloom said...

Come on, I'm sure ryg is well aware of undefined behavior in C. There's no need to get pedantic.

The issue is what you can figure out even after optimization, and how often that information is still useful.

What you really want is to look at the binary size using all the real optimizations that you will use in shipping, including all inlining etc.

In fact that's pretty crucial; one of the things that kills you with SPU ELFs is inlining. One of the things that was getting me was that "memcpy" got inlined in 4 different places in my code, so I had 4 full copies of that routine (which is rather large on the SPU).

Ideally I'd see that "LZ_CopyMatch" is large and I'd be able to see that it's so big because "memcpy" got inlined into it.

I believe that you can figure this out pretty well. You can compile first without inlining and generate your whole call graph. Then compile again with your real shipping settings to get the actual function sizes.

That way you could see that "LZ_CopyMatch" was quite large, and you could see that "memcpy" was inlined into it. You couldn't actually tell how much of that largeness was due to memcpy, but it's a start.
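That two-pass scheme boils down to a join: the call graph from the no-inlining build, sizes from the shipping build, and anything that disappeared from the shipping symbol table gets flagged as probably-inlined. A toy Python sketch with made-up data:

```python
# Sketch: cross-reference a no-inlining call graph against shipping-build
# sizes. All function names and numbers are hypothetical. A callee present
# in the debug graph but absent from the shipping symbol table was
# (probably) inlined into its caller.

debug_calls = {"LZ_CopyMatch": ["memcpy"], "main": ["LZ_CopyMatch"]}
ship_sizes = {"LZ_CopyMatch": 1800, "main": 120}   # memcpy: gone, inlined

def annotate(calls, sizes):
    report = {}
    for fn, callees in calls.items():
        inlined = [c for c in callees if c not in sizes]
        report[fn] = (sizes.get(fn, 0), inlined)
    return report

print(annotate(debug_calls, ship_sizes))
```

So LZ_CopyMatch shows up as 1800 bytes with memcpy listed as inlined into it -- which is exactly the "it's big because of memcpy" answer, minus the exact byte attribution.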

MSN said...

Assuming you are using COFF .obj files as inputs to the linker, COFF specifies fixups (relocations) per section which usually coincide with functions (at least with MSVC). The external fixups are the ones that you want to use as links.

Combine that with the .map file and you should get a nice dependency graph.

MSN
