How profiling works

Sciagraph uses the LD_PRELOAD mechanism to preload a shared library at process startup. This is why Sciagraph can’t be used as a regular library and needs to be started in a special way: the correct environment has to be set up before Python starts.
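
To illustrate why the environment has to be in place before Python starts, here is a minimal sketch of launching a fresh Python process with LD_PRELOAD set. It is not Sciagraph's actual launcher: the library path and the helper names (`preload_env`, `run_python_with_preload`) are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys

# Hypothetical path: a stand-in for whatever shared library a profiler ships.
HYPOTHETICAL_LIB = "/opt/example/libprofiler_preload.so"


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with library_path prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD holds a list separated by colons or spaces; keep earlier entries.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


def run_python_with_preload(library_path, script_args):
    """Start a new Python process; the dynamic linker reads LD_PRELOAD at startup."""
    return subprocess.run(
        [sys.executable, *script_args],
        env=preload_env(library_path),
    )
```

The dynamic linker only consults LD_PRELOAD when a process starts, so setting the variable from inside an already running Python interpreter would not affect that interpreter. That is the reason a new process is started here instead of importing a module.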

It then does two kinds of profiling.

Memory profiling

The shared library intercepts all low-level C memory allocation and deallocation API calls (but see known limitations), and keeps track of (some of) the allocations. For example, instead of a malloc() memory allocation going directly to your operating system, Sciagraph will intercept it, keep note of the allocation, and then call the underlying implementation of malloc().
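
The intercept-record-delegate pattern can be modelled in a few lines of Python. This is only a conceptual sketch: the real interception happens in native code inside the preloaded library, and the `TrackingAllocator` class and its use of `bytearray` as the "underlying malloc" are assumptions made for illustration.

```python
class TrackingAllocator:
    """Wraps an underlying allocator and records each live allocation."""

    def __init__(self, underlying_alloc):
        self._underlying_alloc = underlying_alloc
        self.live = {}  # id(buffer) -> size in bytes

    def malloc(self, size):
        # Delegate to the wrapped implementation, then note the allocation.
        buf = self._underlying_alloc(size)
        self.live[id(buf)] = size
        return buf

    def free(self, buf):
        # Forget the allocation; unknown buffers are ignored.
        self.live.pop(id(buf), None)

    def live_bytes(self):
        """Total size of allocations that have not been freed yet."""
        return sum(self.live.values())
```

For example, `TrackingAllocator(bytearray)` hands out ordinary `bytearray` buffers while keeping a running record of how many bytes are still live.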

To keep overhead low, not all memory allocations are tracked; instead, Sciagraph uses sampling. In practice, large allocations will always get sampled, and code that repeatedly makes small allocations that add up to something meaningful will also be sampled. So for programs that use enough memory (more than a few hundred MB of RAM), sampling provides sufficient information to optimize memory usage.
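
One simple way to get this behavior is to sample by bytes allocated instead of by number of allocations. The sketch below uses a deterministic byte threshold; it is an assumption for illustration, not a description of Sciagraph's actual sampling scheme (real profilers often randomize the threshold). The class name `ByteThresholdSampler` and the threshold value are hypothetical.

```python
class ByteThresholdSampler:
    """Samples one allocation each time roughly sample_every_bytes have been allocated."""

    def __init__(self, sample_every_bytes):
        self.sample_every_bytes = sample_every_bytes
        self._bytes_until_sample = sample_every_bytes

    def should_sample(self, size):
        # Every allocation counts against the remaining byte budget.
        self._bytes_until_sample -= size
        if self._bytes_until_sample > 0:
            return False  # budget not used up yet: skip this allocation
        # Budget exhausted: sample this allocation and start a new budget.
        self._bytes_until_sample = self.sample_every_bytes
        return True
```

With a threshold of 1024 bytes, any allocation of 1024 bytes or more is always sampled, while 100-byte allocations are sampled once every 11 calls. Small allocations are skipped individually but still show up when they accumulate.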

Performance profiling

Sciagraph hooks into the CPython interpreter and tracks function calls. It samples all Python threads multiple times a second, and uses the corresponding callstacks to construct a performance profile.

That means Sciagraph estimates how much wall-clock time each function spends running. Slow functions will (on average) get sampled more often.
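
The idea of estimating time from sampled callstacks can be sketched in pure Python. This sketch swaps in a different mechanism from the one described above: it polls `sys._current_frames()` from a sampler loop instead of hooking into the interpreter, and the names `sample_stacks`, `slow_function`, and `demo` are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_stacks(duration_s, interval_s=0.01):
    """Sample the callstacks of all other threads and count the functions seen."""
    counts = collections.Counter()
    own_id = threading.get_ident()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for thread_id, frame in sys._current_frames().items():
            if thread_id == own_id:
                continue  # skip the thread doing the sampling
            seen = set()
            while frame is not None:  # walk from the innermost frame outwards
                seen.add(frame.f_code.co_name)
                frame = frame.f_back
            counts.update(seen)  # count each function once per sample
        time.sleep(interval_s)
    return counts


def slow_function(stop):
    """Busy loop standing in for real work."""
    while not stop.is_set():
        sum(range(1000))


def demo():
    """Run slow_function in a worker thread and sample it for a short while."""
    stop = threading.Event()
    worker = threading.Thread(target=slow_function, args=(stop,))
    worker.start()
    try:
        return sample_stacks(duration_s=0.3)
    finally:
        stop.set()
        worker.join()
```

A function that appears in a larger share of the samples was on the callstack for a larger share of the elapsed wall-clock time, which is what makes the sample counts usable as a time estimate.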