Known limitations and future plans

I hope to fix most of these over time.

General

  • Subprocesses (multiprocessing) are not yet supported.

Performance profiling

  • Non-Python threads don’t include stack traces.
  • Many libraries (NumPy, NumExpr, and BLOSC as used by Zarr) start thread pools that your code often never uses; these idle threads just clutter up the profiling report.
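To see how many threads a process really has compared to how many Python knows about, you can compare the operating system's thread list with Python's `threading` module. The sketch below is Linux-only, because it assumes `/proc/self/task` exists. It only counts threads and does not reflect how the profiler itself inspects them. The helper names are hypothetical.

```python
import os
import threading


def native_thread_count() -> int:
    """Count OS-level threads in this process (Linux-only: one entry per thread in /proc/self/task)."""
    return len(os.listdir("/proc/self/task"))


def python_thread_count() -> int:
    """Count threads the Python interpreter knows about (includes the main thread)."""
    return threading.active_count()


if __name__ == "__main__":
    print(f"OS threads:     {native_thread_count()}")
    print(f"Python threads: {python_thread_count()}")
```

If a library such as NumPy has started a native thread pool, the OS count will usually be larger than the Python count. The extra threads are the non-Python threads described above.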

Memory profiling

If you’re using NumPy (directly or indirectly), you see lots of memory allocated by openblas.so, and you’re not using linear algebra functions, you can probably ignore this memory. OpenBLAS allocates memory for each thread it starts, and its thread pool is sized based on the number of CPU cores, so the problem gets worse on CPUs with many cores. If you don’t call BLAS through NumPy’s linear algebra APIs, that memory is never actually used and therefore shouldn’t really count; just ignore it and focus on optimizing the rest of the allocations.

Other known issues:

  • realloc() and mremap() are not tracked yet. These are APIs to resize existing memory allocations, and they are used less often than other memory allocation APIs; at worst this will result in inaccurate output.
  • New mmap()s and large calloc()s are counted as fully allocated even though they don’t actually use RAM until written to; this is the underlying cause of the OpenBLAS issue mentioned above.
  • Non-Python threads don’t show call stacks.
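The gap between "allocated" and "actually using RAM" can be observed directly. The sketch below creates an anonymous mmap(), then measures how much the process's resident memory grows before and after writing to each page. It assumes Linux, because it reads the resident page count from `/proc/self/statm`. It is only an illustration of the kernel behavior behind the issue above, not of how the profiler measures allocations. The helper names are hypothetical.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # system page size in bytes


def resident_bytes() -> int:
    """Current resident set size in bytes (Linux-only: 2nd field of /proc/self/statm, in pages)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * PAGE_SIZE


def mmap_rss_growth(size: int) -> tuple[int, int]:
    """Map `size` anonymous bytes; return RSS growth (untouched, after writing every page)."""
    baseline = resident_bytes()
    region = mmap.mmap(-1, size)  # anonymous mapping: address space is reserved, no RAM yet
    try:
        untouched = resident_bytes() - baseline
        for offset in range(0, size, PAGE_SIZE):
            region[offset] = 1  # the first write to each page makes the kernel back it with RAM
        touched = resident_bytes() - baseline
    finally:
        region.close()
    return untouched, touched


if __name__ == "__main__":
    untouched, touched = mmap_rss_growth(64 * 1024 * 1024)
    print(f"RSS growth before writing: {untouched / 2**20:.1f} MiB")
    print(f"RSS growth after writing:  {touched / 2**20:.1f} MiB")
```

On a typical Linux system, the first number stays near zero while the second approaches the full mapping size. A profiler that counts the mapping as allocated at creation time will therefore overstate memory use until the pages are actually written.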