Understanding Sciagraph’s output

Sciagraph gives you both performance profiling and peak memory profiling information. As a reminder, we’re looking at the profiling output for the following program:

import json, sys, time, os

def deepcopy_list(l):
    return [i[:] for i in l]

def main(length):
    start = time.time()
    l = []
    for i in range(length):
        l.append(list(range(i)))
    with open("output.json", "w") as f:
        s = json.dumps(l)
        f.write(s)
        f.flush()
        os.fsync(f.fileno())
    l2 = deepcopy_list(l)
    print("All done, elapsed: {:.2f} seconds".format(time.time() - start))

main(int(sys.argv[1]))

Understanding your job’s performance: the per-thread timeline

In order to make your code faster, you need to understand where it’s spending its time. That’s what the timeline view gives you, showing you where each thread spent time.

If you ran the previous program, the resulting report includes a profiling result that looks like this:


In this visualization, each thread has its own timeline, showing the different callstacks running as time passed from the start of the job to its finish.

  • The wider or redder the frame, the higher percentage of time that function was responsible for. Each line is an additional call in the callstack.
  • The graph is interactive. Mouse over a frame to see what percentage of time was spent there, and the untruncated text.
  • You can see if time was spent on computation or waiting (for disk, network I/O, locks, and so on), based on the bottom of each stack.
    • In this example, most of the computation time was spent in json.dumps(l).
    • Forcing a write to disk with os.fsync(f.fileno()) caused the program to wait as data got written to disk.
  • To optimize your code, focus on the wider and redder frames. These are the frames where most of the time was spent.

The result is sampled, picking up to 500-1000 samples more or less evenly spread out across the job’s execution. That means it might be a little inaccurate, and the elapsed time might not match reality exactly. However, for purposes of finding slow parts of the code, those inaccuracies shouldn’t matter: the longer the elapsed time, the more accurate the data, and the longer running lines of code are the ones we care about.

Having found the source of slowness, you can then go and speed up your code.

Understanding your program’s memory usage

Sciagraph also measures your job’s peak memory, since that is what determines how much RAM you need available. Here’s what the memory profile looks like:

What does this mean? This is a slightly different visualization, a flamegraph: it a tree of callstacks and which ones were most expensive. It’s showing peak memory for the whole process. Specifically, it shows the callstacks responsible for memory allocations at the point in time when memory usage was highest. The wider or redder the frame, the higher percentage of memory that function was responsible for.

In a flamegraph, the order of frames is not meaningful. Unlike the timeline, which showed time moving from left to right, in this visualization left vs. right is arbitrary (at the moment it sorts alphabetically). Even if the order happens to match runtime order, that is accidental.

This particular flamegraph is interactive:

  • Click on a frame to see a zoomed in view of that part of the callstack. You can then click “Reset zoom” in the upper left corner to get back to the main overview.
  • Hover over a frame with your mouse to get additional details.

In this case, most of the memory was allocated doing l.append(list(range(i))). Smaller amounts of peak memory were due to other code paths.

Once you know the source of the memory allocations at the moment of peak memory usage, you can then go and reduce memory usage.

The performance flamegraph

The report also includes a performance flamegraph. Unlike the performance timeline:

  • The flamegraph combines multiple identical callstacks. If for example you code called f(); g(); f(); g() and each call was sufficiently slow, the timeline will show each call separately. The flamegraph will show just the combined time for f() and combined time for g().
  • Similarly, it combines multiple threads into a single visualization.

Most of the time the timeline will likely be more helpful; in this example it definitely doesn’t add much, so we won’t bother looking at it.

Next steps

Read one of the following to get going in production: