Sciagraph: find performance and memory bottlenecks in your production Python jobs

Try it out now on Linux (Python 3.7+):

# Install:
python3 -m venv ./venv
. venv/bin/activate
pip install sciagraph

# Download example code:

# Run without profiling ("example.py" is a placeholder for the downloaded example script):
python example.py 5_000

# Run with profiling:
python -m sciagraph --trial-mode run example.py 5_000

# View the report in a browser:
firefox sciagraph-result/*/index.html
chromium sciagraph-result/*/index.html

Your Python data processing pipeline is too slow—now what?

You’re running a data processing batch job written in Python, and you need it to be faster. A whole lot faster.

  • Your users need results, but you can’t deliver on time.
  • Your jobs time out—or run out of memory—and have to be re-run.
  • Your cloud computing bill is in the stratosphere.
  • You’re iterating on the implementation—but your experiments take too long, because your code is too slow.

The first step to speeding up your code: identify your code’s performance bottlenecks. But that’s harder than you’d like.

Production is different from your laptop, from CPUs to memory to disk speed to network latency. Is loading from S3 a problem? Are you swapping? Is your program running with less parallelism even though there are more CPUs? Plus, performance problems with production data won’t necessarily show up when using test data.

How do you get accurate performance diagnostics, and quickly?

The most accurate performance data comes from production

The best way to get an accurate understanding of performance bottlenecks is to observe production. But how do you get that data?

By running all your production jobs with profiling enabled from the start, by default. Most profilers are not designed to run constantly in production, of course. You need a profiler designed to have a low performance overhead, and that’s robust enough to run in production.

Sciagraph: a Python profiler for production data processing jobs

This is where Sciagraph comes in: it’s a profiler designed to run in production and to profile data processing batch jobs.

  • Designed for long-running batch jobs: unlike web applications, your data processing jobs have a specific lifecycle, from loading data to processing to writing out results. You need a profiler that takes that into account in its reporting.
  • Low overhead: Sciagraph is designed to run quickly, with minimal impact on your job’s performance.
  • Cloud storage for reports: reports can be securely stored in the cloud (using end-to-end encryption) so you can easily access performance reports even if your runtime environment is ephemeral, e.g. a container.

Let’s take a look at some examples of what Sciagraph can tell you.

Here we can see two Python threads fighting over CPython’s Global Interpreter Lock, which prevents more than one Python thread from running at a time; wider and redder frames mean more time taken. Mouse over a frame to see its full details.

You’ll have an easier time viewing this on a computer, with the window maximized; the output is not designed for phones!
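For intuition, here is a minimal, self-contained sketch of the kind of workload that produces this pattern: two pure-Python, CPU-bound threads that the GIL serializes. (The function and thread names are made up for the example; this is not Sciagraph code.)

```python
import threading
import time

def cpu_bound(n):
    """A pure-Python CPU-bound loop; it holds the GIL while running."""
    total = 0
    for i in range(n):
        total += i * i
    return total

results = {}

def worker(name, n):
    results[name] = cpu_bound(n)

# Two CPU-bound threads: because of the GIL, only one can execute
# Python bytecode at a time, so they barely benefit from a second
# core and show up as lock contention in a profiler's timeline.
threads = [
    threading.Thread(target=worker, args=("a", 2_000_000)),
    threading.Thread(target=worker, args=("b", 2_000_000)),
]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"two threads finished in {elapsed:.2f}s")
```

Running the same two tasks sequentially takes roughly as long as running them in "parallel" threads, which is exactly the contention the report above visualizes.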

And here we can see where peak memory usage is coming from in a different program; wider and redder means more memory usage. You can click on a frame to get a traceback.
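Sciagraph gathers this peak-memory attribution for you automatically in production. As a rough standalone illustration of the underlying idea, the standard library’s tracemalloc module can also report current and peak traced memory; the function names below are invented for the example:

```python
import tracemalloc

def load_data():
    # Simulate loading a large dataset: ~1M Python floats.
    return [float(i) for i in range(1_000_000)]

def process(data):
    # Building a transformed copy temporarily doubles the list
    # memory; allocations like this often dominate peak usage.
    return [x * 2.0 for x in data]

tracemalloc.start()
result = process(load_data())
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```

The input list is freed once process() returns, so peak traced memory exceeds the current figure; a report like Sciagraph’s tells you which frames were responsible at the moment of the peak.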

Ready to speed up your code? Try out Sciagraph today