• Identify speed and memory bottlenecks [2]
  • in production [3]
  • by continuously profiling [4]
  • your Python data processing workflows [5].

  1. /ˈskaɪəˌgræf/: historically, an X-ray photograph. Now, a software service that gives you deep visibility into your Python code’s speed and memory usage.
  2. Slow-running jobs waste your time during development, make your users impatient, and increase your compute costs. Speed them up and you’ll iterate faster, have happier users, and stick to your budget—but first you need to identify the cause of the problem. This is where Sciagraph can help you, by pinpointing slow and inefficient code, down to the level of specific lines of Python.
  3. Reproducing production performance problems on your laptop takes time, is often difficult, and is sometimes impossible. You want insight from the most realistic environment possible. That’s why Sciagraph is designed to profile your code in production.
  4. By the time you identify a slow job, it may be too late to enable profiling; unfortunately, you can’t go back in time. Better to have a profiling report immediately available whenever you need one. That’s why Sciagraph is designed to run continuously, enabled by default, profiling every task you run.
  5. Most continuous profiling services are designed for web applications. But a Django website has very different runtime characteristics from data science and scientific computing code. You need tools designed for the kind of code that you write and run. Sciagraph is specifically focused on data processing workflows.

Ready to speed up your code? Try out Sciagraph for free!

Identify performance bottlenecks in calculations, data loading, and more

Sciagraph gives you a timeline showing where your threads spent their time: both running on the CPU and waiting for locks, network communication, filesystem reads and writes, and so on.

Note: You’ll have an easier time viewing this on a computer, with the window maximized; the output is not designed for phones!

Above you can see the profiling report for a program that reads in some text, splits it into words, filters out certain words, and writes the result to JSON. This is a typical structure for data processing workflows: load the inputs → process the data → write out the output.
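To make that structure concrete, here is a minimal sketch of what such a program might look like. It is not the actual profiled program: the function names and the stop-word list are hypothetical, chosen only to illustrate the load → process → write shape.

```python
import json
from pathlib import Path

# Hypothetical stop-word list; the filter used in the profiled example is not shown.
STOP_WORDS = {"the", "a", "an", "and", "of"}


def load_text(path):
    """Load the input: read the whole file as one string."""
    return Path(path).read_text(encoding="utf-8")


def filter_words(text, stop_words=STOP_WORDS):
    """Process the data: split into words and drop the unwanted ones."""
    return [word for word in text.split() if word.lower() not in stop_words]


def write_json(words, path):
    """Write the output: serialize the remaining words as a JSON list."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(words, f)


def run(input_path, output_path):
    """Run the three stages in order and return the filtered words."""
    text = load_text(input_path)
    words = filter_words(text)
    write_json(words, output_path)
    return words
```

In a profile of a program shaped like this, each of the three functions would appear as its own frame, which is what makes it possible to see which stage dominates.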

Wider and redder frames mean more of the time was spent in that part of the program. Hover your mouse over a frame to see the text; real reports also include zoom functionality.

In this example, you can see that:

  1. Reading the data was fast enough not to show up in the profiling.
  2. Processing the data, in this case filtering the words, was pretty CPU-intensive.
  3. Writing the data to disk was slow, but not because of CPU: it involved a lot of waiting. In this case, it’s because the program was writing to a remote filesystem.
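The difference between CPU-bound work and waiting can be illustrated with the standard library alone. This is only a rough sketch of the idea, not how Sciagraph measures anything: it compares wall-clock time (`time.perf_counter`) with process CPU time (`time.process_time`), and the helper names `measure`, `cpu_bound`, and `waiting` are made up for this example.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) for a single call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def cpu_bound():
    # Busy arithmetic: wall time and CPU time should be roughly equal.
    sum(i * i for i in range(500_000))


def waiting():
    # Sleeping stands in for waiting on a lock, the network, or a slow filesystem.
    time.sleep(0.2)


if __name__ == "__main__":
    for func in (cpu_bound, waiting):
        wall, cpu = measure(func)
        print(f"{func.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

When wall time is much larger than CPU time, as with `waiting` here, the code is mostly waiting rather than computing, which is the situation described for the write step above.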

Discover where and why you’re using too much memory

Sciagraph also reports peak memory usage, the high water mark:

You can click on a frame to zoom in on a stack trace; wider and redder frames mean more memory usage.

This example shows the memory usage report for the same program. There are three main sources of memory usage, the most significant being parsing the input file into words.
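To see why the peak matters more than the memory in use at the end of a run, here is a small sketch using Python's built-in `tracemalloc` module. It is an illustration of the high water mark concept only, not Sciagraph's mechanism, and `tracemalloc` tracks only allocations made through Python's allocator. The helper names are hypothetical.

```python
import tracemalloc


def measure_memory(func, *args):
    """Run func and return (result, current_bytes, peak_bytes) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args)
        # current: still allocated now; peak: the high water mark since start().
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


def count_words(text):
    # The temporary list of words is freed on return, but it still sets the peak.
    words = text.split()
    return len(words)


if __name__ == "__main__":
    result, current, peak = measure_memory(count_words, "word " * 100_000)
    print(f"words={result} current={current} bytes peak={peak} bytes")
```

The list of words is gone by the time the function returns, so the current figure is small, yet the process still needed enough memory to hold it at the peak.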

More features

  • Fast setup: For simple Python processes, using Sciagraph may be as easy as setting an environment variable. And with MLflow and Celery integration built in, and other framework support planned, Sciagraph is designed to work out of the box with the frameworks you use. Need support for another framework? Send me an email to get it prioritized.
  • Fast and robust enough to run in production: Sciagraph is designed to minimize impact on your job’s performance, so you won’t even notice it’s there until you need it.
  • Cloud storage for reports: No need to spend time figuring out how to store profiling reports: they can be automatically and securely stored in the cloud. That means you can easily access performance reports even if your runtime environment is ephemeral, like a container.
  • Data privacy: Profiling reports never include user data, and when they’re uploaded to Sciagraph’s cloud storage they are encrypted end-to-end, so no one but you can access them.
