Set up Sciagraph for production use

Sciagraph is designed as always-on profiling: it should be both robust and fast enough that you can enable it on all your production batch jobs. Then, when you find you’re using too many resources or your program is too slow, you can go back and look at the profiling results.

To run Sciagraph in production, you need to ensure four things:

  1. The sciagraph package is installed.
  2. Your program runs with SCIAGRAPH_ACCESS_KEY and SCIAGRAPH_ACCESS_SECRET environment variables set.
  3. Your program is run with Sciagraph enabled.
  4. You are keeping track of the resulting profiling reports.

This article covers the first three requirements; the fourth is covered in a separate howto on keeping track of reports.

Step 1: Making sure sciagraph is installed

The sciagraph package can be installed normally from PyPI. Make sure you’re using a recent version of pip by upgrading it first; inside a virtualenv, you can upgrade pip and then install sciagraph by running:

pip install --upgrade pip
pip install sciagraph

Since it’s a normal PyPI package, you can add sciagraph as just another dependency by listing it in whichever dependency file your application uses:

  • requirements.txt
  • setup.py
  • Pipfile (if you’re using Pipenv)
  • pyproject.toml (if you’re using Poetry or Flit)
  • environment.yml (if you’re using Conda)

Conda packages are not available yet.
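As a minimal sketch of the requirements.txt case, the following shell function appends sciagraph to a requirements file only if it is not already listed. The function name add_sciagraph_dependency is hypothetical, and the sketch assumes an unpinned entry is acceptable and that the file ends with a newline.

```shell
# Hypothetical helper: add "sciagraph" to a requirements file exactly once.
# Assumes the file, if it already exists, ends with a newline.
add_sciagraph_dependency() {
  req_file="${1:-requirements.txt}"
  touch "$req_file"
  # Match "sciagraph" at the start of a line, optionally followed by a
  # version specifier, but not a different package such as "sciagraph-foo".
  if ! grep -qiE '^sciagraph([^A-Za-z0-9._-]|$)' "$req_file"; then
    printf 'sciagraph\n' >> "$req_file"
  fi
}
```

Calling add_sciagraph_dependency requirements.txt twice leaves a single sciagraph line in the file, so it is safe to run from a setup script.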

Step 2: Sign up for an account

In order to use Sciagraph in production, you will need an account with the Sciagraph service. Sign up for a Sciagraph account to get the access key and access secret used in the next step.

Step 3: Making sure API token environment variables are set

In order to validate that you are a paying user of Sciagraph, you need to set two environment variables wherever your program is running: SCIAGRAPH_ACCESS_KEY and SCIAGRAPH_ACCESS_SECRET. The values for both come from your new account. As with other secrets, you don’t want to check these into your source code repository.

In shell scripts you can just set these with an export command:

export SCIAGRAPH_ACCESS_KEY=...
export SCIAGRAPH_ACCESS_SECRET=...
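Because a missing secret is easy to overlook in a batch job, it can help to fail fast before the program starts. The following is a minimal sketch of such a pre-flight check; the function name check_sciagraph_env is hypothetical and is not part of Sciagraph itself.

```shell
# Hypothetical pre-flight check: report which of the two variables are unset
# or empty, and return non-zero if any is missing.
# Note: this inspects the current shell's variables; they must still be
# exported (as above) to reach the program you launch.
check_sciagraph_env() {
  missing=0
  for name in SCIAGRAPH_ACCESS_KEY SCIAGRAPH_ACCESS_SECRET; do
    # Indirect lookup via eval is safe here because the names are fixed above.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In a launch script this could be used as check_sciagraph_env || exit 1 on the line before the program is started.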

Setting environment variables in containers

Container runtimes typically have a way to set environment variables.
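As one illustration, assuming Docker is the container runtime (this article does not name one), docker run accepts -e NAME to forward a variable from the host environment into the container. The sketch below wraps that in a hypothetical helper called run_in_docker; the image name in the usage note is a placeholder.

```shell
# Hypothetical helper: run a command in a Docker image with the two
# Sciagraph variables forwarded from the host environment.
# "-e NAME" without "=value" passes the host's value through, so the
# secrets do not appear on the command line.
run_in_docker() {
  image="$1"
  shift
  docker run --rm \
    -e SCIAGRAPH_ACCESS_KEY \
    -e SCIAGRAPH_ACCESS_SECRET \
    "$image" "$@"
}
```

For example, run_in_docker yourimage:latest python yourprogram.py --load=data/ --twiddle=2.718 would start the program with both variables set inside the container, provided they are exported on the host.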

The documentation will eventually have more detailed instructions for getting these and other runtime environments configured appropriately. Please reach out if you need help.

Step 4: Run your program with Sciagraph enabled

At the moment Sciagraph profiles your whole process, from start to finish. In the future additional options will be available for situations like Celery workers that can run multiple jobs, but for now only a single job per process is supported.

Let’s say your program is typically run like this:

$ python yourprogram.py --load=data/ --twiddle=2.718

There are two ways you can run your program with Sciagraph.

Option #1: Running your program with python -m sciagraph

Instead of running your program as above, you can run it with python -m sciagraph run:

$ python -m sciagraph run yourprogram.py --load=data/ --twiddle=2.718

This launches a new Python subprocess, and that is what actually runs your code. Any arguments after run are passed to the new Python interpreter. So if your program is typically run like this:

$ python -m yourpackage arg1 arg2

You can run it with Sciagraph like so:

$ python -m sciagraph run -m yourpackage arg1 arg2

Option #2: Enabling Sciagraph using an environment variable

In some cases you can’t use python -m sciagraph. In these situations, you can instead automatically enable profiling on Python processes by setting an environment variable.

$ export SCIAGRAPH_MODE=process
$ python yourprogram.py --load=data/ --twiddle=2.718

The Python program above will be automatically profiled using Sciagraph, because that environment variable is set.
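An exported variable stays set for every later command in the same shell. If you would rather enable profiling for a single command only, one option is to set the variable just for that command with the standard env utility. The wrapper below is a sketch; the name run_profiled is hypothetical.

```shell
# Hypothetical wrapper: run one command with SCIAGRAPH_MODE=process set
# only in that command's environment, leaving the calling shell untouched.
run_profiled() {
  env SCIAGRAPH_MODE=process "$@"
}
```

With this defined, run_profiled python yourprogram.py --load=data/ --twiddle=2.718 starts the program with the variable set, while other commands in the same script run without it.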

Next steps

Once you have installed Sciagraph, set the secrets, and enabled profiling, your program will automatically run under Sciagraph profiling. When the batch process exits, it will upload the report in encrypted form, log the information needed to download the report, and, if you want to store reports yourself, store the report on local disk.