Performance and memory profiling for Celery tasks with Sciagraph
Celery tasks running too slowly in production, or using too much memory? You can get results faster—but only if you can find the bottlenecks and fix them.
Sciagraph can help: it’s a performance observability service for Python batch jobs, giving you performance and memory profiling reports for production tasks. It also comes with Celery integration built in.
In order to use Sciagraph with Celery, you need to:
- Ensure Sciagraph is installed and activated in the project environment.
- Add profiling to your Celery tasks.
- Enable profiling on your Celery workers.
- Read the resulting reports, and use them to find the bottleneck.
With always-on performance observability, profiling data is already available whenever your code is too slow, whether you have ongoing performance issues or a new input has broken previous assumptions.
1. Installing and setting up Sciagraph
The short version:
- Install Sciagraph in the environment where Celery is running by doing pip install sciagraph (or adding it to your
- Sign up for a Sciagraph account.
- Set the two access key environment variables provided in the account UI once you’ve signed up:
export SCIAGRAPH_ACCESS_KEY=...your key...
export SCIAGRAPH_ACCESS_SECRET=...your secret...
See the documentation on using Sciagraph in production for a more detailed guide.
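A quick pre-flight check can catch a missing or misspelled variable name before the worker starts. The following is a minimal sketch using only the standard library; the helper name `missing_sciagraph_vars` is hypothetical and not part of Sciagraph.

```python
import os

# The two variable names come from the setup steps above.
REQUIRED_VARS = ("SCIAGRAPH_ACCESS_KEY", "SCIAGRAPH_ACCESS_SECRET")


def missing_sciagraph_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example: a misspelled variable name is reported as missing.
example_env = {"SCIAGRAPH_ACCESS_KEY": "key", "SCIAGRAPH_ACCESS_SECERET": "secret"}
print(missing_sciagraph_vars(example_env))  # ['SCIAGRAPH_ACCESS_SECRET']
```

Calling `missing_sciagraph_vars()` with no arguments checks the current process environment, so it can run at the top of a deployment script.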
2. Adding profiling to your Celery tasks
If you have a tasks.py that looks like this:
from celery import Celery

app = Celery("tasks", broker="pyamqp://guest@localhost//")

@app.task
def generate_report(x, y):
    # ... do some work ...
    return x + y
You can add Sciagraph performance report generation to that task by using the profile decorator from sciagraph.integrations.celery:
from celery import Celery
from sciagraph.integrations.celery import profile

app = Celery("tasks", broker="pyamqp://guest@localhost//")

@app.task
@profile  # <-- add decorator
def generate_report(x, y):
    # ... do some work ...
    return x + y
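In the example above, `@profile` sits below `@app.task`. Python applies stacked decorators from the bottom up, so the function is wrapped by `profile` first and the wrapped version is what gets registered as the task. The sketch below illustrates that ordering with two hypothetical stand-in decorators, `fake_task` and `fake_profile`; they only record call order and are not how Celery or Sciagraph are implemented.

```python
import functools

calls = []     # records the order in which code runs
registry = {}  # stand-in for a task registry


def fake_profile(func):
    """Stand-in for a profiling decorator: records entry and exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append("profile:start")
        try:
            return func(*args, **kwargs)
        finally:
            calls.append("profile:end")
    return wrapper


def fake_task(func):
    """Stand-in for task registration: stores whatever function it receives."""
    registry[func.__name__] = func
    return func


@fake_task
@fake_profile  # applied first, so the registered function is the profiled one
def generate_report(x, y):
    calls.append("body")
    return x + y


result = registry["generate_report"](2, 3)
print(result, calls)  # 5 ['profile:start', 'body', 'profile:end']
```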
3. Enabling profiling
Once you’ve made sure Sciagraph is enabled on your tasks, you need to make sure your workers have Sciagraph enabled. Sciagraph supports prefork / process pools, and solo mode.
Prefork / process pools
When using a process pool (“prefork”), you enable Sciagraph by setting the usual SCIAGRAPH_ACCESS_KEY and SCIAGRAPH_ACCESS_SECRET environment variables, as well as two additional environment variables.
$ export SCIAGRAPH_ACCESS_KEY="...get real value from your account..."
$ export SCIAGRAPH_ACCESS_SECRET="...get real value from your account..."
$ export SCIAGRAPH_MODE=celery
$ export SCIAGRAPH_CELERY_REPORTS_PATH=/home/app/sciagraph-reports
$ celery -A tasks worker --pool prefork
The path passed to SCIAGRAPH_CELERY_REPORTS_PATH is where reports will be stored, in subdirectories based on the task name and each task’s unique ID.
In the example above, if you have a
generate_artifact task in
tasks.py, you will end up with profiling reports in
You can also use Sciagraph with a worker that runs just one task at a time (“solo” mode).
This is similar to the configuration above, except you use a different SCIAGRAPH_MODE value, api instead of celery:
$ export SCIAGRAPH_ACCESS_KEY="...get real value from your account..."
$ export SCIAGRAPH_ACCESS_SECRET="...get real value from your account..."
$ export SCIAGRAPH_MODE=api
$ export SCIAGRAPH_CELERY_REPORTS_PATH=/home/app/sciagraph-reports
$ celery -A tasks worker --pool solo
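The two configurations differ only in SCIAGRAPH_MODE and the --pool argument. If workers are launched from a Python script, that mapping can be kept in one place. This is a minimal sketch, assuming hypothetical helpers named `sciagraph_worker_env` and `worker_command`; the mode values are the ones from the shell examples above, and the access keys are expected to be in the environment already.

```python
import os

# Pool name -> SCIAGRAPH_MODE value, as used in the shell examples above.
MODE_FOR_POOL = {"prefork": "celery", "solo": "api"}


def sciagraph_worker_env(pool, reports_path, base_env=None):
    """Return a copy of base_env with the Sciagraph settings for this pool."""
    if pool not in MODE_FOR_POOL:
        raise ValueError(f"no Sciagraph mode known for pool {pool!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["SCIAGRAPH_MODE"] = MODE_FOR_POOL[pool]
    env["SCIAGRAPH_CELERY_REPORTS_PATH"] = str(reports_path)
    return env


def worker_command(pool, app="tasks"):
    """Build the celery command line matching the chosen pool."""
    return ["celery", "-A", app, "worker", "--pool", pool]


env = sciagraph_worker_env("solo", "/home/app/sciagraph-reports", base_env={})
print(env["SCIAGRAPH_MODE"], worker_command("solo"))
```

The resulting pair could then be handed to something like `subprocess.run(worker_command(pool), env=env)`, so that the pool type and the mode cannot drift apart.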
4. Reading the reports
There are two ways to read the reports:
- Download the generated reports from Sciagraph’s cloud storage service.
- Read locally stored copies of the reports.
By default, Sciagraph will upload end-to-end encrypted copies of the reports to its cloud storage server. Instructions on how to download these reports will be output in the worker’s logs. Anyone with access to the logs will be able to download and view the reports from any computer with Python installed.
For example, here’s what the logs might look like:
$ export SCIAGRAPH_MODE=api
$ celery -A tasks worker --pool solo
...
[2022-07-19 13:45:04,305: WARNING/MainProcess] Successfully uploaded the Sciagraph profiling report.
Job start time: 2022-07-19T17:45:03+00:00
Job ID: celery_tasks.add/e09f0ca3-a930-4462-9879-bf38e19ccea4
The report was stored locally at path /tmp/reports/celery_tasks.add/e09f0ca3-a930-4462-9879-bf38e19ccea4
An encrypted copy of the report was uploaded to the Sciagraph storage server.
To download the report, run the following on Linux/Windows/macOS, Python 3.7+.
If you're inside a virtualenv:
pip install --upgrade sciagraph-report
Otherwise:
pip install --user --upgrade sciagraph-report
Then:
python -m sciagraph_report download 907e57c4-23d4-4237-88db-4a5da04a9d65 1/Te9N2ZNqlBREWWtngiu7DN25hyNN/RIvh7QkgmtOEbpWyTVwdn
Follow those instructions, and you can view the report.
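If a worker produces many reports, picking the download commands out of the logs by hand gets tedious. The sketch below pulls them out with a regular expression. It assumes the log lines look like the sample above, and the helper name `find_download_commands` is hypothetical. Since anyone holding these values can download the report, treat the extracted arguments as sensitive.

```python
import re

# Matches the download command as it appears in the sample log output.
# The two captured groups are the command's two positional arguments.
DOWNLOAD_RE = re.compile(r"python -m sciagraph_report download (\S+) (\S+)")


def find_download_commands(log_text):
    """Return a list of (first_arg, second_arg) pairs found in log_text."""
    return DOWNLOAD_RE.findall(log_text)


sample_log = (
    "Then: python -m sciagraph_report download "
    "907e57c4-23d4-4237-88db-4a5da04a9d65 "
    "1/Te9N2ZNqlBREWWtngiu7DN25hyNN/RIvh7QkgmtOEbpWyTVwdn\n"
)
for first_arg, second_arg in find_download_commands(sample_log):
    print("python -m sciagraph_report download", first_arg, second_arg)
```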
Reading locally-stored reports
Sciagraph will also store the reports locally, on the machine running the worker.
Specifically, it will store them in the directory specified by SCIAGRAPH_CELERY_REPORTS_PATH.
For example, if SCIAGRAPH_CELERY_REPORTS_PATH=/tmp/reports, after running the add() task we’ll see:
$ ls /tmp/reports/
celery_tasks.add
$ ls /tmp/reports/celery_tasks.add/
e09f0ca3-a930-4462-9879-bf38e19ccea4
$ ls /tmp/reports/celery_tasks.add/e09f0ca3-a930-4462-9879-bf38e19ccea4/
index.html peak-memory.prof peak-memory-reversed.svg peak-memory.svg performance performance.prof performance-reversed.svg performance.svg
Open index.html in your browser to see the report.
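With many tasks and runs, it can help to list every stored report in one pass. The following is a minimal sketch that assumes the task name / task ID / index.html layout from the listing above; `find_reports` is a hypothetical helper, and the demo builds a throwaway directory so it runs without a real worker.

```python
import tempfile
from pathlib import Path


def find_reports(reports_root):
    """Yield (task_name, task_id, index_path) for each report under reports_root.

    Assumes the <root>/<task name>/<task id>/index.html layout.
    """
    root = Path(reports_root)
    for index in sorted(root.glob("*/*/index.html")):
        task_dir = index.parent
        yield task_dir.parent.name, task_dir.name, index


# Demo against a temporary directory that mimics the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    report_dir = Path(tmp) / "celery_tasks.add" / "e09f0ca3-a930-4462-9879-bf38e19ccea4"
    report_dir.mkdir(parents=True)
    (report_dir / "index.html").write_text("<html></html>")
    found = [(name, task_id) for name, task_id, _ in find_reports(tmp)]

print(found)
```

In real use, pass the same directory that SCIAGRAPH_CELERY_REPORTS_PATH points to.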
5. Bonus: Making sure old reports are cleaned up
When Sciagraph is enabled, every task with profiling enabled will write out a report. By default, only the last 1000 reports are kept.
To keep more, set the SCIAGRAPH_CELERY_MAX_REPORTS environment variable before starting the worker, for example:
$ export SCIAGRAPH_CELERY_MAX_REPORTS=5000
$ export SCIAGRAPH_MODE=celery
$ celery -A tasks worker --pool=prefork