Search, browse, and track profiling reports
Sciagraph currently does not have a built-in way to search or browse profiling reports. Either you store the reports yourself, or you rely on Sciagraph’s cloud storage, in which case you can download a specific report using the logged download and decryption keys.
What if you want to find which jobs were extra slow, so you can look at their profiling reports, or compare a report from last week to the latest one? You still have options.
- Already using OpenTelemetry to record tracing information about your job? You can use Sciagraph’s OpenTelemetry integration to include the download instructions in those traces.
For more options, see below.
When you only have a few jobs a day
If you’re only running a handful of jobs a day, you could for example send information about each job to an internal Slack channel. Then you just have to scroll back through the Slack channel’s history to find a particular job.
Customizing where download instructions go is covered later in this document, but eventually I expect to include some built-in integrations. If you want a specific integration, e.g. Slack, let me know.
When you have many jobs: using a logging/tracing storage and search backend
Once you’re running many jobs, a Slack message or email per job won’t scale. You need some way to store and search job information. Thing is, you probably should also be logging more general information about your jobs to help with debugging and performance analysis. Logging can give you information that profiling can’t, so you want to do both, ideally with tracing-based logging (see below).
For example, you could log:
- The job’s elapsed time.
- The version of the code that was used, and the versions of dependencies and libraries.
- The inputs to the job, or at least metadata that will help you reproduce them.
- Intermediate states of your code, to help debug problems.
- The outputs of the job, or metadata to help retrieve them.
Finally, Sciagraph will also log information about how to download a profiling report.
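As a minimal sketch using only the standard library (the logger name and the metadata fields are illustrative, not anything Sciagraph requires), you might log that information as a structured record:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("myjob")  # hypothetical logger name

start = time.monotonic()
# ... the job itself would run here ...
metadata = {
    "elapsed_seconds": time.monotonic() - start,
    "code_version": "v1.2.3",          # e.g. a git tag or commit hash
    "input_path": "s3://bucket/data",  # or metadata to reproduce the inputs
}
# Logging the metadata as JSON makes it easy to index in a search backend:
logger.info("job finished: %s", json.dumps(metadata, sort_keys=True))
```

Emitting the metadata as a single JSON payload, rather than free-form text, is what makes it searchable later.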
If you log this information into some sort of searchable backend, you can then find specific jobs based on some criteria (“a job from last week that used an old version of the code”), look at the logs, and also look at the download instructions in those same logs.
Tracing
Tracing is an improved form of logging that lets you store information as a tree of spans or actions; see this article on profiling vs. logging.
OpenTelemetry is a standard API for, among other things, tracing, with a Python library available. Once you’re using OpenTelemetry to record information about your job, you can send it to many different open source tools or commercial backends, like Jaeger, Honeycomb, DataDog, and many others. These backends will typically both store the tracing logs and provide a GUI for searching and browsing, with varying degrees of sophistication.
Once you are storing tracing info into such a backend, you can search through the information to find relevant jobs, for example outlier jobs that are unexpectedly slow.
By using Sciagraph’s OpenTelemetry integration, when you find a particular job, the logs will also tell you how to download the Sciagraph profiling report.
Not using tracing yet?
Setting it up is pretty easy. For example, if you want to use the Honeycomb SaaS to store traces, you can:
- Sign up for an account at https://honeycomb.io; their free tier may well suffice.
- Look at their Python integration guide to see the basics of how it’s integrated. You don’t want to use sampling; that’s only necessary when sending huge volumes of tracing data, which is much more common when scaling web applications with many users sending many requests.
- Follow this tutorial for basic integration with batch jobs, and how to find slow outliers.
- Add two lines of code to also include Sciagraph download instructions in the tracing, by using the OpenTelemetry integration.
This may take as little as 30 minutes; it’s all pretty quick.
At this point you can also start recording more information in your traces to help you debug problems and identify issues (performance or otherwise).
Customizing where the download instructions go
Sciagraph uses the Python logging library to record the download instructions. logging is highly customizable, so you can override where Sciagraph’s download instructions log message goes. For example, you can write a custom logging.Handler that sends messages to Slack.

The download instructions are logged to a "sciagraph" logging.Logger. The actual log message includes a sciagraph.api.ReportResult object with details on how to download the report.

Here’s a custom logging.Handler that logs the report download instructions to a JSON file:
```python
import logging
import json


class SciagraphReportHandler(logging.Handler):
    def __init__(self, path):
        logging.Handler.__init__(self)
        self._path = path

    def emit(self, record):
        from sciagraph.api import ReportResult

        if isinstance(record.msg, ReportResult):
            with open(self._path, "w") as f:
                json.dump(
                    {
                        "download_instructions": record.getMessage(),
                        "job_id": record.msg.job_id,
                    },
                    f,
                )
        else:
            print(record.getMessage())


logging.getLogger("sciagraph").addHandler(
    SciagraphReportHandler("./result.json")
)
```
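In the same vein, here’s a sketch of a handler that posts the download instructions to a Slack incoming webhook. The webhook URL is a placeholder you’d create yourself in Slack; the payload shape (a JSON body with a "text" field) follows Slack’s incoming-webhook format:

```python
import json
import logging
import urllib.request


class SlackReportHandler(logging.Handler):
    """Sketch: post Sciagraph download instructions to a Slack webhook."""

    def __init__(self, webhook_url):
        logging.Handler.__init__(self)
        self._webhook_url = webhook_url

    def format_payload(self, record):
        # Slack incoming webhooks accept a JSON body with a "text" field:
        return json.dumps({"text": record.getMessage()}).encode("utf-8")

    def emit(self, record):
        request = urllib.request.Request(
            self._webhook_url,
            data=self.format_payload(record),
            headers={"Content-Type": "application/json"},
        )
        try:
            urllib.request.urlopen(request)
        except OSError:
            self.handleError(record)


# Register it just like the JSON handler, with your own webhook URL:
# logging.getLogger("sciagraph").addHandler(
#     SlackReportHandler("https://hooks.slack.com/services/...")
# )
```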
The sciagraph.api.ReportResult object has the following fields, all of them strings unless noted otherwise:
- job_time: The time the job was started.
- job_id: The job ID.
- download_key: The first argument to python -m sciagraph_report download.
- decryption_key: The second argument to python -m sciagraph_report download.
- peak_memory_kb: The peak allocated memory for the job, in kibibytes (1 KiB == 1024 bytes), as an integer.
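For instance, a handler could pull these fields off the object and reconstruct the exact download command. A sketch; the field names come from the list above, but the stand-in dataclass and sample values are just for illustration:

```python
from dataclasses import dataclass


# Stand-in with the same fields as sciagraph.api.ReportResult, for illustration:
@dataclass
class ReportResult:
    job_time: str
    job_id: str
    download_key: str
    decryption_key: str
    peak_memory_kb: int


def download_command(result: ReportResult) -> str:
    # The download key is the first argument, the decryption key the second:
    return (
        f"python -m sciagraph_report download "
        f"{result.download_key} {result.decryption_key}"
    )


example = ReportResult("2023-01-01T00:00:00", "job-1", "DKEY", "EKEY", 2048)
print(download_command(example))
print(example.peak_memory_kb / 1024)  # peak memory in mebibytes
```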