The run subcommand is used to launch a new Python process and track the memory allocations it
performs while it runs.
The general form of the
run subcommand is one of:
memray run [options] file.py [args]
memray run [options] -m module [args]
Like the Python interpreter itself, the
run subcommand can take a path to a Python file to run,
or the name of a Python module to run if you use the
-m flag. While it runs, memory allocations
and deallocations throughout the program are tracked. By default they are saved into a file whose name is built from
<script>, the name of the executed script, and
<pid>, the process id it ran with.
A different filename can be provided with the -o or --output argument.
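When profiling runs are scripted, the two invocation forms and the output option can be assembled programmatically. The following is a minimal sketch: `build_run_command` is a hypothetical helper (it is not part of Memray), and it only builds the argument list without executing anything.

```python
from typing import List, Optional, Sequence


def build_run_command(
    target: str,
    args: Sequence[str] = (),
    *,
    module: bool = False,
    output: Optional[str] = None,
) -> List[str]:
    """Build an argv list for ``memray run`` (hypothetical helper).

    ``target`` is a path to a Python file, or a module name when
    ``module=True``.
    """
    cmd = ["memray", "run"]
    if output is not None:
        # -o / --output overrides the default capture file name.
        cmd += ["-o", output]
    if module:
        # -m terminates memray's own option list, so it is added last.
        cmd += ["-m", target]
    else:
        cmd.append(target)
    # Everything after the target is passed to the profiled program.
    cmd.extend(args)
    return cmd


print(build_run_command("file.py", ["--verbose"], output="capture.bin"))
print(build_run_command("http.server", ["8000"], module=True))
```

If Memray is installed, the resulting list can be handed to `subprocess.run(cmd, check=True)`.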
Memray supports tracking native C/C++ functions as well as Python functions. This can be especially useful
when profiling libraries that have extension modules (such as
pandas), as this
gives a holistic view of how much memory is allocated by the extension and how much is allocated by Python itself.
For instance, consider the Mandelbrot example from the Example Applications for Memray section with native tracking disabled. Some of the most important allocations happen when operating on NumPy arrays:
Here, we can see that the allocation happens when doing some math on NumPy arrays but unfortunately this doesn’t inform us of what exact operation is allocating memory or how temporaries are being used. We also don’t know if the memory was allocated by NumPy or by the interpreter itself. By using the native tracking mode with Memray we can get a much richer report:
In this native report, we can see all the internal C calls that are underneath. We can see that the memory allocation
happens when the NumPy arrays are being added, due to
PyNumber_Add appearing in the stack trace. Based on
PyNumber_Multiply not appearing in the stack trace, we can conclude that the temporary array created by NumPy is
immediately freed (or that it didn’t need to allocate memory in the first place, perhaps because it could reuse some
already allocated memory).
Memray will also include inlined functions and macros when native tracking is enabled.
Activating native tracking has a moderate impact on performance as every instruction pointer in the call stack needs to be resolved whenever an allocation happens. This effect is more noticeable the more allocations the traced application performs.
Check the section on native symbolification for more information on how to obtain the best reports with native information, and on how to debug problems in reports with native information.
To activate native tracking, you need to provide the
--native argument when using the run subcommand:
memray run --native example.py
This will add native stack information to the result file, which any reporter will automatically use.
When generating reports for result files that contain native frames, the report needs to be generated on the same machine where the result file was generated. This is because the shared libraries that were loaded by the process need to be inspected by Memray to get the correct symbol names.
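To compare reports with and without native tracking without installing any third-party package, a small script whose memory is allocated from inside a standard-library extension module can be used. This is only an illustrative sketch: `roundtrip` is a hypothetical function, and which native frames actually appear in a report depends on the symbols available on your machine.

```python
import zlib


def roundtrip(size: int) -> tuple:
    """Compress and decompress a buffer of ``size`` zero bytes.

    The output buffers are created inside the zlib extension module's
    C code, so this is the kind of allocation where native frames can
    add detail that a Python-only stack does not show.
    """
    payload = bytes(size)
    compressed = zlib.compress(payload)
    restored = zlib.decompress(compressed)
    return len(compressed), len(restored)


if __name__ == "__main__":
    compressed_len, restored_len = roundtrip(10 * 1024 * 1024)
    print("compressed bytes:", compressed_len)
    print("restored bytes:", restored_len)
```

Saving this as a file and running it once with `memray run` and once with `memray run --native` gives two capture files whose reports can be compared side by side.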
When reporters display native information they will normally use a different color for the Python frames than the native
frames. This can also be distinguished by looking at the file name in a frame, since Python frames will generally come
from source files with a .py extension.
Python allocator tracking
Memray normally tracks allocation and deallocation requests made to the system allocator, but by default it won’t see individual Python objects being created. That’s because the Python interpreter normally uses its own memory pools for creating most objects, only making calls to the system allocator as needed to grow or shrink its memory pools. Our documentation on python allocators describes this memory pooling in greater detail. This behavior speeds the Python interpreter up, and by extension speeds up profiling with Memray, while still allowing Memray to show you each place where your program needs to acquire more memory.
You can ask Memray to show you each individual object being created and
destroyed, instead, by providing the
--trace-python-allocators argument to
the run subcommand. This records a lot more data and makes profiling much
slower. It will show you all allocations, even ones that don’t result in your
program requesting more memory from the system because the interpreter already
had memory available for reuse. It can be useful in some cases, though,
especially when tracking down memory leaks.
This also acts as an alternative way to run with
in a way that allows distinguishing allocations made by using the system
allocator directly and ones made by using the Python allocator.
memray run --trace-python-allocators example.py
Tracking the Python allocators will result in much larger report files and slower profiling due to the larger amount of data that needs to be collected.
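The difference between the two modes is easiest to see with a script that makes both kinds of request. The sketch below assumes a default CPython build, where the interpreter's small-object allocator serves requests of 512 bytes or less from its own pools and passes larger requests on to the system allocator. The function names are hypothetical.

```python
import sys


def make_small_objects(count: int) -> list:
    """Create many small objects; CPython usually serves these from its pools."""
    return [(i, str(i)) for i in range(count)]


def make_large_buffer(size: int) -> bytearray:
    """Create one large buffer; a request this big reaches the system allocator."""
    return bytearray(size)


if __name__ == "__main__":
    small = make_small_objects(100_000)
    large = make_large_buffer(10 * 1024 * 1024)
    # Each tuple is far below the 512-byte small-object threshold.
    print("one tuple:", sys.getsizeof(small[0]), "bytes")
    print("large buffer:", sys.getsizeof(large), "bytes")
```

Profiled without --trace-python-allocators, the small objects should only be visible where the interpreter grows its pools, while the large buffer shows up directly. With the flag, every tuple and string creation is recorded individually.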
Memray supports presenting a “live” view for observing the memory usage of a running Python program.
You can run a program in live mode using run --live:
memray3.9 run --live application.py
Memray will immediately start your application in the background and run a TUI in the foreground that you can use
to analyze your application’s memory usage. If you don’t want to run your program in the background, you can instead
use run --live-remote:
memray3.9 run --live-remote application.py
In this mode it will choose an unused port and bind to it, waiting for you to run:
memray3.9 live $port
in another terminal window to attach to it. Regardless of whether you choose to use one terminal or two, the resulting TUI is exactly the same. See Live Reporting for details on how to interpret and control the TUI.
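When the two-terminal workflow is scripted, it can be convenient to choose the port up front and pass it with --live-port rather than reading it from Memray's output. The following is a sketch under that assumption: `pick_free_port` and `live_remote_commands` are hypothetical helpers, and there is a small window in which another process could take the port between the check and Memray binding to it.

```python
import socket
from typing import List, Tuple


def pick_free_port() -> int:
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 means "any free port"
        return sock.getsockname()[1]


def live_remote_commands(script: str, port: int) -> Tuple[List[str], List[str]]:
    """Return (server_cmd, client_cmd) for a two-terminal live session."""
    server = ["memray", "run", "--live-remote", "--live-port", str(port), script]
    client = ["memray", "live", str(port)]
    return server, client


port = pick_free_port()
server_cmd, client_cmd = live_remote_commands("application.py", port)
print("terminal 1:", " ".join(server_cmd))
print("terminal 2:", " ".join(client_cmd))
```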
Tracking across forks
Memray can optionally continue tracking in a child process after a parent process forks. This can be useful when using
multiprocessing, or a framework utilizing a pre-fork pattern like Celery or Gunicorn.
To activate tracking through forks, you need to provide the
--follow-fork argument to the run subcommand:
memray run --follow-fork example.py
In this mode, each time the process forks, a new output file will be created for the new child process, with the new child’s process ID appended to the original capture file’s name. The capture files for child processes are exactly like any other capture file, and can be fed into any reporter of your choosing.
--follow-fork mode can only be used with an output file. It is incompatible with
--live-remote mode, since the TUI can’t be attached to multiple processes at once.
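A minimal script for trying this out forks once and allocates memory in both the parent and the child. It is a sketch for POSIX systems only (it relies on `os.fork`), and the function names are hypothetical.

```python
import os


def allocate(n_items: int) -> int:
    """Allocate a list of 1 KiB byte strings and return the total size."""
    data = [bytes(1024) for _ in range(n_items)]
    return sum(len(chunk) for chunk in data)


def run_parent_and_child() -> int:
    """Fork once; both processes allocate. Returns the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child process: when the parent was started with --follow-fork,
        # these allocations are written to the child's own capture file.
        try:
            allocate(10_000)
        finally:
            os._exit(0)  # leave without running the parent's cleanup code
    # Parent process: allocate something too, then wait for the child.
    allocate(5_000)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("child exit code:", run_parent_and_child())
```

Running this with `memray run --follow-fork` should leave one capture file for the parent and one for the child, each of which can be passed to a reporter separately.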
Aggregated capture files
If you supply the
--aggregate argument to
memray run, it will write
much smaller capture files. Instead of containing information about every
individual allocation performed by the tracked program, a capture file produced
with --aggregate will contain some statistics about the process’s
allocations aggregated by the location where the allocation happened.
Specifically, for every location where the tracked process performed any allocations, an aggregated capture file includes a count of:
- How many allocations at that location had not yet been deallocated when the process reached its heap memory high water mark
- How many bytes had been allocated at that location and not yet deallocated when the process reached its heap memory high water mark
- How many allocations at that location were leaked (i.e. not deallocated before tracking stopped)
- How many bytes were leaked by allocations at that location
These counts provide enough information to generate flame graphs. In fact, this information is enough to run most of our reporters, with just a few exceptions:
- You cannot find temporary allocations using this capture file format, since finding temporary allocations requires knowing when each individual allocation was deallocated.
- You cannot use the stats reporter with this capture file format, because it needs to see each individual allocation’s size.
- You cannot use --aggregate with live tracking, since the live TUI needs to see each allocation as it happens.
Also, note that if the process is killed before tracking ends (for instance, by the Linux OOM killer), then the process will die before it finishes calculating its statistics, and so no useful information is ever written to the capture file. With the default file format, the capture file is usually still usable even if the process crashes or is killed, but with the aggregated file format it is not.
If you can live with these limitations, then using
--aggregate results in
much smaller capture files that can be used seamlessly with most reporters.
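To make the four per-location counts concrete, the sketch below computes them from a toy list of allocation events. It is a conceptual illustration only, not Memray's implementation or file format: the event tuples, the `aggregate` function, and the key names are all assumptions made for this example, and copying the live set at every new peak is deliberately naive.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# An event is ("alloc", address, location, size) or ("free", address).
Event = Tuple


def aggregate(events: List[Event]) -> Dict[str, Dict[str, int]]:
    """Reduce a stream of events to per-location peak and leak counts."""
    live: Dict[int, Tuple[str, int]] = {}  # address -> (location, size)
    heap_size = 0
    peak_size = -1
    peak_snapshot: Dict[int, Tuple[str, int]] = {}

    for event in events:
        if event[0] == "alloc":
            _, address, location, size = event
            live[address] = (location, size)
            heap_size += size
            if heap_size > peak_size:
                # New high water mark: remember what was live at this point.
                peak_size = heap_size
                peak_snapshot = dict(live)
        else:
            _, address = event
            _, size = live.pop(address)
            heap_size -= size

    stats: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"peak_count": 0, "peak_bytes": 0, "leaked_count": 0, "leaked_bytes": 0}
    )
    for location, size in peak_snapshot.values():
        stats[location]["peak_count"] += 1
        stats[location]["peak_bytes"] += size
    # Whatever is still live when the events end counts as leaked.
    for location, size in live.values():
        stats[location]["leaked_count"] += 1
        stats[location]["leaked_bytes"] += size
    return dict(stats)


events = [
    ("alloc", 1, "a.py:10", 100),
    ("alloc", 2, "b.py:20", 50),
    ("free", 1),
    ("alloc", 3, "a.py:10", 30),
]
result = aggregate(events)
for location, counts in sorted(result.items()):
    print(location, counts)
```

Note that the output has one record per location no matter how many events were processed, which is why this kind of summary is so much smaller than a full event log, and also why questions about individual allocations can no longer be answered from it.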
usage: memray run [-m module | -c cmd | file] [args]
- -o, --output
Output file name (default: <process_name>.<pid>.bin)
- --live
Start a live tracking session and immediately connect a live server
- --live-remote
Start a live tracking session and wait until a client connects
- --live-port, -p
Port to use when starting live tracking (default: random free port)
- --aggregate
Write aggregated stats to the output file instead of all allocations
- --native
Track native (C/C++) stack frames as well
- --follow-fork
Record allocations in child processes forked from the tracked script
- --trace-python-allocators
Record allocations made by the Pymalloc allocator
- -q, --quiet
Don’t show any tracking-specific output while running
- -f, --force
If the output file already exists, overwrite it
Compress the resulting file using lz4 after tracking completes
Do not compress the resulting file using lz4
- -c
Program passed in as string
- -m
Run library module as a script (terminates option list)
Please submit feedback, ideas, and bug reports by filing a new issue at https://github.com/bloomberg/memray/issues