The run subcommand#

The run subcommand is used to launch a new Python process and track the memory allocations it performs while it runs.

Basic tracking#

The general form of the run subcommand is one of:

memray run [options] file.py [args]
memray run [options] -m module [args]

Like the Python interpreter itself, the run subcommand can take a path to a Python file to run, or the name of a Python module to run if you use the -m flag. While it runs, memory allocations and deallocations throughout the program are tracked. By default they are saved into a file with the following pattern:

memray-<script>.<pid>.bin

where <script> is the name of the executed script and <pid> is the process id it ran with.

A different filename can be provided with the -o or --output argument.

Native tracking#

Overview#

Memray supports tracking native C/C++ functions as well as Python functions. This can be especially useful when profiling libraries that have extension modules (such as numpy or pandas) as this gives a holistic vision of how much memory is allocated by the extension and how much is allocated by Python itself.

For instance, consider the Mandelbrot example from the Example Applications for Memray section with native tracking disabled. Some of the most important allocations happen when operating on NumPy arrays:

_images/mandelbrot_operation_non_native.png

Here, we can see that the allocation happens when doing some math on NumPy arrays but unfortunately this doesn’t inform us of what exact operation is allocating memory or how temporaries are being used. We also don’t know if the memory was allocated by NumPy or by the interpreter itself. By using the native tracking mode with Memray we can get a much richer report:

_images/mandelbrot_operation_native.png

In this native report, we can see all the internal C calls that are underneath. We can see that the memory allocation happens when the NumPy arrays are being added, due to PyNumber_Add appearing in the stack trace. Based on PyNumber_Multiply not appearing in the stack trace, we can conclude that the temporary array created by NumPy is immediately freed (or that it didn’t need to allocate memory in the first place, perhaps because it could reuse some already allocated memory).

Tip

Memray will also include inlined functions and macros when native tracking is enabled.

Caution

Activating native tracking has a moderate impact on performance as every instruction pointer in the call stack needs to be resolved whenever an allocation happens. This effect is more noticeable the more allocations the traced application performs.

Check the section on native symbolification for more information on how to obtain the best reports with native information, and on how to debug problems in reports with native information.

Usage#

To activate native tracking, you need to provide the --native argument when using the run subcommand:

memray run --native example.py

This will add native stack information to the result file, which any reporter will automatically use.

Important

When generating reports for result files that contain native frames, the report needs to be generated on the same machine where the result file was generated. This is because the shared libraries that were loaded by the process need to be inspected by Memray to get the correct symbol names.

When reporters display native information they will normally use a different color for the Python frames than the native frames. This can also be distinguished by looking at the file name in a frame, since Python frames will generally come from source files with a .py extension.

Python allocator tracking#

Memray normally tracks allocation and deallocation requests made to the system allocator, but by default it won’t see individual Python objects being created. That’s because the Python interpreter normally uses its own memory pools for creating most objects, only making calls to the system allocator as needed to grow or shrink its memory pools. Our documentation on python allocators describes this memory pooling in greater detail. This behavior speeds the Python interpreter up, and by extension speeds up profiling with Memray, while still allowing Memray to show you each place where your program needs to acquire more memory.

You can ask Memray to show you each individual object being created and destroyed, instead, by providing the --trace-python-allocators argument to the run subcommand. This records a lot more data and makes profiling much slower. It will show you all allocations, even ones that don’t result in your program requesting more memory from the system because the interpreter already had memory available for reuse. It can be useful in some cases, though, especially when tracking down memory leaks.

Note

This acts also as an alternative way to run with PYTHONMALLOC=malloc but in a way that allows distiguishing allocations made by using the system allocator directly and ones made by using the Python allocator.

memray run --trace-python-allocators example.py

Caution

Tracking the Python allocators will result in much larger report files and slower profiling due to the larger amount of data that needs to be collected.

Live tracking#

Overview#

Memray supports presenting a “live” view for observing the memory usage of a running Python program.

_images/live_running.png

Usage#

You can run a program in live mode using run --live:

memray3.9 run --live application.py

Immediately Memray will start your application in the background and will run a TUI in the foreground that you can use to analyze your application’s memory usage. If you don’t want to run your program in the background, you can instead use run --live-remote:

memray3.9 run --live-remote application.py

In this mode, Memray will choose an unused port, bind to it, and display a message saying:

Run 'memray live <port>' in another shell to see live results

It will wait for you to run:

memray3.9 live <port>

in another terminal window to attach to it. Regardless of whether you choose to use one terminal or two, the resulting TUI is exactly the same. See Live Reporting for details on how to interpret and control the TUI.

Tracking across forks#

Overview#

Memray can optionally continue tracking in a child process after a parent process forks. This can be useful when using multiprocessing, or a framework utilizing a pre-fork pattern like Celery or Gunicorn.

Usage#

To activate tracking through forks, you need to provide the --follow-fork argument to the run subcommand:

memray run --follow-fork example.py

In this mode, each time the process forks, a new output file will be created for the new child process, with the new child’s process ID appended to the original capture file’s name. The capture files for child processes are exactly like any other capture file, and can be fed into any reporter of your choosing.

Note

--follow-fork mode can only be used with an output file. It is incompatible with --live mode and --live-remote mode, since the TUI can’t be attached to multiple processes at once.

Aggregated capture files#

If you supply the --aggregate argument to memray run, it will write much smaller capture files. Instead of containing information about every individual allocation performed by the tracked program, a capture file produced using --aggregate will contain some statistics about the process’s allocations aggregated by the location where the allocation happened.

Specifically, for every location where the tracked process performed any allocations, an aggregated capture file includes a count of:

  • How many allocations at that location had not yet been deallocated when the process reached its heap memory high water mark

  • How many bytes had been allocated at that location and not yet deallocated when the process reached its heap memory high water mark

  • How many allocations at that location were leaked (i.e. not deallocated before tracking stopped)

  • How many bytes were leaked by allocations at that location

These counts provide enough information to generate flame graphs. In fact, this information is enough to run most of our reporters, with just a few exceptions:

  • You cannot find temporary allocations using this capture file format, since finding temporary allocations requires knowing when each individual allocation was deallocated.

  • You cannot use the stats reporter with this capture file format, because it needs to see each individual allocation’s size.

  • You cannot use --aggregate with live tracking, since the live TUI needs to see each allocation as it happens.

Also, note that if the process is killed before tracking ends (for instance, by the Linux OOM killer), then the process will die before it finishes calculating its statistics, and so no useful information is ever written to the capture file. With the default file format, the capture file is usually still usable even if the process crashes or is killed, but with the aggregated file format it is not.

If you can live with these limitations, then using --aggregate results in much smaller capture files that can be used seamlessly with most reporters.

CLI Reference#

Run the specified application and track memory usage

usage: memray run [-m module | -c cmd | file] [args]

Named Arguments#

-o, --output

Output file name (default: <process_name>.<pid>.bin)

--live

Start a live tracking session and immediately connect a live server

Default: False

--live-remote

Start a live tracking session and wait until a client connects

Default: False

--live-port, -p

Port to use when starting live tracking (default: random free port)

--aggregate

Write aggregated stats to the output file instead of all allocations

Default: False

--native

Track native (C/C++) stack frames as well

Default: False

--follow-fork

Record allocations in child processes forked from the tracked script

Default: False

--trace-python-allocators

Record allocations made by the pymalloc allocator

Default: False

-q, --quiet

Don’t show any tracking-specific output while running

Default: False

-f, --force

If the output file already exists, overwrite it

Default: False

--compress-on-exit

Compress the resulting file using lz4 after tracking completes

Default: True

--no-compress

Do not compress the resulting file using lz4

Default: False

-c

Program passed in as string

Default: False

-m

Run library module as a script (terminates option list)

Default: False

Please submit feedback, ideas, and bug reports by filing a new issue at https://github.com/bloomberg/memray/issues