Deadlock¶
Intro¶
This lesson is meant to familiarize you with PyStack with a classic problem: lock acquisition.
In this exercise, we will intentionally create a lock ordering issue, which is a common way of causing a deadlock, where two or more threads are all waiting for the others to release resources, causing the program to hang indefinitely.
Development Environment Setup¶
Navigate to the PyStack GitHub repo and get a copy of the source code. You can either clone it, or just download the zip, whatever is your preference here.
You will also need a terminal with a reasonably recent version of python3
installed.
Once you have the repo ready, cd
into the docs/tutorials
folder:
cd docs/tutorials/
It is here where we will be running the tests and exercises for the remainder of the tutorial.
Let’s go ahead and setup our virtual environment. For reference, here are the official python3 venv docs. You can also just follow along with the commands below.
python3 -m venv .venv
Once your virtual environment has been created, you can activate it like so:
source .venv/bin/activate
Your terminal prompt will be prefixed with (.venv)
to show that activation was successful.
With our virtual environment ready, we can go ahead and install PyStack:
python3 -m pip install pystack
Keep your virtual environment activated for the rest of the tutorial, and you should be able to run any of the commands in the exercises.
Debugging a running process¶
pystack remote
lets you analyze the status of a running (“remote”) process.
Triggering the deadlock¶
In the docs/tutorials
directory, there is a script called deadlock.py
:
1import os
2import threading
3import time
4
5
6def background(first_lock, second_lock):
7 with first_lock:
8 print(" First lock acquired")
9 time.sleep(1)
10 with second_lock:
11 print(" Second lock acquired")
12
13
14if __name__ == "__main__":
15 print(f"Process ID: {os.getpid()}")
16 lock_a = threading.Lock()
17 lock_b = threading.Lock()
18
19 t1 = threading.Thread(target=background, args=(lock_a, lock_b))
20 t2 = threading.Thread(target=background, args=(lock_b, lock_a))
21
22 print("Starting First Thread")
23 t1.start()
24 print("Starting Second Thread")
25 t2.start()
26
27 t1.join()
28 t2.join()
29
30 print("Finished execution")
Since we navigated to that directory above, we can run the deadlock script with:
python3 deadlock.py &
This script will intentionally deadlock. The &
causes the process to be run in the background,
so that you’re still able to run commands in the current terminal once it has deadlocked. The output
will contain the process ID, so this is the expected output:
Process ID: <PID>
Starting First Thread
First lock acquired
Starting Second Thread
First lock acquired
You could also find the PID with:
ps aux | grep deadlock.py
After the deadlock occurs we can use the pystack
command to analyze the process (replace
<PID>
with the process ID from the previous step):
pystack remote <PID>
If you see Operation not permitted
, you may need to instead run it with:
sudo -E pystack remote <PID>
Understanding the results¶
The expected result is output similar to this:
Traceback for thread 789 (python3) [] (most recent call last):
(Python) File "/<python_stdlib_path>/threading.py", line 966, in _bootstrap
self._bootstrap_inner()
(Python) File "/<python_stdlib_path>/threading.py", line 1009, in _bootstrap_inner
self.run()
(Python) File "/<python_stdlib_path>/threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
(Python) File "/<path_to_tutorials>/deadlock.py", line 10, in background
with second_lock:
Traceback for thread 456 (python3) [] (most recent call last):
(Python) File "/<python_stdlib_path>/threading.py", line 966, in _bootstrap
self._bootstrap_inner()
(Python) File "/<python_stdlib_path>/threading.py", line 1009, in _bootstrap_inner
self.run()
(Python) File "/<python_stdlib_path>/threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
(Python) File "/<path_to_tutorials>/deadlock.py", line 10, in background
with second_lock:
Traceback for thread 123 (python3) [] (most recent call last):
(Python) File "/<path_to_tutorials>/deadlock.py", line 27, in <module>
t1.join()
(Python) File "/<python_stdlib_path>/threading.py", line 1089, in join
self._wait_for_tstate_lock()
(Python) File "/<python_stdlib_path>/threading.py", line 1109, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
Notice that each section is displaying a running thread, and there are three threads here:
Thread 123 is the original thread that creates the other two, and waits for them
Thread 456 is
t1
Thread 789 is
t2
Each thread has a stack trace:
The thread 789 is trying to acquire
lock_a
but is blocked becauselock_a
is already held by thread 456.The thread 456 is trying to acquire
lock_b
but is blocked becauselock_b
is already held by thread 789.The main thread 123 is calling
join()
ont1
, waiting for it to finish. However,t1
cannot finish because it is stuck waiting fort2
.
We can see that this is a deadlock: every thread is willing to wait forever for some condition that will never happen, due to the improper lock acquisition ordering.
Exploring more features¶
PyStack has some features that can help us diagnose the problem. Using --locals
you can obtain
a simple string representation of the local variables in the different frames as well as the
function arguments.
When you run:
pystack remote <PID> --locals
The expected result is:
Traceback for thread 789 (python3) [] (most recent call last):
(Python) File "/<python_stdlib_path>/threading.py", line 966, in _bootstrap
self._bootstrap_inner()
Arguments:
self: <Thread at 0x7f0c04b96080>
(Python) File "/<python_stdlib_path>/threading.py", line 1009, in _bootstrap_inner
self.run()
Arguments:
self: <Thread at 0x7f0c04b96080>
(Python) File "/<python_stdlib_path>/threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
Arguments:
self: <Thread at 0x7f0c04b96080>
(Python) File "/<path_to_tutorials>/deadlock.py", line 10, in background
with second_lock:
Arguments:
second_lock: <_thread.lock at 0x7f0c04b90900>
first_lock: <_thread.lock at 0x7f0c04b90b40>
Traceback for thread 456 (python3) [] (most recent call last):
(Python) File "/<python_stdlib_path>/threading.py", line 966, in _bootstrap
self._bootstrap_inner()
Arguments:
self: <Thread at 0x7f0c04b5bfd0>
(Python) File "/<python_stdlib_path>/threading.py", line 1009, in _bootstrap_inner
self.run()
Arguments:
self: <Thread at 0x7f0c04b5bfd0>
(Python) File "/<python_stdlib_path>/threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
Arguments:
self: <Thread at 0x7f0c04b5bfd0>
(Python) File "/<path_to_tutorials>/deadlock.py", line 10, in background
with second_lock:
Arguments:
second_lock: <_thread.lock at 0x7f0c04b90b40>
first_lock: <_thread.lock at 0x7f0c04b90900>
Traceback for thread 123 (python3) [] (most recent call last):
(Python) File "/<path_to_tutorials>/deadlock.py", line 28, in <module>
t1.join()
(Python) File "/<python_stdlib_path>/threading.py", line 1089, in join
self._wait_for_tstate_lock()
Arguments:
timeout: None
self: <Thread at 0x7f0c04b5bfd0>
(Python) File "/<python_stdlib_path>/threading.py", line 1109, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
Arguments:
timeout: -1
block: True
self: <Thread at 0x7f0c04b5bfd0>
Locals:
lock: <_thread.lock at 0x7f0c04bae100>
Observe that we have the same format of result, with one section for each thread. However now, there is now more information: the local variables and function arguments.
In thread 789 and 456 we can identify the ID of each lock.
In the main thread 123 we can verify the arguments of
lock.acquire()
, and see that no timeout was set (timeout: None
) and thatself
refers to the thread object<Thread at 0x7f0c04b5bfd0>
. Moreover, in_wait_for_tstate_lock
we see thattimeout
is-1
, which represents an indefinitewait
, andblock
isTrue
, meaning it will block until the lock is acquired.