GCC - Reverse1

2022-10-06

This challenge was created by aiglematth as part of the CTF club GCC. It was my first introduction to the Unicorn framework. The challenge has no real difficulty beyond understanding the framework, but it remains very interesting to study. This post goes into a lot of detail for beginners.

Setup

When we try to launch the binary, we get the following error:

$ ./reverse1
./reverse1: error while loading shared libraries: libunicorn.so.2: cannot open shared object file: No such file or directory

To fix this, install Unicorn, then either copy libunicorn.so.2 into a system shared library directory or point the dynamic loader at it with LD_LIBRARY_PATH when starting the program.

$ pip3 install unicorn
$ sudo cp "$(find / -iname "libunicorn.so.2" 2>/dev/null | head -n 1)" /usr/lib/x86_64-linux-gnu/libunicorn.so.2
$ ./reverse1
Usage: ./reverse1 <input>

To disable ASLR from an unprivileged Docker container, you have to do it at the host level.

$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

ASLR must be disabled on the host so that the program is not loaded at a random address each time it is launched.

Don’t forget to re-enable it after your analysis by writing 2 instead of 0.

Binary analysis

Main function

The first thing to do once we reach the main function is to rename the variables whose purpose is easy to determine.

The code is now slightly more readable. We can already see that the program needs an argument (probably a password) to give us the flag. We also notice two calls to internal functions without symbols, followed by two calls to uc_hook_add with different parameters.

Searching for uc_hook_add online, we find that it belongs to Unicorn, a framework that emulates multiple CPU architectures. The Hot Examples website provides sample code showing how to use Unicorn.

Open function

As seen in the image above, there should be a call to uc_open. Going into the first function, sub_13BB, we find this call inside a loop that creates four Unicorn engines.

We rename the variables based on their definitions in unicorn.h.

As for the parameter values, UC_ARCH is 3, UC_MODE is 4, and uc_engine is a pointer through which uc_open returns the handle of the newly created engine.

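unicorn.h, or unicorn_const.py in the Python bindings, defines enums for the architecture and the mode. According to them, 3 is UC_ARCH_MIPS and 4 is UC_MODE_MIPS32 (little endian, since the big-endian bit is not set), so the first engine emulates a 32-bit little-endian MIPS CPU: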

The enums can be added to IDA by opening Local Types (Shift + F1), pressing Insert and pasting their definitions. A new enum then appears in the Enums window.

We can then use it to change the representation of UC_ARCH in the .data section.
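
As a quick sanity check of this decoding, here is a minimal Python sketch that turns a raw (UC_ARCH, UC_MODE) pair back into a name. The constant values are a subset copied from unicorn_const.py; the `describe` helper and the `MODE_NAMES` table are hypothetical, written only for this post.

```python
# Subset of the constants from Unicorn's unicorn_const.py
UC_ARCH = {1: "ARM", 3: "MIPS", 4: "X86", 5: "PPC"}
UC_MODE_BIG_ENDIAN = 1 << 30  # endianness is a separate bit; little endian is 0

# Mode values overlap between architectures (4 is UC_MODE_32, UC_MODE_MIPS32
# and UC_MODE_PPC32), so the remaining bits are decoded per architecture.
MODE_NAMES = {
    "ARM": {0: "A32", 16: "Thumb"},
    "MIPS": {4: "32-bit", 8: "64-bit"},
    "X86": {2: "16-bit", 4: "32-bit", 8: "64-bit"},
    "PPC": {4: "32-bit", 8: "64-bit"},
}


def describe(arch: int, mode: int) -> str:
    """Turn the raw (UC_ARCH, UC_MODE) pair passed to uc_open into a readable name."""
    name = UC_ARCH[arch]
    endian = "big" if mode & UC_MODE_BIG_ENDIAN else "little"
    variant = MODE_NAMES[name][mode & ~UC_MODE_BIG_ENDIAN]
    return f"{name} {variant} {endian}-endian"


print(describe(3, 4))  # values of the first engine
print(describe(4, 2))  # values of the second engine
```

The same helper applies to the other engines: a 32-bit big-endian PPC engine, for instance, would be created with 5 and 0x40000004 (UC_MODE_PPC32 | UC_MODE_BIG_ENDIAN).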

When retrieving the parameter values, we saw that uc_engine occupies 8 bytes (64 bits) while UC_ARCH and UC_MODE occupy 4 bytes (32 bits) each. The disassembly confirms this: edi and esi are used, not rdi and rsi.

In the Functions window, uc_hook_add and the other functions starting with uc_ are shown in purple. They are imports: since they appear in the dynamic symbol table, IDA can recover their names. However, IDA does not have their prototypes, so their return and parameter types are unknown and IDA can only guess them.

We replace QWORD (64 bits) with DWORD (32 bits) for the first two parameters.

This gives us the following decompiled code:

We can also change the type of UC_ARCH and UC_MODE to __int64 []. Since the parameters are now declared as DWORD, only the low 32 bits of each 64-bit element are passed to uc_open, so nothing is lost. The advantage is that the table elements then have the same size as uc_engine (8 bytes). The decompiler uses 0x4A as the stride between engine entries because it counts in 8-byte elements; in bytes, this stride is 0x4A * 8 = 0x250. First, we change the type of the uc_arch and uc_mode enums to __int64 by setting their width to 8.

Next, we replace the previous type of UC_ARCH and UC_MODE with uc_arch [].

Now, we get a function that looks like this:

Moving 0x250 bytes further, we reach the second uc_engine, with 4 as architecture and 2 as mode, which corresponds to a 16-bit little-endian x86 CPU:

Repeating the operation for the next two iterations of the loop gives us a little-endian ARM CPU, then a 32-bit big-endian PPC CPU.
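
The addresses of the four entries follow directly from this stride. The Python sketch below assumes that the first (MIPS) entry starts at 0x4020, which is consistent with the 0x4020 + 0x250 = 0x4270 used for the x86 engine in the hook section; `slot_address` is a hypothetical helper.

```python
ENGINE_TABLE = 0x4020     # address of the first (MIPS) entry in .data
ENGINE_STRIDE = 0x4A * 8  # 0x4A qwords per entry, i.e. 0x250 bytes
ENGINE_NAMES = ["MIPS", "x86", "ARM", "PPC"]  # order of creation in the uc_open loop


def slot_address(index: int) -> int:
    """Static address of the index-th engine entry (where its uc_engine is stored)."""
    return ENGINE_TABLE + index * ENGINE_STRIDE


for i, name in enumerate(ENGINE_NAMES):
    print(f"{name:<4} entry at {slot_address(i):#06x}")
```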

Mem map and write function

Mem map

The next function uses uc_mem_map to allocate memory for the data we want to place in the emulated CPUs, then calls uc_mem_write twice to write that data into memory.

According to unicorn.h, the second parameter of uc_mem_map is the starting address of the memory region to allocate, the third is the size of this region, and the last one is its protection. This means that each Unicorn engine gets a region starting at address 0, with a size of 8 KB (0x2000) and a permission value of 7.

Still in unicorn.h, the protection enum shows that 7 (UC_PROT_ALL) corresponds to read, write and execute.
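
The permission value is a bit mask, so 7 decomposes into three flags. A small Python sketch; the `UcProt` class mirrors the values of the uc_prot enum in unicorn.h, but the class and `decode_prot` are illustrative and not part of Unicorn's Python bindings:

```python
from enum import IntFlag


class UcProt(IntFlag):
    """Memory protection flags, with the same values as uc_prot in unicorn.h."""
    NONE = 0
    READ = 1
    WRITE = 2
    EXEC = 4
    ALL = 7  # READ | WRITE | EXEC


def decode_prot(value: int) -> list:
    """Split a raw permission value (e.g. the 7 passed to uc_mem_map) into flag names."""
    return [f.name for f in (UcProt.READ, UcProt.WRITE, UcProt.EXEC) if value & f]


print(decode_prot(7))
```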

We create an enum in the Enums window and add UC_PROT_ALL to it:

Mem write

For uc_mem_write, the first call writes to address 0 the data taken from uc_engine + 0x18 (index 3 in 8-byte units, since 3 * 8 = 0x18), with a size that depends on the engine (between 0x8 and 0xC bytes). The second call writes the data located at uc_engine + 0x48 to address 0x1000, with a size of 0x1C (0x100 for the ARM CPU).
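
The following Python sketch checks the layout described above: both writes must fall inside the single 0x2000-byte region mapped at address 0, and the unicorn.h documentation of uc_mem_map requires the address and size to be 4 KB aligned. The sizes tested are the largest ones mentioned (0xC bytes of code, 0x100 bytes of ARM data); the constant names and `fits` are hypothetical.

```python
PAGE = 0x1000                        # uc_mem_map needs a 4 KB-aligned address and size
MAP_BASE, MAP_SIZE = 0x0, 0x2000     # uc_mem_map(uc, 0, 0x2000, UC_PROT_ALL)
CODE_ADDR, DATA_ADDR = 0x0, 0x1000   # targets of the two uc_mem_write calls
CODE_FIELD, DATA_FIELD = 0x18, 0x48  # source offsets inside an engine entry


def fits(addr: int, size: int) -> bool:
    """True if the range [addr, addr + size) lies inside the mapped region."""
    return MAP_BASE <= addr and addr + size <= MAP_BASE + MAP_SIZE


print(MAP_BASE % PAGE == 0 and MAP_SIZE % PAGE == 0)  # alignment requirement
print(fits(CODE_ADDR, 0xC), fits(DATA_ADDR, 0x100))   # largest sizes mentioned
print(hex(DATA_FIELD - CODE_FIELD))                   # distance between the two source buffers
```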

To verify that the address is 0x1000, we look at off_4060: IDA displays it as a QWORD holding the address of _init_proc.

Displaying it as a plain number instead shows that its value is actually 0x0000000000001000:
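
Reading the raw bytes as a little-endian 64-bit integer, as x86-64 does, gives the plain number. IDA shows a pointer to _init_proc only because 0x1000 is also a valid offset inside the binary, where _init_proc is located. In this sketch, the byte string is the one implied by the value 0x1000, not a dump taken from the binary:

```python
import struct

# Hypothetical hex view of the 8 bytes at off_4060 (implied by the value 0x1000)
raw = bytes.fromhex("0010000000000000")

(value,) = struct.unpack("<Q", raw)  # "<Q": little-endian unsigned 64-bit integer
print(hex(value))
```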

The function after all the changes:

Hook function

Addition hook

We can now move on to the two uc_hook_add calls. uc_hook_add registers a callback that is executed when a specific event occurs in the emulated CPU.

For the first call, the Unicorn engine used is the x86 one (0x4020 + 0x250 = 0x4270). The hook type is given by the third argument, here 2, which is UC_HOOK_INSN: it hooks a specific instruction. The next argument is the callback executed when the hook triggers. The last argument is optional: for this hook type it specifies the instruction to hook, here syscall.

When a syscall instruction executes in uc_engine_x86, the callback sub_12C9 is called. It reads a register, adds 19 to the value and writes the result back to the same register. According to x86_const.py, register ID 10 is UC_X86_REG_CL, so this register is cl.

This gives us:

Is equal hook

The second call to uc_hook_add uses the PPC Unicorn engine (0x4020 + 0x250 * 3 = 0x4710) with UC_HOOK_CODE as the hook type. The address range that triggers the hook corresponds to the beginning of the first data written, at address 0.

When the hook triggers, sub_1341 is called. It reads the value of the cr0 register and compares it to 2: if they are equal, qword_4970 is set to 1, otherwise to -1.

Looking at the cross-references to qword_4970, we notice that it is used in main, where it is subtracted from v6 (initialized to 0x1C). We therefore understand that the password must be 0x1C characters long and that v6 is decremented by 1 each time cr0 equals 2 (and incremented otherwise, since qword_4970 is then -1).

According to Microsoft's PowerPC documentation, bit 2 of a condition register field is set to 1 when the two operands of a comparison are equal. Bits are numbered from the most significant to the least significant, so within the 4-bit cr0 field, bit 0 is worth 8, bit 1 is worth 4, bit 2 is worth 2 and bit 3 is worth 1. A cr0 value of 2 therefore means that the last comparison executed by the emulated PPC CPU found its operands equal.
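
The cr0 bit weights and the hook logic can be sketched in Python as follows. LT, GT, EQ and SO are the standard PowerPC names of the four cr0 bits; `decode_cr0` and `hook_result` are hypothetical helpers modelling sub_1341, and the loop assumes the counter is updated once per character of a fully correct password.

```python
# Weight of each bit in the 4-bit cr0 field: bit 0 is the most significant one
CR0_BITS = {"LT": 8, "GT": 4, "EQ": 2, "SO": 1}


def decode_cr0(cr0: int) -> list:
    """Names of the condition bits set in a 4-bit cr0 value."""
    return [name for name, weight in CR0_BITS.items() if cr0 & weight]


def hook_result(cr0: int) -> int:
    """Model of sub_1341: qword_4970 = 1 if cr0 == 2 (only EQ set), else -1."""
    return 1 if cr0 == 2 else -1


# Model of main: v6 starts at 0x1C and qword_4970 is subtracted from it
v6 = 0x1C
for _ in range(0x1C):
    v6 -= hook_result(2)  # every comparison found its operands equal
print(decode_cr0(2), v6)
```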

The function now looks like this:

The uc_hook_add functions are also more readable:

Check password function

It only remains to analyze sub_1743, which the loop calls with each character of the input, one at a time, along with its index. The function is divided into 4 parts; each one writes some registers, runs one of the emulated CPUs and reads back another register. To get the data written with uc_mem_write, debugging is easier than static analysis in IDA.

To get the base address, run starti, which starts the program and stops at its very first instruction, then run piebase, which prints the base address in pwndbg:

pwndbg> starti
pwndbg> piebase
Calculated VA from /workspace/reverse1 = 0x555555554000

We can now set breakpoints on the calls to uc_mem_write by adding each call's RVA (Relative Virtual Address) to the base address:

pwndbg> b *0x555555554000+0x15AD
pwndbg> b *0x555555554000+0x1675
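
pwndbg evaluates these expressions itself; for reference, here is the same arithmetic in Python, which can be handy when scripting breakpoints. `to_va` is a hypothetical helper, and the base value is the one printed by piebase above (it only stays stable while ASLR is disabled).

```python
PIE_BASE = 0x555555554000  # printed by piebase, with ASLR disabled
CALL_RVAS = {"first uc_mem_write": 0x15AD, "second uc_mem_write": 0x1675}


def to_va(rva: int) -> int:
    """Runtime address = PIE base + offset of the instruction inside the binary."""
    return PIE_BASE + rva


for label, rva in CALL_RVAS.items():
    print(f"b *{to_va(rva):#x}  # {label}")
```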

MIPS CPU

Once the breakpoints are set, we run the binary with an arbitrary argument and stop at the first breakpoint.

In the uc_mem_write call, the third argument is a pointer to the data (passed in rdx under the x86-64 System V calling convention). Here, the data is at address 0x555555558038.

After dumping the written bytes, we disassemble them on shell-storm.org.

The first instruction loads into $t1 the unsigned byte stored at 0x1000 + $t1, and the second one XORs $t0 with $t1 and stores the result in $v0. According to the MIPS documentation, r8 is $t0, r9 is $t1 and r2 is $v0.

The first part of sub_1743 writes a password character into r8/$t0 and its index into r9/$t1 (register IDs from mips_const.py). It then starts the engine, which executes the instructions written earlier, and finally stores the XOR result from r2 into return_value.
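
The MIPS stage can be modelled in a few lines of Python. This is a sketch, not the challenge's code: a bytearray stands in for the 0x2000-byte mapping, the mnemonics in the comments are reconstructed from the description above, and the two key bytes in the example are the first two values of xor_data from the solving script at the end of this post.

```python
DATA_BASE = 0x1000  # address where the second uc_mem_write puts its data


def mips_stage(char_code: int, index: int, memory: bytearray) -> int:
    """Model of the MIPS code: $t0 (r8) = character, $t1 (r9) = index, result in $v0 (r2)."""
    t0, t1 = char_code, index
    t1 = memory[DATA_BASE + t1]  # lbu $t1, 0x1000($t1): zero-extended byte load
    v0 = t0 ^ t1                 # xor $v0, $t0, $t1
    return v0                    # copied into return_value by sub_1743


memory = bytearray(0x2000)  # same size as the uc_mem_map region
memory[DATA_BASE:DATA_BASE + 2] = bytes([0x53, 0xF7])
print(hex(mips_stage(ord("G"), 0, memory)), hex(mips_stage(ord("C"), 1, memory)))
```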

The byte loaded into $t1 from 0x1000 + index comes from the second memory region written. Since the two uc_mem_write sources are 0x30 bytes apart (0x48 - 0x18), we can recover this data by adding this difference to the previous address: 0x555555558038 + 0x30 = 0x555555558068.
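
Translating the runtime pointer back into an offset inside the binary is a useful consistency check: it lands exactly on the MIPS entry (0x4020) plus the 0x18 field offset. A short Python sketch, which also prints a GDB `x` command to dump the 0x1C data bytes (the variable names are illustrative):

```python
PIE_BASE = 0x555555554000            # from piebase, with ASLR disabled
code_ptr = 0x555555558038            # third argument of the first uc_mem_write
data_ptr = code_ptr + (0x48 - 0x18)  # second buffer of the same engine entry

print(hex(code_ptr - PIE_BASE))      # offset of the code buffer inside the binary
print(hex(data_ptr))
print(f"x/{0x1C}xb {data_ptr:#x}")   # GDB command dumping the 0x1C data bytes
```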

Alternatively, we can continue to the next breakpoint, on the second uc_mem_write.

x86 CPU

To understand the x86 CPU code, we repeat the same steps and retrieve the address where its data is stored:

The disassembly shows that the first instruction moves the byte at 0x1000 + eax into cl, then bl is added to cl, and a syscall follows.

In the second part, return_value is written into bl and the index into al (the low byte of eax). The engine is then started and the value of cl is read back.

Again, the data at 0x1000 comes from the second uc_mem_write.

Do not forget the syscall, which is hooked by the first uc_hook_add call and adds 19 to cl.
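
Putting the x86 code and the syscall hook together, the stage reduces to an 8-bit addition. Again a Python sketch with reconstructed mnemonics; the key byte 0x03 is the first value of add_data from the solving script, and 0x14 is what the MIPS stage produces for "G" at index 0 (0x47 XOR 0x53).

```python
DATA_BASE = 0x1000


def x86_stage(value: int, index: int, memory: bytearray) -> int:
    """Model of the 16-bit x86 code: bl = return_value, al = index, result read from cl."""
    cl = memory[DATA_BASE + index]  # mov cl, [eax + 0x1000]
    cl = (cl + value) & 0xFF        # add cl, bl: an 8-bit register wraps around
    cl = (cl + 19) & 0xFF           # syscall: hooked by sub_12C9, which adds 19 to cl
    return cl


memory = bytearray(0x2000)
memory[DATA_BASE] = 0x03
print(hex(x86_stage(0x14, 0, memory)))
```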

ARM CPU

We use the same method here: after recovering the data from the first uc_mem_write, we disassemble the code.

The first instruction loads into r1 the word stored at the current address plus 8 bytes (in ARM state, pc reads as the address of the current instruction + 8), which is 0x00001000, the address of the data from the second uc_mem_write. The second instruction then loads into r2 the byte at r1 (0x1000) + r0. The last word is not a useful instruction: its only purpose is to provide 0x1000 as data.

In the third part, return_value is written to r0. Next, the engine is started and the value of r2 is stored in return_value. This emulated CPU therefore simply replaces return_value with the byte at 0x1000 + return_value, which amounts to a substitution table lookup.

This time the second uc_mem_write writes 0x100 bytes, so that every value from 0x00 to 0xff has an entry in the table.
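
A Python sketch of the ARM stage, including the pc + 8 behaviour of ARM state. The 256-byte table here is a hypothetical permutation used only for the example; the real one is offset_data in the solving script.

```python
import struct

DATA_BASE = 0x1000


def arm_stage(value: int, memory: bytearray) -> int:
    """Model of the ARM code: r0 = return_value, result read from r2."""
    pc = 0x0                                          # the code starts at address 0
    (r1,) = struct.unpack_from("<I", memory, pc + 8)  # ldr r1, [pc]: pc reads as address + 8
    r2 = memory[r1 + value]                           # ldrb r2, [r1, r0]
    return r2


memory = bytearray(0x2000)
struct.pack_into("<I", memory, 8, DATA_BASE)          # literal word stored after the two instructions
table = bytes((i * 7 + 3) % 256 for i in range(256))  # hypothetical substitution table
memory[DATA_BASE:DATA_BASE + 0x100] = table
print(hex(arm_stage(42, memory)))
```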

PPC CPU

Since this CPU is big endian, after recovering the data from the first uc_mem_write we swap the byte order of every 4-byte word, as the instructions are 32 bits wide.
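
Swapping each 32-bit word is easy to script. A minimal Python sketch; `swap_words` is a hypothetical helper and the input bytes are arbitrary:

```python
def swap_words(data: bytes) -> bytes:
    """Reverse the byte order of every 4-byte word (one 32-bit PPC instruction)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


print(swap_words(bytes.fromhex("0102030405060708")).hex())
```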

Disassembling the code shows that the first instruction, lbz, loads the byte at 0x1000 + r1 into r1. Then a comparison between r0 and r1 stores its result in cr0: if the two operands are equal, cr0 is worth 2, as we saw with the PPC hook.

According to the code, return_value is placed in r0 and the index in r1, so each transformed character is compared with the byte at 0x1000 + index.
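
The PPC stage does not return a value: its only output is cr0, which the hook turns into the counter update. A Python sketch, where the LT/GT branch assumes a plain compare of the two byte values (signed and unsigned compares agree here because both operands fit in a byte) and a clear SO bit; `ppc_stage` is a hypothetical helper.

```python
DATA_BASE = 0x1000
CR0_LT, CR0_GT, CR0_EQ = 8, 4, 2


def ppc_stage(value: int, index: int, memory: bytearray) -> int:
    """Model of the PPC code: r0 = return_value, r1 = index; returns cr0."""
    r0, r1 = value, index
    r1 = memory[DATA_BASE + r1]  # lbz r1, 0x1000(r1)
    if r0 == r1:                 # compare r0 with r1, result in cr0
        return CR0_EQ
    return CR0_LT if r0 < r1 else CR0_GT


memory = bytearray(0x2000)
memory[DATA_BASE] = 0x94  # expected byte for index 0 (first value of enc_flag)
print(ppc_stage(0x94, 0, memory), ppc_stage(0x10, 0, memory))
```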

Solving script

After retrieving the data from the second uc_mem_write of each part, we undo the four stages in reverse order for each index from 0 to the password length (0x1C). First, we look up the position in offset_data of the value enc_flag[i], which inverts the ARM table lookup. Then, we subtract add_data[i] from this position, subtract the 19 added by the syscall hook, and take the result modulo 256 so that it stays within a byte. Finally, we XOR with xor_data[i].

#!/usr/bin/env python3

# PPC data
enc_flag = [0x94, 0xcf, 0x07, 0xbd, 0x69, 0x91, 0xd8, 0xc1, 0x92, 0xa4, 0x16, 0xcb, 0x09, 0xe4, 0xb5, 0x25, 0x78, 0x03, 0x9a, 0x3a, 0xaa, 0x76, 0x69, 0xdc, 0x27, 0x0b, 0xdb, 0xa7, 0x00]
# ARM data
offset_data = [0xb6, 0x79, 0xc7, 0x98, 0x57, 0x0c, 0xb3, 0x40, 0x8c, 0x18, 0xb0, 0xcc, 0xae, 0xd5, 0x8d, 0xcb, 0x2d, 0x9c, 0xa9, 0xcd, 0xa4, 0x64, 0x78, 0xac, 0x11, 0x7c, 0x62, 0xec, 0x83, 0xe1, 0xff, 0x3f, 0xd4, 0x5c, 0x05, 0x2a, 0x39, 0x44, 0x32, 0x50, 0xb4, 0x80, 0x94, 0xd0, 0xa0, 0xbb, 0x6c, 0xd9, 0x86, 0xeb, 0xad, 0x33, 0xea, 0x03, 0xda, 0x1f, 0xb5, 0x5b, 0x12, 0x4b, 0x70, 0x04, 0x8b, 0xed, 0xee, 0xa2, 0xd7, 0xbe, 0x1e, 0x8f, 0xdc, 0x7f, 0xa3, 0x81, 0x19, 0xb7, 0xf8, 0x14, 0x06, 0xba, 0x1b, 0x41, 0xb1, 0x37, 0x47, 0x53, 0xca, 0xdb, 0x5f, 0xd2, 0xdf, 0xe2, 0x3a, 0x7d, 0x69, 0xc4, 0x61, 0x7b, 0x2c, 0xa1, 0xa7, 0x96, 0xd6, 0x38, 0x0b, 0xbd, 0x7a, 0x52, 0x35, 0x72, 0xd1, 0x00, 0x67, 0x5e, 0xa8, 0x29, 0x84, 0x91, 0x46, 0x3e, 0xcf, 0x9d, 0x3c, 0xfd, 0x76, 0x21, 0x8e, 0x08, 0xf0, 0xb2, 0xfa, 0xc5, 0xe6, 0x65, 0xe7, 0x6a, 0x4e, 0x71, 0x9a, 0xf2, 0x26, 0x27, 0xfb, 0xfe, 0x88, 0x63, 0x5a, 0x59, 0xf4, 0xa5, 0x1d, 0x16, 0x68, 0x6f, 0x66, 0x0d, 0xe9, 0xf1, 0xb9, 0x92, 0xc0, 0x25, 0x09, 0x0f, 0x31, 0xb8, 0x34, 0x6b, 0xf3, 0x6d, 0x95, 0x4c, 0x97, 0x10, 0x85, 0xd3, 0x07, 0x24, 0x9b, 0xbc, 0x74, 0x4a, 0x22, 0xa6, 0x45, 0x4f, 0x15, 0x77, 0x48, 0xfc, 0x51, 0x2e, 0x87, 0x6e, 0x20, 0xd8, 0xce, 0x42, 0xab, 0x93, 0xf5, 0xc9, 0x1a, 0x89, 0x36, 0xc2, 0xc8, 0x75, 0x01, 0xaf, 0xc1, 0x73, 0x9e, 0x2b, 0x55, 0xf6, 0x82, 0xe4, 0x54, 0x60, 0xef, 0x13, 0x56, 0xf7, 0x2f, 0xf9, 0x49, 0x5d, 0xe5, 0x28, 0xe8, 0x23, 0x58, 0x8a, 0x17, 0x3d, 0x02, 0xc3, 0x0e, 0xbf, 0x4d, 0x3b, 0x0a, 0x99, 0x9f, 0x7e, 0x90, 0xde, 0x1c, 0xaa, 0xc6, 0xe3, 0xe0, 0x43, 0xdd, 0x30]
# x86 data
add_data = [0x03, 0xb1, 0x6c, 0x1c, 0xa9, 0xf4, 0xfe, 0xe4, 0xf9, 0xb2, 0xc5, 0x8d, 0xaa, 0x76, 0xc9, 0x83, 0x86, 0x2e, 0x88, 0xe4, 0x72, 0x62, 0x7d, 0x0e, 0x63, 0x21, 0x29, 0x83, 0x0]
# MIPS data
xor_data = [0x53, 0xf7, 0x72, 0x41, 0xd6, 0x1b, 0xed, 0xba, 0xe0, 0x10, 0xcf, 0x1d, 0x8c, 0x23, 0x03, 0x6f, 0x08, 0xab, 0x9f, 0x09, 0x15, 0x6e, 0xbd, 0x4c, 0x65, 0x0b, 0x24, 0xb3, 0x00]
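PASSWORD_LEN = 0x1C  # the trailing 0x00 entries of the arrays are not part of the password
flag = ""

for i in range(PASSWORD_LEN):
    # PPC32 / ARM: invert the substitution table lookup
    offset = offset_data.index(enc_flag[i])
    # x86_16: undo "cl = data byte + bl"
    added = offset - add_data[i]
    # x86 syscall hook: undo the +19, wrapping back into 0..255
    hooked_add = (added - 19) % 256
    # MIPS32: undo the XOR with the key byte
    xored = hooked_add ^ xor_data[i]
    flag += chr(xored)

print(flag)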
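
To gain confidence that the script inverts the pipeline correctly, one can re-encrypt the recovered characters with a forward model of the MIPS, x86 and ARM stages and check that the result matches. The sketch below uses small hypothetical tables instead of the dumped data to stay short; with the real arrays, the equivalent check is that encrypting the flag reproduces the first 0x1C values of enc_flag.

```python
# Hypothetical toy tables standing in for the dumped data (not the challenge's values)
xor_data = [0x10, 0x20, 0x30, 0x40]
add_data = [0x05, 0xF0, 0x7F, 0x00]
offset_data = [(i * 7 + 3) % 256 for i in range(256)]  # a permutation, since 7 is odd


def encrypt_char(c: int, i: int) -> int:
    """Forward pipeline: MIPS xor, then x86 add (+19 from the hook), then ARM table lookup."""
    v = c ^ xor_data[i]
    v = (v + add_data[i] + 19) & 0xFF
    return offset_data[v]


def decrypt_char(e: int, i: int) -> int:
    """Same steps as the solving script, in reverse order."""
    v = offset_data.index(e)
    v = (v - add_data[i] - 19) % 256
    return v ^ xor_data[i]


password = b"GCC{"
encrypted = [encrypt_char(c, i) for i, c in enumerate(password)]
print(bytes(decrypt_char(e, i) for i, e in enumerate(encrypted)))
```

Since the ARM table is a permutation, every byte value has exactly one preimage, which is what makes offset_data.index a valid inverse.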

Flag

Flag : GCC{tu_as_pris_du_plaisir??}