# This guide assumes you know x86_64 assembly

If you do not know x86_64 assembly, there are many guides online, or you could read chapter -2 and chapter 2 of my guide on exploitation.

## Introduction to some terms

A disassembler is a program that takes a program's raw machine code and translates it back into assembly, so you can reconstruct what the program will do. A disassembler shows the flow of the program in assembly. For example, a disassembler might show assembly code that jumps based on a condition like this:

![[binja_screenshot.webp]]

Static analysis is the process of analyzing a program without executing it, and dynamic analysis is analyzing the program while it is running. With a disassembler you mainly just look at the assembly of a program without actually running it, so most of the time you are statically analyzing it.

Sometimes the assembly code of a program has anti reverse engineering techniques applied to it that make it harder to understand. Anti reverse engineering, also called anti RE, is a group of techniques designed to make the analysis of a program harder.

Obfuscation is a technique used to transform data (including code) into a form that is harder to understand without losing any of its meaning, and deobfuscation is restoring the obfuscated data to its original state.

A packer is a program that takes a program and creates a new program containing an obfuscated version of the original, along with code to deobfuscate it and transfer execution to it. The program that is created is called a packed program. The stub is the part of the packed program that deobfuscates the obfuscated program. The instruction in the packed program's code that transfers execution to the deobfuscated program is called the tail jump, and the start of the deobfuscated program is called the original entry point, or OEP for short. Packers can obfuscate the code, data, imports, etc. of the program they contain, which makes statically analyzing a packed program very hard.
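To make the packer vocabulary concrete, here is a toy sketch in Python. It is not how real packers work: it packs Python source instead of machine code, uses a single-byte XOR with a hypothetical key 0x77 as the obfuscation, and uses ``exec`` to stand in for the tail jump into the OEP.

```python
# Toy "packer" for Python source code, only to illustrate the terms above.
# Assumptions: single-byte XOR stands in for the obfuscation, and exec()
# stands in for the tail jump. Real packers work on machine code.
KEY = 0x77  # hypothetical obfuscation key


def obfuscate(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with the key; applying it twice restores the data."""
    return bytes(b ^ key for b in data)


def pack(source: str) -> bytes:
    """Produce the obfuscated payload that the packed program carries."""
    return obfuscate(source.encode())


def stub(payload: bytes) -> dict:
    """The stub: deobfuscate the payload, then transfer execution to it."""
    original = obfuscate(payload).decode()  # deobfuscation
    namespace = {}
    exec(original, namespace)  # stands in for the tail jump to the OEP
    return namespace


program = "result = 7 + 8\n"
payload = pack(program)
print(payload)                  # obfuscated bytes, unreadable at a glance
print(stub(payload)["result"])  # 15
```

Only the stub knows how to turn the payload back into the program, which is why a statically inspected packed program mostly shows the stub.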
A control flow graph, or CFG for short, is a graph of all the possible execution paths in a program. Most anti RE techniques aim to make the CFG messy.

A basic block is a straight-line sequence of instructions with a single entry and a single exit; basic blocks typically end in a call, a ret, a syscall, an interrupt, or a jump. A dispatcher is a basic block that determines which basic block executes next, and a state variable is the value the dispatcher uses to make that decision.

A virtual machine, or VM for short, is a custom execution environment with its own state. A VM has its own instructions that can modify its state. For example, say we have a VM with 3 registers and one instruction that adds the values of the first two registers and stores the result in the third. Bytecode is a custom instruction format used by the VM to represent logic that is not directly executed by the CPU; instead, the program decodes each bytecode instruction and runs it. An opcode is the part of every VM instruction that specifies which VM instruction it is. A handler is a basic block of assembly that implements the logic of one VM instruction. VMs have an interpreter loop which fetches the custom bytecode instructions and transfers execution to the correct handler.

A layer is one anti RE technique applied to a program. A protector is a system that applies multiple layers to a program. A stage is a phase of execution where a specific layer is active. A trampoline is a piece of code used to redirect the flow of execution.

There are a lot of anti RE techniques; the following sections cover some of them.

## Indirect calls and jumps

An indirect call/jump is when a call or jump goes through a register or through a pointer in memory instead of to a fixed address. The ``ret`` instruction performs an indirect jump, since it pops the value off the top of the stack and jumps to it.
For example:

```
global _start
section .text
add_two_numbers:
    mov rax, rdi
    add rax, rsi
    ret
_start:
    add rsp, 8                ; drop the _start address pushed on the previous pass
    mov edi, 7
    mov esi, 8
    mov rax, add_two_numbers
    call rax                  ; indirect call through a register
    mov rax, _start
    push rax
    jmp [rsp]                 ; indirect jump through a pointer on the stack
```

You can also do the following:

```
global _start
section .text
add_two_numbers:
    mov rax, rdi
    add rax, rsi
    ret
_start:
    mov edi, 7
    mov esi, 8
    mov qword [rel some_global], add_two_numbers
    call [rel some_global]    ; indirect call through a pointer in memory
    mov qword [rel some_global], _start
    jmp [rel some_global]     ; indirect jump through a pointer in memory
section .data
some_global dq 0
```

With an indirect call/jump the disassembler may not know where the next basic block is, which may lead to an incorrect CFG, for example:

![[binja_screenshot_04.png]]

## Jump instructions with the same target

A common anti RE technique is to have conditional jumps that all jump to the same place, for example:

```
global _start
section .text
_start:
    mov al, 7
    cmp al, 77
    jz label
    jnz label                 ; one of jz/jnz is always taken
    inc al                    ; never reached
    jmp _start
label:
    jmp _start
```

The above technique makes it trickier for someone looking at a disassembler to determine what is happening, because the disassembler shows multiple jumps that all go to the same place, cluttering the CFG, for example:

![[binja_screenshot_05.png]]

## Breaking into jumps

A common anti RE technique is to break up a single basic block into multiple basic blocks that each jump to the next one, for example:

```
global _start
section .text
_start:
    xor al, al
    xor bl, bl
    xor cl, cl
    xor dl, dl
label_01:
    inc al
    jmp label_02
label_04:
    inc dl
    jmp label_05
label_03:
    inc cl
    jmp label_04
label_02:
    inc bl
    jmp label_03
label_05:
    jmp _start
```

Splitting a single basic block into multiple basic blocks like this creates a mess on the screen that makes it harder to determine what is happening.
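Because every split block has exactly one successor, the original single basic block can be recovered by just following the jumps. The Python sketch below assumes the blocks of the "Breaking into jumps" example have already been extracted into a hypothetical ``BLOCKS`` dictionary (label to instructions and next label); it is an illustration, not any disassembler's API. The same idea also works when each ``jmp`` is replaced by a ``jz``/``jnz`` pair, since both jumps in a pair go to the same label.

```python
# Hypothetical model of the "Breaking into jumps" example: each label maps to
# (instructions in the block, label of the block that runs next). The next
# label comes from the block's final jmp, or from falling through.
BLOCKS = {
    "_start":   (["xor al, al", "xor bl, bl", "xor cl, cl", "xor dl, dl"], "label_01"),
    "label_01": (["inc al"], "label_02"),
    "label_04": (["inc dl"], "label_05"),
    "label_03": (["inc cl"], "label_04"),
    "label_02": (["inc bl"], "label_03"),
    "label_05": ([], "_start"),
}


def linearize(blocks: dict, entry: str) -> list:
    """Follow each block's single successor from `entry` until a block repeats,
    collecting the instructions in the order they actually execute."""
    order, seen, label = [], set(), entry
    while label not in seen:
        seen.add(label)
        instructions, successor = blocks[label]
        order.extend(instructions)
        label = successor
    return order


for instruction in linearize(BLOCKS, "_start"):
    print(instruction)
```

The printed order is the four ``xor`` instructions followed by ``inc al``, ``inc bl``, ``inc cl`` and ``inc dl``, which is the single basic block the jumps were hiding. Blocks with more than one distinct successor cannot be merged this simply.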
We can make it even messier by combining this with the previous technique:

```
global _start
section .text
_start:
    xor al, al
    xor bl, bl
    xor cl, cl
    xor dl, dl
label_01:
    inc al
    jz label_02
    jnz label_02
label_04:
    inc dl
    jz label_05
    jnz label_05
label_03:
    inc cl
    jz label_04
    jnz label_04
label_02:
    inc bl
    jz label_03
    jnz label_03
label_05:
    jz _start
    jnz _start
```

More jumps now appear on the screen, which makes it even harder to figure out what is happening, for example:

![[binja_screenshot_06.png]]

## Useless code

A common anti RE technique is to add code that does nothing useful, for example:

```
global _start
section .text
_start:
    mov al, 7
    inc al
    dec al
    inc al
    dec al
    add al, 2
    sub al, 2
    add al, 2
    sub al, 3
    add al, 4
    dec al
    dec al
    dec al                    ; al is 7 again
    jmp _start
```

In the code above, al is set to 7 at \_start and is still 7 when \_start is jumped to, so the code in between doesn't really matter. This technique makes the person looking at the disassembler spend more time figuring out what is happening, because they have to trace every change to al.

## Clones

A common anti RE technique is to have clones of the same basic block, for example:

```
global _start
section .text
_start:
    mov al, 7
    cmp al, 7
    jz basic_block_01
    inc al
    cmp al, 8
    jz basic_block_02
    cmp al, 17
    jz basic_block_01
    jmp _start
basic_block_01:
    mov al, 77
    jmp _start
basic_block_02:               ; clone of basic_block_01
    mov al, 77
    jmp _start
```

This technique makes the person looking at the disassembler take longer, because they have to look at both basic blocks even though they do the same thing, for example:

![[binja_screenshot_07.png]]

## Always/Never jumping to the target

A common anti RE technique is a conditional jump that always jumps to its target.
An example is as follows:

```
global _start
section .text
_start:
    xor eax, eax
    jz will_always_branch     ; xor eax, eax always sets ZF, so this is always taken
    inc al                    ; never executed
will_always_branch:
    jmp _start
```

Another common anti RE technique is a basic block that never executes, for example:

```
global _start
section .text
_start:
    mov al, 7
    cmp al, 77
    jz label                  ; 7 never equals 77, so this is never taken
    jmp _start
label:
    xor ax, 777               ; never executed
    jmp _start
```

This makes the program's intent harder to determine because the disassembler shows both the true and false branches even though execution only ever takes one of them, for example:

![[binja_screenshot_08.png]]

## Call/Ret abuse

The call and ret instructions can confuse disassemblers when used in unusual ways, because disassemblers don't always track the stack correctly, for example:

```
global _start
section .text
_start:
    add rsp, 16               ; drop the pushed address and return address from the previous pass
    push _start
    add qword [rsp], 7
    inc qword [rsp]
    sub qword [rsp], 7
    dec qword [rsp]           ; [rsp] is back to _start
    call [rsp]
```

Or:

```
global _start
section .text
_start:
    push _start
    add qword [rsp], 1337
    inc qword [rsp]
    sub qword [rsp], 1338     ; [rsp] is back to _start
    xchg rax, [rsp]
    mov [rsp], rax            ; put _start back on the stack
    ret                       ; "returns" to _start
```

The stack can be even harder to trace by using ``ret`` with an immediate operand: ``ret n`` pops the return address, jumps to it, and also removes n more bytes from the stack:

```
global _start
section .text
add_two_numbers:
    mov rax, rdi
    add rax, rsi
    ret 8                     ; return, then discard 8 more bytes
add_three_numbers:
    mov rax, rdi
    add rax, rsi
    add rax, rdx
    ret
add_four_numbers:
    mov rax, rdi
    add rax, rsi
    add rax, rdx
    add rax, rcx
    ret 16                    ; return, then discard 16 more bytes
_start:
    mov edi, 7
    mov esi, 7
    mov edx, 7
    mov ecx, 7
    push _start
    push 10
    push add_three_numbers
    push 7
    push 9
    push add_two_numbers
    push _start
    push add_four_numbers
    ret 8                     ; runs add_four -> add_two -> add_three -> _start
```

The above code would look like this in a disassembler:

![[binja_screenshot_09.png]]

## Control flow flattening

Control flow flattening is an anti RE technique that replaces the natural execution flow of the program with a dispatcher that executes a certain basic block based on a state variable.
Each executed basic block eventually leads back to the dispatcher unless the program is exiting, for example:

```
global _start
section .text
_start:
    xor al, al                ; state variable starts at 0
dispatcher:
    test al, al
    jz do_thing_01
    cmp al, 1
    jz do_thing_02
    cmp al, 3
    jz do_thing_03
    jmp dispatcher
do_thing_03:
    mov dil, al               ; exit status 3
    mov eax, 60               ; sys_exit
    syscall
do_thing_01:
    inc al                    ; next state: 1
    jmp dispatcher
do_thing_02:
    add al, 2                 ; next state: 3
    jmp dispatcher
```

This makes the intent of the program harder to determine because, to a disassembler, it looks like a single loop with many conditional jumps, while the real order is simply do_thing_01, do_thing_02, then do_thing_03. The code above would look like this in a disassembler:

![[binja_screenshot_10.png]]

## Opaque predicates

An opaque predicate is a conditional check that always has the same result at runtime, but that is hard to prove when statically analyzing the program, for example:

```
global _start
section .text
_start:
    rdrand eax                ; random x (writing eax zero-extends into rax)
    lea rbx, [rax + 1]        ; rbx = x + 1
    mul rbx                   ; rax = x * (x + 1)
    test rax, 1
    jz _start                 ; always taken: x * (x + 1) is even
    mov edi, eax              ; never executed
    mov eax, 60
    syscall
```

Say we have a number x: no matter what x is, x * (x + 1) is always even, because one of two consecutive integers is always even (this still holds when the 64-bit multiplication wraps around). The jz is therefore always taken, and the technique makes the CFG messy by adding a useless basic block that never runs. The code above would look like this in a disassembler:

![[binja_screenshot_11.png]]

## Junk code/bytes

On x86_64, instructions are just bytes of varying lengths with no explicit boundaries between them, so static analysis isn't always 100% accurate about the instructions it shows the person looking at the disassembler. Most of the time it is accurate, but junk bytes can be inserted between real instructions to make the disassembler show a false picture of what is happening.
These junk bytes are jumped over and never executed, but the disassembler will still show an incorrect picture of what is happening, for example:

```
global _start
section .text
_start:
    xor eax, eax
    jmp $+4                   ; skip the 2 junk bytes below
    db 0xff
    db 0xeb
    test eax, eax
    jmp $+5                   ; skip the 3 junk bytes below
    db 0xeb
    db 0xfe
    db 0x77
    jz _start                 ; always taken since eax is 0
    db 0xeb
    db 0xfe
    db 0xfe
    db 0xeb
    db 0x99
```

A disassembler may display the wrong instructions, while in reality the code simply sets eax to 0, checks whether it is 0, and, since it is, jumps back to \_start.

If some bytes in memory represent instructions, they may represent different instructions depending on where the instruction pointer points within them, for example:

```
byte 0: mov rax, 7
byte 7: ret
```

The bytes would look like:

```
48 c7 c0 07 00 00 00 c3
```

or

```
byte 0: 48
byte 1: c7
byte 2: c0
byte 3: 07
byte 4: 00
byte 5: 00
byte 6: 00
byte 7: c3
```

If the instruction pointer started at byte 1 instead of byte 0, the resulting code would be:

```
mov eax, 7
ret
```

If we started at byte 2, the resulting code would be:

```
rol byte [rdi], 0
add byte [rax], al
ret
```

As I said before, depending on where the instruction pointer points within a group of bytes that are instructions, the bytes may represent different instructions, for example:

```
global _start
section .text
_start:
    nop
    lea rax, [_start]
    xor rax, 0x777
    dec rax
    rdrand ecx
    and ecx, 3
    lea rbx, [$+0x21 + rcx]   ; rbx = address of this lea + 0x21 + (0..3)
    xor rbx, 0x777
    inc rbx
    lea rbx, [rbx - 2]
    inc rbx
    xor rbx, 0x777            ; the xor/inc/lea/inc/xor chain cancels out
    jmp rbx                   ; lands on one of the 3 nops or the jmp below
    nop
    nop
    nop
    jmp $+4                   ; jumps into the middle of the next instruction
    and r8b, byte [rsp]
    sub al, 0x24
    inc rax
    xor rax, 0x777            ; rax is back to _start
    push rax
    inc qword [rsp]
    ret                       ; "returns" to _start + 1
```

The code above would look like this in a disassembler:

![[binja_screenshot_12.png]]

As you can see from the code above, the instruction pointer is redirected by ``jmp rbx`` and by ``jmp 0x401040`` (how the disassembler renders ``jmp $+4``). The ``jmp rbx`` instruction will jump to an address from 0x401039 to 0x40103c, inclusive.
Every address in that range is the start of a valid instruction: all but one are nops, and 0x40103c is the start of a jmp. The ``jmp 0x401040`` jumps two bytes into the ``and r8b, byte [rsp]`` instruction, which means ``and r8b, byte [rsp]`` won't be executed; instead, whatever starts two bytes into it will be, assuming it is a valid instruction. The bytes of ``and r8b, byte [rsp]`` are:

```
byte 0: 0x44
byte 1: 0x22
byte 2: 0x04
byte 3: 0x24
```

So the instruction pointer starts executing at byte 2:

```
byte 2: 0x04
```

That byte by itself is not a complete instruction (0x04 is the opcode of ``add al, imm8`` and needs an immediate byte after it), but together with the following byte it forms a valid instruction:

```
byte 2: 0x04
byte 3: 0x24
```

Those bytes decode to ``add al, 0x24``. After that instruction is executed, ``sub al, 0x24`` is executed, which restores al to its original value.

## Jump/call tables

A jump/call table is an array of addresses, each holding the start of a basic block; the program indexes into it at runtime and calls or jumps through the selected entry indirectly.
For example:

```
global _start
section .text
func01:
    mov edi, 7
    ret
func02:
    mov edi, 77
    ret
func03:
    mov edi, 777
    ret
func04:
    mov edi, 7777
    ret
_start:
    rdrand eax                ; random value
    and eax, 3                ; index 0..3
    call [table + rax * 8]    ; call the selected table entry
    jmp _start
section .rodata
table dq func01, func02, func03, func04
```

Or:

```
global _start
section .text
func01:
    mov edi, 7
    jmp _start
func02:
    mov edi, 77
    jmp _start
func03:
    mov edi, 777
    jmp _start
func04:
    mov edi, 7777
    jmp _start
_start:
    rdrand eax                ; random value
    and eax, 3                ; index 0..3
    jmp [table + rax * 8]     ; jump to the selected table entry
section .rodata
table dq func01, func02, func03, func04
```

Since the address to call/jump to is selected at runtime, the disassembler cannot determine which basic block will execute next; usually a disassembler will list every basic block from the call/jump table as a possible successor, for example:

![[binja_screenshot_02.png]]

## Computed call/jumps

A computed call/jump is when the program calculates the address to call or jump to with arithmetic at runtime and then calls or jumps to it indirectly. Since the address is derived at runtime, the disassembler may have a hard time figuring out where it leads, for example:

```
global _start
section .text
_start:
    nop
    nop
    nop
    nop
    lea rax, [label + 70]
    mov cl, 10
loopy_01:
    xor ax, 7
    inc ax
    dec ax
    xor ax, 7                 ; the xor/inc/dec/xor on ax cancels out
    dec rax
    dec rax
    sub rax, 5                ; each pass leaves rax 7 lower
    dec cl
    test cl, cl
    jnz loopy_01
    rdrand ebx
    and ebx, 3
    add rax, rbx              ; rax = label + 0..3, one of the nops at label
    jmp rax
label:
    nop
    nop
    nop
    nop
    lea rax, [_start + 70]
    mov cl, 10
loopy_02:
    xor ax, 7
    inc ax
    dec ax
    xor ax, 7
    dec rax
    dec rax
    sub rax, 5
    dec cl
    test cl, cl
    jnz loopy_02
    rdrand ebx
    and ebx, 3
    add rax, rbx              ; rax = _start + 0..3, one of the nops at _start
    jmp rax
```

Like I said before, since the address to call/jump to is selected at runtime, the disassembler cannot determine which basic block will execute next, for example:

![[binja_screenshot_03.png]]
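Even though a disassembler can't resolve ``jmp rax`` statically, the arithmetic can be replayed offline to list where it can land. The Python sketch below is a hypothetical helper that does not use any disassembler's API: ``loop_body`` models one pass through ``loopy_01``/``loopy_02`` with 64-bit wraparound (leaving out the ``cl`` counter), and ``possible_targets`` enumerates the four values ``and ebx, 3`` can produce. The base address 0x401000 is an assumption.

```python
# Hypothetical helper: replays the address arithmetic of the computed-jump
# example to list every address `jmp rax` can reach.
MASK16 = 0xFFFF
MASK64 = (1 << 64) - 1


def loop_body(rax: int) -> int:
    """One pass through loopy_01/loopy_02 (the cl counter is left out)."""
    ax = rax & MASK16
    ax ^= 7                               # xor ax, 7
    ax = (ax + 1) & MASK16                # inc ax (wraps within 16 bits)
    ax = (ax - 1) & MASK16                # dec ax
    ax ^= 7                               # xor ax, 7
    rax = (rax & ~MASK16 & MASK64) | ax   # writing ax keeps bits 16-63 of rax
    rax = (rax - 2) & MASK64              # dec rax, dec rax
    return (rax - 5) & MASK64             # sub rax, 5


def possible_targets(base: int, start_offset: int, passes: int) -> list:
    """Addresses reachable by `jmp rax`, one per value of `and ebx, 3`."""
    rax = (base + start_offset) & MASK64  # lea rax, [base + start_offset]
    for _ in range(passes):
        rax = loop_body(rax)
    return [(rax + rbx) & MASK64 for rbx in range(4)]  # add rax, rbx


LABEL = 0x401000  # hypothetical address of `label`
print([hex(t) for t in possible_targets(LABEL, 70, 10)])
# ['0x401000', '0x401001', '0x401002', '0x401003']
```

An entry offset of 7 with one pass, or 70 with ten passes, both put the targets at offsets 0 to 3 from the label, i.e. on its four nops. This is roughly what an emulator would work out automatically when a disassembler cannot resolve the jump on its own.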