✝️ 1 John 2:15-17 Do not love the world or anything in the world. If anyone loves the world, love for the Father is not in them. For everything in the world—the lust of the flesh, the lust of the eyes, and the pride of life—comes not from the Father but from the world. The world and its desires pass away, but whoever does the will of God lives forever.

# This guide assumes you know x86_64 assembly

If you do not know x86_64 assembly, there are many guides online, or you could read chapter -2 and chapter 2 of my guide on exploitation.

## Introduction to some terms

A disassembler is a program that takes a program's raw machine code and translates it back into assembly so you can reconstruct what the program will do. A disassembler shows the flow of the program in assembly, for example a disassembler might show assembly code that jumps based on a condition like this:

![[binja_screenshot.webp]]

Static analysis is the process of analyzing a program without executing it, and dynamic analysis is analyzing the program while it is running. With a disassembler you mainly just look at the assembly of a program without actually running it, therefore you statically analyze it most of the time.

Sometimes when looking at the assembly code of a program, the code may have some anti reverse engineering techniques applied to it that make it harder to understand. Anti reverse engineering, also called anti RE, is a group of techniques designed to make the analysis of a program harder.

Obfuscation is a technique used to transform data without losing any of the meaning, and deobfuscation is restoring the obfuscated data to its original state.

A packer is a program that takes a program and creates a new program that contains an obfuscated version of that program, code to deobfuscate it, and code to transfer execution to it. The program that is created is called a packed program.

The stub is the part of the packed program that deobfuscates the obfuscated program.
The instruction of the packed program's code that transfers execution to the deobfuscated program is called the tail jump. The start of the deobfuscated program is called the original entry point, or OEP for short. Packers can obfuscate the code, data, imports, etc. of the program they contain, which makes statically looking at a packed program very hard.

A control flow graph, or CFG for short, is a graph of all the possible execution paths in a program. Most anti RE techniques aim to make the CFG messy. A basic block is a run of code that has a single start and a single end; basic blocks typically end in either a call, a ret, a syscall, an interrupt or a jump.

A dispatcher is a basic block that determines what basic block to execute next. A state variable is used by the dispatcher to decide which basic block executes next.

A virtual machine, or VM for short, is a custom execution environment with its own state. A VM has its own instructions that can modify its state, for example a VM could have 3 registers and one instruction that adds the value of the first two registers to the third one. Bytecode is a custom instruction format used by the VM to represent logic that is not directly executed by the CPU. The program understands each bytecode instruction and runs it. An opcode is the part of every VM instruction that specifies which VM instruction it is. A handler is a basic block of assembly that implements the logic of one VM instruction. VMs have an interpreter loop which fetches the custom bytecode instructions and transfers execution to the correct handler.

A layer is one anti RE technique applied to a program. A loader is a piece of code that treats bytes from disk (now loaded into memory) or from memory as executable, maybe modifies them, and jumps to them. A protector is a system that applies multiple layers to a program. A stage is a phase of execution where a specific layer is active. A trampoline is a small piece of code used to redirect the flow of execution.
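To make the VM terms above concrete, here is a minimal Python sketch of the example VM with three registers and one instruction. The opcode value, the names `OP_ADD`, `handle_add`, `HANDLERS` and `run`, and the reading that the instruction does `r2 += r0 + r1` are all assumptions made for illustration; a real VM-based protector would implement its handlers in native code.

```python
# Hypothetical VM: three registers and a single instruction.
OP_ADD = 0x01  # assumed opcode value, chosen only for this sketch


def handle_add(regs):
    """Handler: add the first two registers to the third one."""
    regs[2] += regs[0] + regs[1]


# Opcode -> handler lookup used by the interpreter loop.
HANDLERS = {OP_ADD: handle_add}


def run(bytecode, regs):
    """Interpreter loop: fetch each opcode and transfer to its handler."""
    pc = 0  # the VM's own instruction pointer, part of the VM state
    while pc < len(bytecode):
        opcode = bytecode[pc]
        HANDLERS[opcode](regs)
        pc += 1  # every instruction in this VM is one byte long
    return regs


if __name__ == "__main__":
    # Two ADD instructions executed on registers 7, 8, 0.
    print(run(bytes([OP_ADD, OP_ADD]), [7, 8, 0]))
```

The CPU never executes the bytecode directly; it only ever executes the interpreter loop and the handlers, which is why a disassembler shows the VM and not the logic the bytecode encodes.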
There are a lot of anti RE techniques, as follows.

# Anti Disassembly techniques

There are two main ways disassemblers turn raw machine code into instructions. The first way is to start at a certain offset in the machine code and turn the bytes into instructions one after another; this is called linear disassembly. The second way is called flow disassembly, which involves turning bytes into an instruction and then deciding which bytes to turn into instructions next based on how that instruction interacts with the instruction pointer.

Each technique has its weaknesses: linear disassembly turns all the bytes into instructions no matter what, even if the bytes won't be executed as instructions, and flow disassembly is vulnerable to techniques that make it difficult to find out where the instruction pointer is going to be.

Anti disassembly techniques make the disassembler's output messy, waste the time of the person looking at the disassembler, or just make the assembly plain inaccurate. The following are some popular techniques:

## Indirect calls and jumps

An indirect call/jump is when a call or jump happens on a register or through a pointer. The ``ret`` instruction also performs an indirect jump, since it pops whatever is on the top of the stack and jumps to it.
For example:

```
global _start
section .text
add_two_numbers:
    mov rax, rdi
    add rax, rsi
    ret
_start:
    add rsp, 8                  ; drop the value pushed on the previous pass (argc on the first pass)
    mov edi, 7
    mov esi, 8
    mov rax, add_two_numbers
    call rax                    ; indirect call through a register
    mov rax, _start
    push rax
    jmp [rsp]                   ; indirect jump through a pointer
```

You can also do the following:

```
global _start
section .text
add_two_numbers:
    mov rax, rdi
    add rax, rsi
    ret
_start:
    mov edi, 7
    mov esi, 8
    mov qword [rel some_global], add_two_numbers
    call [rel some_global]      ; indirect call through a global
    mov qword [rel some_global], _start
    jmp [rel some_global]       ; indirect jump through a global
section .data
some_global dq 0
```

With an indirect call/jump the disassembler may be confused about where the next basic block is going to be, which may create an incorrect CFG, for example:

![[binja_screenshot_04.png]]

## Jump instructions with the same target

A common anti RE technique is to have conditional jumps that all jump to the same place, for example:

```
global _start
section .text
_start:
    mov al, 7
    cmp al, 77
    jz label
    jnz label                   ; one of jz/jnz is always taken
    inc al                      ; never reached
    jmp _start
label:
    jmp _start
```

The above technique makes it trickier for someone looking at a disassembler to determine what is happening, because the disassembler shows multiple jumps that all go to the same place, cluttering the CFG, for example:

![[binja_screenshot_05.png]]

## Breaking into jumps

A common anti RE technique is to break up a single basic block into multiple basic blocks that each jump to the next basic block, for example:

```
global _start
section .text
_start:
    xor al, al
    xor bl, bl
    xor cl, cl
    xor dl, dl
label_01:
    inc al
    jmp label_02
label_04:
    inc dl
    jmp label_05
label_03:
    inc cl
    jmp label_04
label_02:
    inc bl
    jmp label_03
label_05:
    jmp _start
```

The result of a single basic block being separated into multiple basic blocks is a mess on the screen that makes it harder to determine what is happening.
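Undoing this by hand amounts to following each unconditional jump and gluing the blocks back together in execution order. A minimal Python sketch of that idea follows; the `BLOCKS` table is a hand-written model of the listing above and `linearize` is a hypothetical helper, not a feature of any particular disassembler.

```python
# Each label maps to (instructions in the block, label it jumps to).
# The dictionary keeps the scrambled order used in the listing above.
BLOCKS = {
    "label_01": (["inc al"], "label_02"),
    "label_04": (["inc dl"], "label_05"),
    "label_03": (["inc cl"], "label_04"),
    "label_02": (["inc bl"], "label_03"),
    "label_05": ([], "_start"),
}


def linearize(blocks, entry):
    """Follow the unconditional jumps from entry and return the real order."""
    order, body = [], []
    label = entry
    # Stop when the jump leaves the known blocks or would revisit one.
    while label in blocks and label not in order:
        instructions, target = blocks[label]
        order.append(label)
        body.extend(instructions)
        label = target
    return order, body


if __name__ == "__main__":
    order, body = linearize(BLOCKS, "label_01")
    print(order)
    print(body)
```

Once the jumps are followed, the five blocks collapse back into one straight run of four `inc` instructions.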
We can even make it messier by combining this with the previous technique:

```
global _start
section .text
_start:
    xor al, al
    xor bl, bl
    xor cl, cl
    xor dl, dl
label_01:
    inc al
    jz label_02
    jnz label_02
label_04:
    inc dl
    jz label_05
    jnz label_05
label_03:
    inc cl
    jz label_04
    jnz label_04
label_02:
    inc bl
    jz label_03
    jnz label_03
label_05:
    jz _start
    jnz _start
```

There would be more jumps appearing on the screen and it would be harder to figure out what is happening, for example:

![[binja_screenshot_06.png]]

## Useless code

A common anti RE technique is to have code that is useless, for example:

```
global _start
section .text
_start:
    mov al, 7
    inc al
    dec al
    inc al
    dec al
    add al, 2
    sub al, 2
    add al, 2
    sub al, 3
    add al, 4
    dec al
    dec al
    dec al                      ; al is back to 7 here
    jmp _start
```

In the code above al gets set to 7 at \_start and is still 7 when \_start gets jumped to. The code in between doesn't really matter. The above technique makes it take more time for the person looking at the disassembler to figure out what is happening, because they have to trace what is happening to al.

## Clones

A common anti RE technique is to have clones of the same basic block. For example:

```
global _start
section .text
_start:
    mov al, 7
    cmp al, 7
    jz basic_block
    inc al
    cmp al, 8
    jz basic_block
    cmp al, 17
    jz basic_block
    jmp _start
basic_block:
    mov al, 77
    jmp _start
```

This technique makes the person looking at the disassembler take more time with their task because they have to look at both basic blocks even though they do the same thing, for example:

![[binja_screenshot_07.png]]

## Opaque predicates

An opaque predicate is a check that either always results in a jump to the same target or never results in a jump at all. From the standpoint of a person looking at a disassembler it could appear that one or multiple jumps are possible, but in reality either one or no jumps ever occur.
An example is as follows:

```
global _start
section .text
_start:
    xor eax, eax                ; always sets the zero flag
    jz will_always_branch
    inc al                      ; never reached
will_always_branch:
    jmp _start
```

Or:

```
global _start
section .text
_start:
    mov al, 7
    cmp al, 77
    jz label                    ; never taken, 7 != 77
    jmp _start
label:
    xor ax, 777
    jmp _start
```

Opaque predicates make the program's intent harder to determine because the disassembler will show all the possible jumps even though execution only ever goes to one of those locations, for example:

![[binja_screenshot_08.png]]

Another example of an opaque predicate could be:

```
global _start
section .text
_start:
    rdrand eax
    lea rbx, [rax + 1]
    mul rbx                     ; rax = x * (x + 1)
    test rax, 1
    jz _start                   ; always taken, the product is even
    mov edi, eax
    mov eax, 60
    syscall
```

For any integer x, x * (x + 1) is always even because either x or x + 1 is even, so the low bit of the product is always 0 and the ``jz`` is always taken. The above technique makes the CFG messy by having a useless basic block that never runs. The code above would look like this in a disassembler:

![[binja_screenshot_11.png]]

## Call/Ret abuse

The call and ret instructions can be used to confuse disassemblers if used in certain ways, because disassemblers don't correctly trace the stack all the time, for example:

```
global _start
section .text
_start:
    add rsp, 16
    push _start
    add qword [rsp], 7
    inc qword [rsp]
    sub qword [rsp], 7
    dec qword [rsp]             ; [rsp] is _start again
    call [rsp]
```

Or:

```
global _start
section .text
_start:
    push _start
    add qword [rsp], 1337
    inc qword [rsp]
    sub qword [rsp], 1338       ; [rsp] is _start again
    xchg rax, [rsp]
    mov [rsp], rax
    ret
```

The stack can be even harder to trace by using ret with a number after it, which pops that many extra bytes after popping the return address:

```
global _start
section .text
add_two_numbers:
    mov rax, rdi
    add rax, rsi
    ret 8
add_three_numbers:
    mov rax, rdi
    add rax, rsi
    add rax, rdx
    ret
add_four_numbers:
    mov rax, rdi
    add rax, rsi
    add rax, rdx
    add rax, rcx
    ret 16
_start:
    mov edi, 7
    mov esi, 7
    mov edx, 7
    mov ecx, 7
    push _start                 ; target of the final ret
    push 10                     ; skipped by ret 8
    push add_three_numbers
    push 7                      ; skipped by ret 16
    push 9                      ; skipped by ret 16
    push add_two_numbers
    push _start                 ; skipped by ret 8
    push add_four_numbers
    ret 8
```

The above code would look like this in a disassembler:

![[binja_screenshot_09.png]]

## Control flow flattening

Control flow
flattening is an anti RE technique that replaces the natural execution flow of the program with a dispatcher that executes a certain basic block based on a state variable. The executed basic block eventually leads back to the dispatcher, unless it exits, for example:

```
global _start
section .text
_start:
    xor al, al                  ; al is the state variable
dispatcher:
    test al, al
    jz do_thing_01
    cmp al, 1
    jz do_thing_02
    cmp al, 3
    jz do_thing_03
    jmp dispatcher
do_thing_03:
    mov dil, al
    mov eax, 60
    syscall
do_thing_01:
    inc al
    jmp dispatcher
do_thing_02:
    add al, 2
    jmp dispatcher
```

This makes the intent of the program harder to determine because to a disassembler this looks like a single loop with many conditional jumps. The code above would look like this in a disassembler:

![[binja_screenshot_10.png]]

## Junk code/bytes

In the x86_64 architecture instructions are just bytes of varying lengths with no explicit instruction boundaries, so static analysis isn't always going to be 100% accurate about the instructions it shows the person looking at the disassembler. Most of the time it will be accurate, but junk bytes can be inserted between real instructions, leading the disassembler to create a false picture of what is happening. These junk bytes are jumped over and never executed, but the disassembler will still have an incorrect picture of what is happening, for example:

```
global _start
section .text
_start:
    xor eax, eax
    jmp $+4                     ; skips the two junk bytes below
    db 0xff
    db 0xeb
    test eax, eax
    jmp $+5                     ; skips the three junk bytes below
    db 0xeb
    db 0xfe
    db 0x77
    jz _start
    db 0xeb
    db 0xfe
    db 0xfe
    db 0xeb
    db 0x99
```

A disassembler may display the wrong result, while in reality the code simply sets eax to 0, checks if it is 0, and since it is 0, sets the instruction pointer to \_start.
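The difference between the two disassembly strategies can be sketched on these exact bytes. The following Python is a toy model, not a real disassembler: the byte values are hand-assembled under the assumption that NASM picks the two-byte short encodings for every jump, and `decode`, `linear_sweep` and `follow_flow` are hypothetical helpers that only understand the four opcodes used in this listing.

```python
# Hand-assembled bytes of the junk-byte listing above (assumed encodings).
CODE = bytes([
    0x31, 0xC0,                    # 0:  xor eax, eax
    0xEB, 0x02,                    # 2:  jmp $+4   -> offset 6
    0xFF, 0xEB,                    # 4:  junk
    0x85, 0xC0,                    # 6:  test eax, eax
    0xEB, 0x03,                    # 8:  jmp $+5   -> offset 13
    0xEB, 0xFE, 0x77,              # 10: junk
    0x74, 0xF1,                    # 13: jz _start -> offset 0
    0xEB, 0xFE, 0xFE, 0xEB, 0x99,  # 15: junk
])


def decode(code, off):
    """Toy decoder: return (mnemonic, length, jump target or None)."""
    op = code[off]
    if off + 1 < len(code):
        if op in (0x31, 0x85):  # treated as 2-byte register forms only
            return ("xor" if op == 0x31 else "test"), 2, None
        if op in (0xEB, 0x74):  # short jumps with a signed 8-bit offset
            rel = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
            return ("jmp" if op == 0xEB else "jz"), 2, off + 2 + rel
    return "db", 1, None  # anything else is shown as a raw byte


def linear_sweep(code):
    """Decode every byte in order, whether or not it can ever execute."""
    off, out = 0, []
    while off < len(code):
        mnemonic, length, _ = decode(code, off)
        out.append((off, mnemonic))
        off += length
    return out


def follow_flow(code, entry=0):
    """Decode only the offsets reachable by following jumps from entry."""
    seen, todo = set(), [entry]
    while todo:
        off = todo.pop()
        if off in seen or not 0 <= off < len(code):
            continue
        seen.add(off)
        mnemonic, length, target = decode(code, off)
        if target is not None:
            todo.append(target)
        if mnemonic != "jmp":  # an unconditional jmp never falls through
            todo.append(off + length)
    return sorted(seen)


if __name__ == "__main__":
    print(linear_sweep(CODE))
    print(follow_flow(CODE))
```

In this model the linear sweep decodes the junk byte at offset 5 as the start of a jump and swallows the first byte of `test eax, eax`, so offset 6 never appears as an instruction. Following the jumps reaches offset 6 correctly, but still visits the junk at offset 15, because the toy walker does not know that the `jz` is always taken.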
The same bytes in memory may represent different instructions depending on where in those bytes the instruction pointer points, for example:

```
byte 0: mov rax, 7
byte 7: ret
```

The bytes would look like:

```
48 c7 c0 07 00 00 00 c3
```

or

```
byte 0: 48
byte 1: c7
byte 2: c0
byte 3: 07
byte 4: 00
byte 5: 00
byte 6: 00
byte 7: c3
```

Say the instruction pointer started at byte 1 instead of byte 0, the resulting code would be:

```
mov eax, 7
ret
```

If we started at byte 2, the resulting code would be:

```
rol byte [rdi], 0
add byte [rax], al
ret
```

As I said before, depending on where the instruction pointer points to in a group of bytes that are instructions, the bytes may represent different instructions, for example:

```
global _start
section .text
_start:
    nop
    lea rax, [_start]
    xor rax, 0x777
    dec rax
    rdrand ecx
    and ecx, 3
    lea rbx, [$+0x21 + rcx]
    xor rbx, 0x777
    inc rbx
    lea rbx, [rbx - 2]
    inc rbx
    xor rbx, 0x777
    jmp rbx                     ; lands on one of the nops or on the jmp below
    nop
    nop
    nop
    jmp $+4                     ; lands in the middle of the next instruction
    and r8b, byte [rsp]
    sub al, 0x24
    inc rax
    xor rax, 0x777
    push rax
    inc qword [rsp]
    ret
```

The code above would look like this in a disassembler:

![[binja_screenshot_12.png]]

As you can see in the code above, the instruction pointer is set in between instructions via ``jmp rbx`` and ``jmp 0x401040``. The ``jmp rbx`` instruction will jump to an address from 0x401039 to 0x40103c. The bytes in the range of those addresses, including the start and end address, all correspond to valid instructions, with all but one being a nop, while 0x40103c is the start of a jmp. The ``jmp 0x401040`` jumps two bytes past the start of the ``and r8b, byte [rsp]`` instruction, which means that ``and r8b, byte [rsp]`` won't be executed, but whatever starts two bytes into it will be executed, assuming it is a valid instruction.
The bytes of ``and r8b, byte [rsp]`` are equal to:

```
byte 0: 0x44
byte 1: 0x22
byte 2: 0x04
byte 3: 0x24
```

So the instruction pointer would execute whatever starts two bytes after byte 0, which is:

```
byte 0: 0x04
```

That byte by itself is not a valid instruction, however the following bytes together are a valid instruction:

```
byte 0: 0x04
byte 1: 0x24
```

Those bytes are equal to: ``add al, 0x24``. After that instruction is executed, ``sub al, 0x24`` is executed, which restores al to what it originally was.

## Jump/call/ret tables

A jump/call/ret table is an array of addresses, each holding the start of a basic block, that is indexed at runtime to pick the target of an indirect call, jump or ret. For example:

```
global _start
section .text
func01:
    mov edi, 7
    add rsp, 8                  ; drop the return address pushed by call
    jmp _start
func02:
    mov edi, 77
    add rsp, 8
    jmp _start
func03:
    mov edi, 777
    add rsp, 8
    jmp _start
func04:
    mov edi, 7777
    add rsp, 8
    jmp _start
_start:
    rdrand eax
    and eax, 3                  ; pick one of the four entries
    call [table + rax * 8]
section .rdata
table dq func01, func02, func03, func04
```

Or:

```
global _start
section .text
func01:
    mov edi, 7
    jmp _start
func02:
    mov edi, 77
    jmp _start
func03:
    mov edi, 777
    jmp _start
func04:
    mov edi, 7777
    jmp _start
_start:
    rdrand eax
    and eax, 3
    push qword [table + rax * 8]
    ret
section .rdata
table dq func01, func02, func03, func04
```

Or:

```
global _start
section .text
func01:
    mov edi, 7
    jmp _start
func02:
    mov edi, 77
    jmp _start
func03:
    mov edi, 777
    jmp _start
func04:
    mov edi, 7777
    jmp _start
_start:
    rdrand eax
    and eax, 3
    jmp [table + rax * 8]
section .rdata
table dq func01, func02, func03, func04
```

Since the address to call/jump/ret to is selected at runtime, the disassembler cannot determine which basic block will execute next. Usually a disassembler will list every basic block from the call/ret/jump table that could execute, for example:

![[binja_screenshot_02.png]]

## Computed call/ret/jumps

A computed call/ret/jump is when the program creates an address to call, return, or jump to indirectly via math operations at
runtime. Since the address is derived at runtime, the disassembler may have a hard time figuring out where the address leads to, for example:

```
global _start
section .text
_start:
    nop
    nop
    nop
    nop
    lea rax, [label + 70]
    mov cl, 10
loopy_01:
    xor ax, 7
    inc ax
    dec ax
    xor ax, 7                   ; the four instructions above cancel out
    dec rax
    dec rax
    sub rax, 5                  ; net effect per pass: rax -= 7
    dec cl
    test cl, cl
    jnz loopy_01                ; 10 passes * 7 = 70, so rax ends at label
    rdrand ebx
    and ebx, 3
    add rax, rbx                ; land on one of the four nops at label
    jmp rax
label:
    nop
    nop
    nop
    nop
    lea rax, [_start + 70]
    mov cl, 10
loopy_02:
    xor ax, 7
    inc ax
    dec ax
    xor ax, 7
    dec rax
    dec rax
    sub rax, 5
    dec cl
    test cl, cl
    jnz loopy_02                ; rax ends at _start
    rdrand ebx
    and ebx, 3
    add rax, rbx
    jmp rax
```

Like I said before, since the address to call/jump/ret to is selected at runtime, the disassembler cannot determine what basic block will execute next, for example:

![[binja_screenshot_03.png]]

## Loop based flow

Loop based flow is when there is a loop that does something to a register or a pointer and that same register or pointer is used in changing the instruction pointer, for example:

```
global _start
section .text
_start:
    mov ebx, 7
    push $+10                   ; address of the first nop
    add [rsp], rbx              ; skip the 7 nops
    ret
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    jmp _start
```

The above code would look like this in a disassembler:

![[binja_screenshot_13.png]]

Say we have:

```
global _start
section .text
_start:
    xor ebx, ebx
    inc bl
    inc bl
    inc bl
    inc bl
    inc bl
    inc bl
    inc bl
    push $+10
    add [rsp], rbx
    ret
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    jmp _start
```

The above code would look like this in a disassembler:

![[binja_screenshot_14.png]]

In the two examples above, the disassembler can easily track what the instruction pointer is going to get set to, however if we have the following code it is harder for it to follow:

```
global _start
section .text
_start:
    xor ebx, ebx
loop:
    inc bl
    cmp bl, 7
    jne loop
    push $+10
    add [rsp], rbx
    ret
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    jmp _start
```

The above code would look like this in a disassembler:

![[binja_screenshot_15.png]]

In the example above the disassembler cannot figure out what the instruction pointer will be set to because disassemblers don't run the
actual instructions.

# Process modifying techniques

Sometimes processes modify their own bytes or another process's bytes for a number of reasons, but in this case they would be modifying bytes for obfuscation purposes. Processes can do this in a number of ways, for example:

## Loaders

A way to think about a loader is that it takes control of a process and does whatever it wants with it. For example:

```
PTRACE_ATTACH equ 16
PTRACE_DETACH equ 17
PTRACE_POKEDATA equ 5
global _start
section .text
ptrace:
    mov esi, [rel pid]
    xor eax, eax
    mov al, 101                 ; ptrace
    syscall
    ret
wait4:
    xor r10d, r10d
    xor edx, edx
    xor esi, esi
    mov edi, [rel pid]
    xor eax, eax
    mov al, 61                  ; wait4
    syscall
    ret
_start:
    xor eax, eax
    mov al, 57                  ; fork
    syscall
    test eax, eax
    jnz skippy
fork_loop:
    jmp fork_loop               ; the child spins here until it is overwritten
skippy:
    mov [rel pid], eax
    xor r10d, r10d
    xor edx, edx
    mov edi, PTRACE_ATTACH
    call ptrace
    call wait4
    lea r15, [rel realcode+40]
    lea r9, [rel realcode]
    lea rdx, [rel fork_loop]
loop:
    mov r10, [r9]               ; next 8 bytes of realcode
    mov edi, PTRACE_POKEDATA
    call ptrace
    add rdx, 8
    add r9, 8
    cmp r9, r15
    jne loop
    xor r10d, r10d
    xor edx, edx
    mov edi, PTRACE_DETACH
    call ptrace
    mov edi, 7
    xor eax, eax
    mov al, 60                  ; exit
    syscall
section .bss
pid resd 1
section .data
realcode db 0x31, 0xD2, 0xB2, 0x04, 0x68, 0x48, 0x69, 0x21, 0x0A, 0x48, 0x89, 0xE6, 0x31, 0xFF, 0xFF, 0xC7, 0x31, 0xC0, 0xFE, 0xC0, 0x0F, 0x05, 0x48, 0x83, 0xC4, 0x08, 0xBF, 0x07, 0x00, 0x00, 0x00, 0xB8, 0x3C, 0x00, 0x00, 0x00, 0x0F, 0x05, 0x90, 0x90
```

The above technique spawns a child process and keeps its instruction pointer in one place via a loop. The parent attaches to the child process with ptrace, copies the contents of realcode over the loop in the child process via PTRACE_POKEDATA, then detaches and exits. The child process then runs the new code.
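PTRACE_POKEDATA writes one machine word per call, which is why the copy loop above advances both pointers by 8 and why realcode is padded with two nops to reach 40 bytes. A minimal Python sketch of that chunking follows; the helper name `poke_words` and the base address `0x401000` are hypothetical and only stand in for the address of the loop in the child.

```python
import struct

# The 40 payload bytes from the listing above.
REALCODE = bytes([
    0x31, 0xD2, 0xB2, 0x04, 0x68, 0x48, 0x69, 0x21,
    0x0A, 0x48, 0x89, 0xE6, 0x31, 0xFF, 0xFF, 0xC7,
    0x31, 0xC0, 0xFE, 0xC0, 0x0F, 0x05, 0x48, 0x83,
    0xC4, 0x08, 0xBF, 0x07, 0x00, 0x00, 0x00, 0xB8,
    0x3C, 0x00, 0x00, 0x00, 0x0F, 0x05, 0x90, 0x90,
])


def poke_words(payload, base):
    """Return the (address, 64-bit word) pairs a word-by-word write uses."""
    if len(payload) % 8 != 0:
        raise ValueError("payload must be padded to a multiple of 8 bytes")
    # "<" means little-endian, "Q" is one unsigned 64-bit word.
    words = struct.unpack(f"<{len(payload) // 8}Q", payload)
    return [(base + 8 * i, word) for i, word in enumerate(words)]


if __name__ == "__main__":
    # 0x401000 is a made-up address used only for this illustration.
    for address, word in poke_words(REALCODE, 0x401000):
        print(f"{address:#x}: {word:#018x}")
```

Each pair corresponds to one pass of the copy loop: the address is what the loader keeps in rdx and the word is what it loads into r10.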
Another example of a loader could be as follows:

```
PTRACE_ATTACH equ 16
PTRACE_DETACH equ 17
PTRACE_POKEDATA equ 5
PTRACE_GETREGS equ 12
rip_offset equ 128
global _start
section .text
ptrace:
    mov esi, [rel pid]
    xor eax, eax
    mov al, 101                 ; ptrace
    syscall
    ret
wait4:
    xor r10d, r10d
    xor edx, edx
    xor esi, esi
    mov edi, [rel pid]
    xor eax, eax
    mov al, 61                  ; wait4
    syscall
    ret
_start:
    mov rax, [rsp + 16]         ; argv[1], the target PID as a string
    test rax, rax
    jz exit
    xor ebx, ebx
    xor ecx, ecx
atoi_loop:
    movzx ecx, byte [rax]
    sub ecx, 48                 ; ASCII digit to number
    add rbx, rcx
    inc rax
    cmp byte [rax], 0
    jz begin
    imul rbx, rbx, 10
    jmp atoi_loop
begin:
    mov [rel pid], ebx          ; pid is only 4 bytes wide
    xor r10d, r10d
    xor edx, edx
    mov edi, PTRACE_ATTACH
    call ptrace
    call wait4
    lea r10, [rel user_regs_struct]
    xor edx, edx
    mov edi, PTRACE_GETREGS
    call ptrace
    lea r15, [rel realcode + 40]
    lea r9, [rel realcode]
    mov rdx, [rel user_regs_struct + rip_offset] ; the target's rip
loop:
    mov r10, [r9]
    mov edi, PTRACE_POKEDATA
    call ptrace
    add rdx, 8
    add r9, 8
    cmp r9, r15
    jne loop
    xor r10d, r10d
    xor edx, edx
    mov edi, PTRACE_DETACH
    call ptrace
exit:
    mov edi, 7
    xor eax, eax
    mov al, 60                  ; exit
    syscall
section .bss
pid resd 1
user_regs_struct resb 216
section .data
realcode db 0x31, 0xD2, 0xB2, 0x04, 0x68, 0x48, 0x69, 0x21, 0x0A, 0x48, 0x89, 0xE6, 0x31, 0xFF, 0xFF, 0xC7, 0x31, 0xC0, 0xFE, 0xC0, 0x0F, 0x05, 0x48, 0x83, 0xC4, 0x08, 0xBF, 0x07, 0x00, 0x00, 0x00, 0xB8, 0x3C, 0x00, 0x00, 0x00, 0x0F, 0x05, 0x90, 0x90
```

The above code needs root privileges and a target PID to run successfully. Once run, the process with the PID we entered is frozen, the contents of its registers are read so we know its instruction pointer, and the new code is written starting at the current instruction pointer in the target process. Then the main process detaches and exits.

Loaders work as an anti RE technique because they force the person analyzing them to deobfuscate the bytes and figure out how they are being used.

## Packers

A way to think about a packer is that it is basically a loader that operates on itself.
For example:

```
global _start
section .text
_start:
    lea rax, [rel obfuscatedcode]
    lea r15, [rax + 40]
loop:
    mov bl, [rax]
    sub bl, 3                   ; undo the obfuscation: subtract 3, then xor 7
    xor bl, 7
    mov [rax], bl
    inc rax
    cmp rax, r15
    jne loop
    jmp obfuscatedcode          ; tail jump
section .data alloc exec write
obfuscatedcode db 0x39, 0xd8, 0xb8, 0x06, 0x72, 0x52, 0x71, 0x29, 0x10, 0x52, 0x91, 0xe4, 0x39, 0xfb, 0xfb, 0xc3, 0x39, 0xca, 0xfc, 0xca, 0x0b, 0x05, 0x52, 0x87, 0xc6, 0x12, 0xbb, 0x03, 0x0a, 0x0a, 0x0a, 0xc2, 0x3e, 0x0a, 0x0a, 0x0a, 0x0b, 0x05, 0x9a, 0x9a
```

The above code simply deobfuscates the bytes of obfuscatedcode in place and jumps to obfuscatedcode. This would not confuse the disassembler, but it would confuse the person who is looking at the disassembler because they would have to figure out what the final bytes of obfuscatedcode look like. This technique relies on an RWX group of bytes, which here is the .data section marked as executable and writable.

Another example is as follows, with no default RWX segments:

```
PROT equ 7                      ; PROT_READ | PROT_WRITE | PROT_EXEC
MAP equ 34                      ; MAP_PRIVATE | MAP_ANONYMOUS
global _start
section .text
_start:
    xor r9, r9
    xor r8d, r8d
    dec r8                      ; fd = -1
    mov r10d, MAP
    mov edx, PROT
    mov esi, 4096
    xor edi, edi
    xor eax, eax
    mov al, 9                   ; mmap
    syscall
    lea r15, [rax + 40]
    lea rbx, [rel obfuscatedcode]
loop:
    mov cl, [rbx]
    sub cl, 3
    xor cl, 7
    mov [rax], cl               ; write the deobfuscated byte to the new segment
    inc rax
    inc rbx
    cmp rax, r15
    jne loop
    sub rax, 40                 ; back to the start of the new segment
    jmp rax                     ; tail jump
section .data
obfuscatedcode db 0x39, 0xd8, 0xb8, 0x06, 0x72, 0x52, 0x71, 0x29, 0x10, 0x52, 0x91, 0xe4, 0x39, 0xfb, 0xfb, 0xc3, 0x39, 0xca, 0xfc, 0xca, 0x0b, 0x05, 0x52, 0x87, 0xc6, 0x12, 0xbb, 0x03, 0x0a, 0x0a, 0x0a, 0xc2, 0x3e, 0x0a, 0x0a, 0x0a, 0x0b, 0x05, 0x9a, 0x9a
```

The above code uses mmap to create a new RWX segment, copies the deobfuscated code to the new segment, and then does a tail jump to the start of the new segment.

Packers are formidable for the same reason loaders are formidable: they cost the person analyzing them time.

# Anti debugging techniques (TODO)