✝️ 1 John 2:15-17
Do not love the world or anything in the world. If anyone loves the world, love for the Father is not in them. For everything in the world—the lust of the flesh, the lust of the eyes, and the pride of life—comes not from the Father but from the world. The world and its desires pass away, but whoever does the will of God lives forever.
# This guide assumes you know x86_64 assembly
If you do not know x86_64 assembly, there are many guides online, or you could read chapter -2 and chapter 2 of my guide on exploitation.
## Introduction to some terms
A disassembler is a program that takes a program's raw machine code and reconstructs the instructions it will execute. A disassembler shows the flow of the program in assembly. For example, a disassembler might show assembly code that jumps based on a condition like this:
![[binja_screenshot.webp]]
Static analysis is the process of analyzing a program without executing it and dynamic analysis is analyzing the program when it's running. With a disassembler you mainly just look at the assembly of a program without actually running it, therefore you statically analyze it most of the time.
Sometimes the assembly code of a program has anti reverse engineering techniques applied to it that make it harder to understand. Anti reverse engineering, also called anti RE, is a group of techniques designed to make the analysis of a program harder.
Obfuscation is a technique used to transform data without losing any of its meaning, and deobfuscation is restoring the obfuscated data to its original state.
A packer is a program that takes a program and creates a new program that contains an obfuscated version of that program, code to deobfuscate it, and code to transfer execution to it. The program that is created is called a packed program. The stub is the part of the packed program that deobfuscates the obfuscated program. The instruction in the packed program's code that transfers execution to the deobfuscated program is called the tail jump. The start of the deobfuscated program is called the original entry point, or OEP for short. Packers can obfuscate the code, data, imports, etc. of the program they contain, which makes statically analyzing a packed program very hard.
A control flow graph, or CFG for short, represents all the possible execution paths in a program. Most anti RE techniques aim to make the CFG messy. A basic block is a run of code that has a single start and a single end. Basic blocks typically end in a call, a ret, a syscall, an interrupt, or a jump.
A dispatcher is a basic block that determines what basic block to execute next.
A state variable is used by the dispatcher to decide which basic block executes next.
A virtual machine, or VM for short, is a custom execution environment with its own state. A VM has its own instructions that can modify its state. For example, say we have a VM that has 3 registers and one instruction that adds the values of the first two registers to the third one. Bytecode is a custom instruction format used by the VM to represent logic that is not directly executed by the CPU. The program's interpreter understands each bytecode instruction and runs it. An opcode is the part of every VM instruction that specifies which VM instruction it is. A handler is a basic block of assembly that implements the logic of one VM instruction. VMs have an interpreter loop which fetches the custom bytecode instructions and transfers execution to the correct handler.
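To make the VM terms concrete, here is a minimal sketch in Python of the 3 register VM described above. The opcode values and the extra `OP_HALT` opcode are my own assumptions for illustration. A real protector would implement the handlers as basic blocks of assembly, not Python functions.

```python
# Toy VM: three registers and a single ADD instruction.
# Opcode numbers are arbitrary choices for this sketch.
OP_HALT = 0x00  # hypothetical extra opcode so the interpreter loop can stop
OP_ADD = 0x01


def handle_add(regs):
    """Handler for ADD: add the first two registers to the third one."""
    regs[2] = (regs[2] + regs[0] + regs[1]) & 0xFFFFFFFFFFFFFFFF


HANDLERS = {OP_ADD: handle_add}


def run(bytecode, regs):
    """Interpreter loop: fetch an opcode, dispatch to its handler, repeat."""
    pc = 0  # the VM's own instruction pointer, part of its state
    while pc < len(bytecode):
        opcode = bytecode[pc]
        pc += 1
        if opcode == OP_HALT:
            break
        HANDLERS[opcode](regs)
    return regs


final_regs = run(bytes([OP_ADD, OP_ADD, OP_HALT]), [7, 8, 0])
print(final_regs)  # [7, 8, 30]
```

The bytecode here is just three bytes. Someone analyzing the program has to understand the interpreter loop and the handler before those bytes mean anything.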
A layer is one anti RE technique applied to a program.
A loader is a piece of code that treats bytes from disk (now loaded into memory) or memory as executable, maybe modifies them, and jumps to them.
A protector is a system that applies multiple layers to a program.
A stage is a phase of execution where a specific layer is active.
A trampoline is used to redirect the flow of execution.
There are a lot of anti RE techniques as follows.
# Anti Disassembly techniques
There are two main ways disassemblers turn raw machine code into instructions. The first way is to start at a certain offset in the machine code and turn the bytes into instructions one after another; this is called linear disassembly. The other way is called flow disassembly: the disassembler turns bytes into an instruction, then decides which bytes to decode next based on how that instruction affects the instruction pointer.
Each technique has its weaknesses. Linear disassembly turns all the bytes into instructions even if the bytes won't ever be executed as instructions, and flow disassembly is vulnerable to techniques that make it difficult to find out where the instruction pointer is going to be.
Anti disassembly techniques make the disassembler's output messy, waste the time of the person looking at the disassembler, or make the displayed assembly plain inaccurate. The following are some popular techniques:
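The difference between the two approaches can be sketched in Python on a made up instruction set. Everything in this sketch (the opcodes, their lengths, the `CODE` bytes) is hypothetical and is not x86. Real flow disassemblers also follow both sides of conditional branches, which this sketch leaves out.

```python
# Toy instruction set (hypothetical, not x86): opcode -> (mnemonic, length).
ISA = {
    0x01: ("nop", 1),
    0x02: ("jmp", 2),    # second byte: offset relative to the next instruction
    0x03: ("ret", 1),
    0x04: ("loadi", 3),  # followed by a 16-bit immediate
}


def decode(code, off):
    """Decode one instruction at off; return (text, length, jump_target)."""
    name, length = ISA.get(code[off], ("db 0x%02x" % code[off], 1))
    if off + length > len(code):  # truncated instruction: show a raw byte
        return "db 0x%02x" % code[off], 1, None
    if name == "jmp":
        target = off + length + code[off + 1]
        return "jmp %d" % target, length, target
    if name == "loadi":
        imm = code[off + 1] | (code[off + 2] << 8)
        return "loadi 0x%04x" % imm, length, None
    return name, length, None


def linear_disassemble(code):
    """Decode every byte in order, whether or not it is ever executed."""
    out, off = [], 0
    while off < len(code):
        text, length, _ = decode(code, off)
        out.append((off, text))
        off += length
    return out


def flow_disassemble(code, entry=0):
    """Follow control flow: after a jmp continue at its target, stop at ret."""
    out, off, seen = [], entry, set()
    while off < len(code) and off not in seen:
        seen.add(off)
        text, length, target = decode(code, off)
        out.append((off, text))
        if text == "ret":
            break
        off = target if target is not None else off + length
    return out


# A jmp over one junk byte (0x04), followed by nop, nop, ret.
CODE = bytes([0x02, 0x01, 0x04, 0x01, 0x01, 0x03])
print(linear_disassemble(CODE))  # [(0, 'jmp 3'), (2, 'loadi 0x0101'), (5, 'ret')]
print(flow_disassemble(CODE))    # [(0, 'jmp 3'), (3, 'nop'), (4, 'nop'), (5, 'ret')]
```

The linear pass decodes the junk byte as a `loadi` and swallows the two real nops, while the flow pass follows the jump and never looks at the junk byte.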
## Indirect calls and jumps
An indirect call/jump is a call or jump whose target comes from a register or is read through a pointer. The ``ret`` instruction also performs an indirect jump, since it pops whatever is on top of the stack and jumps to it. For example:
```
global _start
section .text
add_two_numbers:
mov rax, rdi
add rax, rsi
ret
_start:
add rsp, 8
mov edi, 7
mov esi, 8
mov rax, add_two_numbers
call rax
mov rax, _start
push rax
jmp [rsp]
```
You can also do the following:
```
global _start
section .text
add_two_numbers:
mov rax, rdi
add rax, rsi
ret
_start:
mov edi, 7
mov esi, 8
mov qword [rel some_global], add_two_numbers
call [rel some_global]
mov qword [rel some_global], _start
jmp [rel some_global]
section .data
some_global dq 0
```
With an indirect call/jump the disassembler may not be able to tell where the next basic block is going to be, which may create an incorrect CFG, for example:
![[binja_screenshot_04.png]]
## Jump instructions with the same target
A common anti RE technique is where we have conditional jumps that jump to the same place, for example:
```
global _start
section .text
_start:
mov al, 7
cmp al, 77
jz label
jnz label
inc al
jmp _start
label:
jmp _start
```
The above technique makes it trickier for someone looking at a disassembler to determine what is happening, because the disassembler shows multiple jumps that all go to the same place, cluttering the CFG, for example:
![[binja_screenshot_05.png]]
## Breaking into jumps
A common anti RE technique is to break up a single basic block into multiple basic blocks that jump to the next basic block, for example:
```
global _start
section .text
_start:
xor al, al
xor bl, bl
xor cl, cl
xor dl, dl
label_01:
inc al
jmp label_02
label_04:
inc dl
jmp label_05
label_03:
inc cl
jmp label_04
label_02:
inc bl
jmp label_03
label_05:
jmp _start
```
Separating a single basic block into multiple basic blocks creates a mess on the screen that makes it harder to determine what is happening. We can make it even messier by combining it with the previous technique:
```
global _start
section .text
_start:
xor al, al
xor bl, bl
xor cl, cl
xor dl, dl
label_01:
inc al
jz label_02
jnz label_02
label_04:
inc dl
jz label_05
jnz label_05
label_03:
inc cl
jz label_04
jnz label_04
label_02:
inc bl
jz label_03
jnz label_03
label_05:
jz _start
jnz _start
```
There would be more jumps appearing on the screen and it would be harder to figure out what is happening, for example:
![[binja_screenshot_06.png]]
## Useless code
A common anti RE technique is to have code that is useless, for example:
```
global _start
section .text
_start:
mov al, 7
inc al
dec al
inc al
dec al
add al, 2
sub al, 2
add al, 2
sub al, 3
add al, 4
dec al
dec al
dec al
jmp _start
```
In the code above al gets set to 7 at \_start and is 7 again when \_start gets jumped to, so the code in between has no net effect. The above technique makes the person looking at the disassembler take more time to figure out what is happening, because they have to trace what happens to al.
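The claim that al ends up unchanged can be checked with a short Python sketch. The list of deltas is transcribed from the inc/dec/add/sub instructions in the listing above, and the function name is my own.

```python
def run_junk_sequence(al):
    """Apply the inc/dec/add/sub instructions from the listing to an 8-bit value."""
    deltas = [+1, -1, +1, -1, +2, -2, +2, -3, +4, -1, -1, -1]
    for delta in deltas:
        al = (al + delta) & 0xFF  # al is 8 bits wide, so wrap like the CPU does
    return al


result = run_junk_sequence(7)
print(result)  # 7
```

Since the deltas sum to zero, the sequence leaves any starting value of al unchanged, not just 7.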
## Clones
A common anti RE technique is to have clones of the same basic block. For example:
```
global _start
section .text
_start:
mov al, 7
cmp al, 7
jz basic_block
inc al
cmp al, 8
jz basic_block
cmp al, 17
jz basic_block
jmp _start
basic_block:
mov al, 77
jmp _start
```
This technique would make the person looking at the disassembler take more time with their task because they have to look at both basic blocks even though they do the same thing, for example:
![[binja_screenshot_07.png]]
## Opaque predicates
An opaque predicate is a check that either always results in a jump to the same target or never jumps at all. To a person looking at a disassembler it appears that multiple paths are possible, but in reality only one of them is ever taken.
An example is as follows:
```
global _start
section .text
_start:
xor eax, eax
jz will_always_branch
inc al
will_always_branch:
jmp _start
```
Or:
```
global _start
section .text
_start:
mov al, 7
cmp al, 77
jz label
jmp _start
label:
xor ax, 777
jmp _start
```
Opaque predicates make the program's intent harder to determine because the disassembler shows both possible paths even though execution only ever takes one of them, for example:
![[binja_screenshot_08.png]]
Another example of an opaque predicate could be:
```
global _start
section .text
_start:
rdrand eax
lea rbx, [rax + 1]
mul rbx
test rax, 1
jz _start
mov edi, eax
mov eax, 60
syscall
```
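Say we have a number x. No matter what x is, x * (x + 1) is always even, because one of two consecutive numbers is even. That means the ``jz`` is always taken and the basic block after it never runs. The above technique makes the CFG messy by adding that useless basic block. The code above would look like this in a disassembler: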
![[binja_screenshot_11.png]]
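The parity claim behind this predicate can be spot checked with a small Python model of the ``lea`` / ``mul`` / ``test`` / ``jz`` sequence. The function name and the sample values are my own choices. It models only the low 64 bits that ``mul`` leaves in rax.

```python
import random

MASK64 = (1 << 64) - 1


def predicate_is_taken(x):
    """Model of: lea rbx, [rax + 1] / mul rbx / test rax, 1 / jz."""
    low = (x * (x + 1)) & MASK64  # mul leaves the low 64 bits of the product in rax
    return (low & 1) == 0         # jz is taken when the tested bit is clear


# rdrand eax produces a 32-bit value, so sample that range plus some edge cases.
samples = [0, 1, 2, 0xFFFFFFFF] + [random.getrandbits(32) for _ in range(1000)]
always_taken = all(predicate_is_taken(x) for x in samples)
print(always_taken)  # True
```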
## Call/Ret abuse
The call and ret instructions can be used to confuse disassemblers because disassemblers don't always trace the stack correctly, for example:
```
global _start
section .text
_start:
add rsp, 16
push _start
add qword [rsp], 7
inc qword [rsp]
sub qword [rsp], 7
dec qword [rsp]
call [rsp]
```
Or:
```
global _start
section .text
_start:
push _start
add qword [rsp], 1337
inc qword [rsp]
sub qword [rsp], 1338
xchg rax, [rsp]
mov [rsp], rax
ret
```
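The stack can be made even harder to trace by using ret with a number after it. ``ret n`` pops the return address and then adds n to rsp, skipping n extra bytes of the stack: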
```
global _start
section .text
add_two_numbers:
mov rax, rdi
add rax, rsi
ret 8
add_three_numbers:
mov rax, rdi
add rax, rsi
add rax, rdx
ret
add_four_numbers:
mov rax, rdi
add rax, rsi
add rax, rdx
add rax, rcx
ret 16
_start:
mov edi, 7
mov esi, 7
mov edx, 7
mov ecx, 7
push _start
push 10
push add_three_numbers
push 7
push 9
push add_two_numbers
push _start
push add_four_numbers
ret 8
```
The above code would look like this in a disassembler:
![[binja_screenshot_09.png]]
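To see where execution actually goes in the ``ret 8`` / ``ret 16`` listing, here is a rough Python model of the stack. Strings stand in for addresses, and `_start_tail` is a hypothetical name I use for the final ``ret 8`` at the end of \_start.

```python
# Values pushed by _start, in program order (strings stand in for addresses).
PUSHES = ["_start", 10, "add_three_numbers", 7, 9,
          "add_two_numbers", "_start", "add_four_numbers"]

# Extra bytes each ret discards after popping the return address.
RET_IMM = {"_start_tail": 8, "add_four_numbers": 16,
           "add_two_numbers": 8, "add_three_numbers": 0}


def trace_returns(pushes, ret_imm, first="_start_tail"):
    """Follow the chain of rets and return the order of jump targets."""
    stack = list(pushes)  # the end of the list is the top of the stack
    order, current = [], first
    while current in ret_imm:
        target = stack.pop()  # ret pops the return address
        for _ in range(ret_imm[current] // 8):
            stack.pop()       # ret n then skips n more bytes (8 per slot)
        order.append(target)
        current = target
    return order


order = trace_returns(PUSHES, RET_IMM)
print(order)
# ['add_four_numbers', 'add_two_numbers', 'add_three_numbers', '_start']
```

The values 7, 9, and 10 and the second \_start are never used as addresses. They are just padding that the ret immediates skip over.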
## Control flow flattening
Control flow flattening is an anti RE technique that replaces the natural execution flow of the program with a dispatcher that executes a certain basic block based on a state variable. The executed basic block eventually leads back to the dispatcher, unless exiting, for example:
```
global _start
section .text
_start:
xor al, al
dispatcher:
test al, al
jz do_thing_01
cmp al, 1
jz do_thing_02
cmp al, 3
jz do_thing_03
jmp dispatcher
do_thing_03:
mov dil, al
mov eax, 60
syscall
do_thing_01:
inc al
jmp dispatcher
do_thing_02:
add al, 2
jmp dispatcher
```
This makes the intent of the program harder to determine because to a disassembler this looks like a single loop with many conditional jumps. The code above would look like this in a disassembler:
![[binja_screenshot_10.png]]
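The same dispatcher can be modeled in Python to make the order of execution visible. This is only a sketch of the assembly above: `state` plays the role of al, and the `trace` list is something I added to record which block ran.

```python
def flattened():
    """Python model of the dispatcher loop; state plays the role of al."""
    state = 0
    trace = []
    while True:                  # dispatcher
        if state == 0:           # do_thing_01
            trace.append("do_thing_01")
            state += 1
        elif state == 1:         # do_thing_02
            trace.append("do_thing_02")
            state += 2
        elif state == 3:         # do_thing_03: exit with state as the code
            trace.append("do_thing_03")
            return state, trace


exit_code, trace = flattened()
print(exit_code, trace)  # 3 ['do_thing_01', 'do_thing_02', 'do_thing_03']
```

The real order of the blocks only shows up once you follow how each block updates the state variable. Nothing in the layout of the code gives it away.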
## Junk code/bytes
In the x86_64 architecture, instructions are just bytes of varying lengths and there are no explicit instruction boundaries, so static analysis isn't always going to be 100% accurate about the instructions it shows to the person looking at the disassembler. Most of the time it will be accurate, but junk bytes can be inserted between real instructions, leading the disassembler to create a false picture of what is happening. These junk bytes are jumped over and never executed, but the disassembler may still decode them as instructions, for example:
```
global _start
section .text
_start:
xor eax, eax
jmp $+4
db 0xff
db 0xeb
test eax, eax
jmp $+5
db 0xeb
db 0xfe
db 0x77
jz _start
db 0xeb
db 0xfe
db 0xfe
db 0xeb
db 0x99
```
A disassembler may display the wrong result, while in reality the code simply sets eax to 0, checks if it is 0, and since it is 0, sets the instruction pointer to \_start. The same bytes in memory may also represent different instructions depending on where in those bytes the instruction pointer points, for example:
```
byte 0: mov rax, 7
byte 7: ret
```
The bytes would look like:
```
48 c7 c0 07 00 00 00 c3
```
or
```
byte 0: 48
byte 1: c7
byte 2: c0
byte 3: 07
byte 4: 00
byte 5: 00
byte 6: 00
byte 7: c3
```
Say the instruction pointer started at byte 1 instead of byte 0, the resulting code would be:
```
mov eax, 7
ret
```
If we started at byte 2, the resulting code would be:
```
rol byte [rdi], 0
add byte [rax], al
ret
```
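The three decodings above can be reproduced with a toy decoder in Python. This is not a real x86 decoder: it is a sketch that only knows the five encodings that appear in this example, and anything else is printed as a raw byte.

```python
CODE = bytes([0x48, 0xC7, 0xC0, 0x07, 0x00, 0x00, 0x00, 0xC3])


def decode_one(buf, off):
    """Decode one instruction; knows only the five encodings in this example."""
    rest = buf[off:]
    if rest[:3] == b"\x48\xc7\xc0" and len(rest) >= 7:
        # The immediate is read as unsigned, which is fine for this sample.
        return "mov rax, %d" % int.from_bytes(rest[3:7], "little"), 7
    if rest[:2] == b"\xc7\xc0" and len(rest) >= 6:
        return "mov eax, %d" % int.from_bytes(rest[2:6], "little"), 6
    if rest[:2] == b"\xc0\x07" and len(rest) >= 3:
        return "rol byte [rdi], %d" % rest[2], 3
    if rest[:2] == b"\x00\x00":
        return "add byte [rax], al", 2
    if rest[:1] == b"\xc3":
        return "ret", 1
    return "db 0x%02x" % rest[0], 1


def disassemble_from(buf, start):
    """Decode instructions one after another, beginning at offset start."""
    out, off = [], start
    while off < len(buf):
        text, length = decode_one(buf, off)
        out.append(text)
        off += length
    return out


for start in (0, 1, 2):
    print(start, disassemble_from(CODE, start))
# 0 ['mov rax, 7', 'ret']
# 1 ['mov eax, 7', 'ret']
# 2 ['rol byte [rdi], 0', 'add byte [rax], al', 'ret']
```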
As I said before, depending on where the instruction pointer points in a group of bytes that are instructions, the bytes may represent different instructions, for example:
```
global _start
section .text
_start:
nop
lea rax, [_start]
xor rax, 0x777
dec rax
rdrand ecx
and ecx, 3
lea rbx, [$+0x21 + rcx]
xor rbx, 0x777
inc rbx
lea rbx, [rbx - 2]
inc rbx
xor rbx, 0x777
jmp rbx
nop
nop
nop
jmp $+4
and r8b, byte [rsp]
sub al, 0x24
inc rax
xor rax, 0x777
push rax
inc qword [rsp]
ret
```
The above code would look like this in a disassembler:
![[binja_screenshot_12.png]]
As you can see in the code above, the instruction pointer is set in between instructions via ``jmp rbx`` and ``jmp 0x401040``. The ``jmp rbx`` instruction will jump to an address from 0x401039 to 0x40103c inclusive. Every address in that range is the start of a valid instruction: the first three are nops and 0x40103c is the start of a jmp. The ``jmp 0x401040`` lands two bytes into the ``and r8b, byte [rsp]`` instruction, which means that ``and r8b, byte [rsp]`` won't get executed, but whatever starts two bytes into it will, assuming it is a valid instruction. The bytes of ``and r8b, byte [rsp]`` are:
```
byte 0: 0x44
byte 1: 0x22
byte 2: 0x04
byte 3: 0x24
```
So execution would continue two bytes after byte 0, which is:
```
byte 0: 0x04
```
That byte by itself is not a valid instruction, however together with the next byte it forms a valid instruction:
```
byte 0: 0x04
byte 1: 0x24
```
Those bytes are equal to ``add al, 0x24``. After that instruction is executed, ``sub al, 0x24`` is executed, which restores al to what it originally was.
## Jump/call/ret tables
A jump/call/ret table is an array of addresses, each holding the start of a basic block, and one of them is selected and executed indirectly at runtime. For example:
```
global _start
section .text
func01:
mov edi, 7
add rsp, 8
jmp _start
func02:
mov edi, 77
add rsp, 8
jmp _start
func03:
mov edi, 777
add rsp, 8
jmp _start
func04:
mov edi, 7777
add rsp, 8
jmp _start
_start:
rdrand eax
and eax, 3
call [table + rax * 8]
section .rdata
table dq func01, func02, func03, func04
```
Or:
```
global _start
section .text
func01:
mov edi, 7
jmp _start
func02:
mov edi, 77
jmp _start
func03:
mov edi, 777
jmp _start
func04:
mov edi, 7777
jmp _start
_start:
rdrand eax
and eax, 3
push qword [table + rax * 8]
ret
section .rdata
table dq func01, func02, func03, func04
```
Or:
```
global _start
section .text
func01:
mov edi, 7
jmp _start
func02:
mov edi, 77
jmp _start
func03:
mov edi, 777
jmp _start
func04:
mov edi, 7777
jmp _start
_start:
rdrand eax
and eax, 3
jmp [table + rax * 8]
section .rdata
table dq func01, func02, func03, func04
```
Since the address to call/jump/ret to is selected at runtime, the disassembler cannot determine which basic block will execute next. Usually a disassembler will list every basic block from the call/ret/jump table that could execute, for example:
![[binja_screenshot_02.png]]
## Computed call/ret/jumps
A computed call/ret/jump is when the program creates an address to call, return, or jump to indirectly via math operations at runtime. Since the address is derived at runtime, the disassembler may have a hard time figuring out where the address leads to, for example:
```
global _start
section .text
_start:
nop
nop
nop
nop
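lea rax, [label + 70] ; 10 passes below subtract 7 each, leaving rax = label
mov cl, 10
loopy_01:
xor ax, 7
inc ax
dec ax
xor ax, 7
dec rax
dec rax
sub rax, 5
dec cl ; count down so the loop ends after 10 passes
test cl, cl
jnz loopy_01
rdrand ebx
and ebx, 3
add rax, rbx
jmp rax
label:
nop
nop
nop
nop
lea rax, [_start + 70] ; 10 passes below subtract 7 each, leaving rax = _start
mov cl, 10
loopy_02:
xor ax, 7
inc ax
dec ax
xor ax, 7
dec rax
dec rax
sub rax, 5
dec cl ; count down so the loop ends after 10 passes
test cl, cl
jnz loopy_02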
rdrand ebx
and ebx, 3
add rax, rbx
jmp rax
```
Like I said before, since the address to call/jump/ret to is selected at runtime, the disassembler cannot determine what basic block will execute next, for example:
![[binja_screenshot_03.png]]
## Loop based flow
Loop based flow is when there is a loop that does something to a register or a pointer and that same register or pointer is used in changing the instruction pointer, for example:
```
global _start
section .text
_start:
mov ebx, 7
push $+10
add [rsp], rbx
ret
nop
nop
nop
nop
nop
nop
nop
jmp _start
```
The above code would look like this in a disassembler:
![[binja_screenshot_13.png]]
Say we have:
```
global _start
section .text
_start:
xor ebx, ebx
inc bl
inc bl
inc bl
inc bl
inc bl
inc bl
inc bl
push $+10
add [rsp], rbx
ret
nop
nop
nop
nop
nop
nop
nop
jmp _start
```
The above code would look like this in a disassembler:
![[binja_screenshot_14.png]]
In the two examples above, the disassembler can easily track what the instruction pointer is going to get set to, however if we have the following code it is harder for it to follow:
```
global _start
section .text
_start:
xor ebx, ebx
loop:
inc bl
cmp bl, 7
jne loop
push $+10
add [rsp], rbx
ret
nop
nop
nop
nop
nop
nop
nop
jmp _start
```
The above code would look like this in a disassembler:
![[binja_screenshot_15.png]]
In the example above the disassembler cannot figure out what the instruction pointer will be set to because disassemblers don't run the actual instructions.
# Process modifying techniques
Sometimes processes modify their own bytes or another process's bytes for a number of reasons, but in this case they would be modifying bytes for obfuscation purposes. Processes can do this in a number of ways, for example:
## Loaders
A way to think about a loader is that it takes control of a process and does whatever it wants with it. For example:
```
PTRACE_ATTACH equ 16
PTRACE_DETACH equ 17
PTRACE_POKEDATA equ 5
global _start
section .text
ptrace:
mov esi, [rel pid]
xor eax, eax
mov al, 101
syscall
ret
wait4:
xor r10d, r10d
xor edx, edx
xor esi, esi
mov edi, [rel pid]
xor eax, eax
mov al, 61
syscall
ret
_start:
xor eax, eax
mov al, 57
syscall
test eax, eax
jnz skippy
fork_loop:
jmp fork_loop
skippy:
mov [rel pid], eax
xor r10d, r10d
xor edx, edx
mov edi, PTRACE_ATTACH
call ptrace
call wait4
lea r15, [rel realcode+40]
lea r9, [rel realcode]
lea rdx, [rel fork_loop]
loop:
mov r10, [r9]
mov edi, PTRACE_POKEDATA
call ptrace
add rdx, 8
add r9, 8
cmp r9, r15
jne loop
xor r10d, r10d
xor edx, edx
mov edi, PTRACE_DETACH
call ptrace
mov edi, 7
xor eax, eax
mov al, 60
syscall
section .bss
pid resd 1
section .data
realcode db 0x31, 0xD2, 0xB2, 0x04, 0x68, 0x48, 0x69, 0x21, 0x0A, 0x48, 0x89, 0xE6, 0x31, 0xFF, 0xFF, 0xC7, 0x31, 0xC0, 0xFE, 0xC0, 0x0F, 0x05, 0x48, 0x83, 0xC4, 0x08, 0xBF, 0x07, 0x00, 0x00, 0x00, 0xB8, 0x3C, 0x00, 0x00, 0x00, 0x0F, 0x05, 0x90, 0x90
```
The above technique spawns a child process and keeps its instruction pointer in one place with a loop. The parent attaches to the child process with ptrace and copies the contents of realcode over the loop in the child process via PTRACE_POKEDATA. Then the parent detaches and exits, and the child process runs the new code. Another example of a loader could be as follows:
```
PTRACE_ATTACH equ 16
PTRACE_DETACH equ 17
PTRACE_POKEDATA equ 5
PTRACE_GETREGS equ 12
rip_offset equ 128
global _start
section .text
ptrace:
mov esi, [rel pid]
xor eax, eax
mov al, 101
syscall
ret
wait4:
xor r10d, r10d
xor edx, edx
xor esi, esi
mov edi, [rel pid]
xor eax, eax
mov al, 61
syscall
ret
_start:
mov rax, [rsp + 16]
test rax, rax
jz exit
xor ebx, ebx
xor ecx, ecx
atoi_loop:
movzx ecx, byte [rax]
sub ecx, 48
add rbx, rcx
inc rax
cmp byte [rax], 0
jz begin
imul rbx, rbx, 10
jmp atoi_loop
begin:
mov [rel pid], ebx ; pid is a single dword (resd 1), so store only 32 bits
xor r10d, r10d
xor edx, edx
mov edi, PTRACE_ATTACH
call ptrace
call wait4
lea r10, [rel user_regs_struct]
xor edx, edx
mov edi, PTRACE_GETREGS
call ptrace
lea r15, [rel realcode + 40]
lea r9, [rel realcode]
mov rdx, [rel user_regs_struct + rip_offset]
loop:
mov r10, [r9]
mov edi, PTRACE_POKEDATA
call ptrace
add rdx, 8
add r9, 8
cmp r9, r15
jne loop
xor r10d, r10d
xor edx, edx
mov edi, PTRACE_DETACH
call ptrace
exit:
mov edi, 7
xor eax, eax
mov al, 60
syscall
section .bss
pid resd 1
user_regs_struct resb 216
section .data
realcode db 0x31, 0xD2, 0xB2, 0x04, 0x68, 0x48, 0x69, 0x21, 0x0A, 0x48, 0x89, 0xE6, 0x31, 0xFF, 0xFF, 0xC7, 0x31, 0xC0, 0xFE, 0xC0, 0x0F, 0x05, 0x48, 0x83, 0xC4, 0x08, 0xBF, 0x07, 0x00, 0x00, 0x00, 0xB8, 0x3C, 0x00, 0x00, 0x00, 0x0F, 0x05, 0x90, 0x90
```
The above code needs a target PID and usually root privileges to run successfully. Once run, the process with the PID we entered is frozen, we read the contents of its registers, and we write the new code starting at the current instruction pointer of the target process. Then the main process detaches and exits.
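The constants ``rip_offset equ 128`` and ``resb 216`` in the listing above come from the layout of ``struct user_regs_struct`` on x86_64 Linux. A short Python sketch can derive both numbers, assuming the field order from ``<sys/user.h>`` and 8 bytes per field.

```python
# Field order of struct user_regs_struct on x86_64 Linux (<sys/user.h>).
# Every field is 8 bytes wide.
FIELDS = ["r15", "r14", "r13", "r12", "rbp", "rbx", "r11", "r10",
          "r9", "r8", "rax", "rcx", "rdx", "rsi", "rdi", "orig_rax",
          "rip", "cs", "eflags", "rsp", "ss", "fs_base", "gs_base",
          "ds", "es", "fs", "gs"]

OFFSETS = {name: index * 8 for index, name in enumerate(FIELDS)}
STRUCT_SIZE = len(FIELDS) * 8

print(OFFSETS["rip"], STRUCT_SIZE)  # 128 216
```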
Loaders work as an anti RE technique because they force the person analyzing them to deobfuscate the bytes and figure out how they are being used.
## Packers
A way to think about a packer is that it is basically a loader that operates on itself. For example:
```
global _start
section .text
_start:
lea rax, [rel obfuscatedcode]
lea r15, [rax + 40]
loop:
mov bl, [rax]
sub bl, 3
xor bl, 7
mov [rax], bl
inc rax
cmp rax, r15
jne loop
jmp obfuscatedcode
section .data alloc exec write
obfuscatedcode db 0x39, 0xd8, 0xb8, 0x06, 0x72, 0x52, 0x71, 0x29, 0x10, 0x52, 0x91, 0xe4, 0x39, 0xfb, 0xfb, 0xc3, 0x39, 0xca, 0xfc, 0xca, 0x0b, 0x05, 0x52, 0x87, 0xc6, 0x12, 0xbb, 0x03, 0x0a, 0x0a, 0x0a, 0xc2, 0x3e, 0x0a, 0x0a, 0x0a, 0x0b, 0x05, 0x9a, 0x9a
```
The above code simply deobfuscates the bytes of obfuscatedcode and jumps to obfuscatedcode. This would not confuse the disassembler, but it would confuse the person who is looking at the disassembler, because they would have to figure out what the bytes of obfuscatedcode look like once deobfuscated. This technique relies on a section that is readable, writable, and executable (RWX). Another example is as follows, with no default RWX segments:
```
PROT equ 7
MAP equ 34
global _start
section .text
_start:
xor r9, r9
xor r8d, r8d
dec r8
mov r10d, MAP
mov edx, PROT
mov esi, 4096
xor edi, edi
xor eax, eax
mov al, 9
syscall
lea r15, [rax + 40]
lea rbx, [rel obfuscatedcode]
loop:
mov cl, [rbx]
sub cl, 3
xor cl, 7
mov [rax], cl
inc rax
inc rbx
cmp rax, r15
jne loop
sub rax, 40
jmp rax
section .data
obfuscatedcode db 0x39, 0xd8, 0xb8, 0x06, 0x72, 0x52, 0x71, 0x29, 0x10, 0x52, 0x91, 0xe4, 0x39, 0xfb, 0xfb, 0xc3, 0x39, 0xca, 0xfc, 0xca, 0x0b, 0x05, 0x52, 0x87, 0xc6, 0x12, 0xbb, 0x03, 0x0a, 0x0a, 0x0a, 0xc2, 0x3e, 0x0a, 0x0a, 0x0a, 0x0b, 0x05, 0x9a, 0x9a
```
The above code uses mmap to create a new RWX mapping, copies the deobfuscated code into it, and then does a tail jump to the start of the new mapping.
Packers are formidable for the same reason loaders are: analyzing them costs time.
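One way to save some of that time is to redo the stub's transform offline instead of running the packed program. The Python sketch below mirrors the ``sub`` 3 then ``xor`` 7 loop from the two stubs above and compares the result with the realcode bytes from the loader examples. The `pack` function is my assumption of what the packer did ahead of time, it is not shown in the listings.

```python
OBFUSCATED = bytes([
    0x39, 0xd8, 0xb8, 0x06, 0x72, 0x52, 0x71, 0x29, 0x10, 0x52,
    0x91, 0xe4, 0x39, 0xfb, 0xfb, 0xc3, 0x39, 0xca, 0xfc, 0xca,
    0x0b, 0x05, 0x52, 0x87, 0xc6, 0x12, 0xbb, 0x03, 0x0a, 0x0a,
    0x0a, 0xc2, 0x3e, 0x0a, 0x0a, 0x0a, 0x0b, 0x05, 0x9a, 0x9a,
])

REALCODE = bytes([
    0x31, 0xD2, 0xB2, 0x04, 0x68, 0x48, 0x69, 0x21, 0x0A, 0x48,
    0x89, 0xE6, 0x31, 0xFF, 0xFF, 0xC7, 0x31, 0xC0, 0xFE, 0xC0,
    0x0F, 0x05, 0x48, 0x83, 0xC4, 0x08, 0xBF, 0x07, 0x00, 0x00,
    0x00, 0xB8, 0x3C, 0x00, 0x00, 0x00, 0x0F, 0x05, 0x90, 0x90,
])


def unpack(data):
    """Mirror of the stub's loop: subtract 3, then xor with 7, on every byte."""
    return bytes((((b - 3) & 0xFF) ^ 7) for b in data)


def pack(data):
    """Inverse transform (assumed): xor with 7, then add 3, on every byte."""
    return bytes((((b ^ 7) + 3) & 0xFF) for b in data)


unpacked = unpack(OBFUSCATED)
print(unpacked.hex())
print(unpacked == REALCODE)  # True
```

The unpacked bytes can then be written to a file and opened in a disassembler like any other code.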
# Anti debugging techniques
(TODO)