Assembly Basics#
Mastery level: understand assembly, analyze gadgets, and step through programs in a debugger to follow register states
Reference Article: x86_64 Assembly Part 1: AT&T Assembly Syntax_x86_64 Assembly at&t-CSDN Blog
Registers#
The x86 CPU has eight commonly used general-purpose registers: EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP.
The CPU reads and writes registers first, and data moves between registers, caches, and memory, with each level buffering the slower one below it. Registers are accessed directly by name and are the fastest storage of all, which is why they are sometimes called the level-0 cache.
Access speed, from fastest to slowest: registers > L1 cache > L2 cache > L3 cache > main memory > hard disk
General Registers and Their Uses#
Each of these eight registers has a conventional purpose. Taking a 32-bit CPU as an example, their roles are summarized in the table below:
Register | Meaning | Purpose | Contains Registers |
---|---|---|---|
EAX | Accumulator Register | Commonly used for multiplication, division, and function return values | AX(AH, AL) |
EBX | Base Register | Often used as a pointer for memory data, or as a base to access memory | BX(BH, BL) |
ECX | Counter Register | Commonly used as a counter in string and loop operations | CX(CH, CL) |
EDX | Data Register | Commonly used in multiplication and division (holds the high half) and to hold I/O port addresses | DX(DH, DL) |
ESI | Source Index Register | Commonly used as a pointer for memory data and source strings | SI |
EDI | Destination Index Register | Commonly used as a pointer for memory data and destination strings | DI |
ESP | Stack Pointer Register | Always points to the top of the stack; by convention it is not used for general arithmetic or data storage | SP |
EBP | Base Pointer Register | Frame pointer: usually set from ESP at function entry and used as a fixed base for accessing parameters and local variables on the stack; by convention not used for general data | BP |
The last column lists the narrower registers contained in each one: rax, eax, ax, ah, and al are all names for parts of the same physical register, each covering a different bit range.
The bit ranges covered by each name in the 64-bit RAX register:
Name | Bits |
---|---|
RAX | 63..0 |
EAX | 31..0 |
AX | 15..0 |
AH | 15..8 |
AL | 7..0 |
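To make the overlap concrete, here is a small Python sketch (Python because pwn exploit scripts are usually written in it, e.g. with pwntools). The helper names `subregisters`, `write_eax`, and `write_ax` are hypothetical; the code only masks bits to show what each name reads, plus the x86-64 rule that writing a 32-bit register clears the upper 32 bits while writing a 16-bit register leaves them alone.

```python
MASK64 = (1 << 64) - 1

def subregisters(rax: int) -> dict:
    """Return the value each alias of the RAX register would read."""
    rax &= MASK64
    return {
        "rax": rax,                 # bits 63..0
        "eax": rax & 0xFFFFFFFF,    # bits 31..0
        "ax": rax & 0xFFFF,         # bits 15..0
        "ah": (rax >> 8) & 0xFF,    # bits 15..8
        "al": rax & 0xFF,           # bits 7..0
    }

def write_eax(rax: int, value: int) -> int:
    """Writing a 32-bit register zero-extends: the old upper half of rax is discarded."""
    return value & 0xFFFFFFFF

def write_ax(rax: int, value: int) -> int:
    """Writing a 16-bit register leaves bits 63..16 untouched."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)

regs = subregisters(0x1122334455667788)
print({name: hex(v) for name, v in regs.items()})
```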
Instruction Pointer Register#
The Instruction Pointer Register (RIP) holds the address of the next instruction to be executed.
Normally, after an instruction is fetched, RIP advances to the next instruction. x86_64 instructions have variable length (1 to 15 bytes), so RIP advances by the length of the current instruction, not by a fixed 8 bytes.
RIP does not always simply advance, however: control-transfer instructions such as jmp, call, and ret change it directly. The call instruction pushes the return address (the address of the instruction following the call, which is the current value of RIP) onto the stack and transfers control to the target function; the ret instruction pops that 8-byte return address off the stack back into RIP.
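A minimal sketch of how RIP advances, assuming the standard encodings of a typical prologue/epilogue (`55` for push rbp, `48 89 e5` for mov rbp, rsp, and so on) and an arbitrary start address `0x401126`. The hypothetical `trace_rip` helper only adds up instruction lengths; it does not decode anything.

```python
# Machine-code encodings of a typical x86-64 prologue/epilogue.
# The start address used below is arbitrary.
instructions = [
    ("push rbp",      bytes.fromhex("55")),
    ("mov rbp, rsp",  bytes.fromhex("4889e5")),
    ("sub rsp, 0x10", bytes.fromhex("4883ec10")),
    ("leave",         bytes.fromhex("c9")),
    ("ret",           bytes.fromhex("c3")),
]

def trace_rip(start: int, program):
    """Yield (rip, text) pairs; RIP advances by each instruction's own length."""
    rip = start
    for text, code in program:
        yield rip, text
        rip += len(code)  # variable-length encoding: 1 to 15 bytes per instruction

for rip, text in trace_rip(0x401126, instructions):
    print(f"{rip:#x}: {text}")
```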
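Flag Register (EFLAGS)#
EFLAGS holds status bits that arithmetic and comparison instructions set, such as ZF (zero), SF (sign), CF (carry), and OF (overflow). Conditional jumps such as je and jne decide whether to jump by testing these bits.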
Assembly Language Instructions#
Common assembly instructions: mov, je, jmp, call, add, sub, inc, dec, and, or
Data Transfer Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
MOV | Transfer | MOV dest, src | Move data from src to dest |
PUSH | Push | PUSH src | Push source operand src onto the stack |
POP | Pop | POP dest | Pop data from the top of the stack into dest |
Arithmetic Operation Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
ADD | Addition | ADD dest, src | Add src to dest |
SUB | Subtraction | SUB dest, src | Subtract src from dest |
INC | Increment | INC dest | Increment dest by 1 |
DEC | Decrement | DEC dest | Decrement dest by 1 |
Logical Operation Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
NOT | Bitwise NOT | NOT dest | Invert every bit of operand dest |
AND | And | AND dest, src | Perform AND operation on dest and src, store in dest |
OR | Or | OR dest, src | Perform OR operation on dest and src, store in dest |
XOR | Exclusive Or | XOR dest, src | Perform XOR operation on dest and src, store in dest |
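A quick Python model of the logical instructions on 32-bit operands, with hypothetical helper names. It also shows the `xor eax, eax` idiom for zeroing a register, which is common in shellcode because its encoding (`31 c0`, 2 bytes) is shorter than `mov eax, 0` (`b8 00 00 00 00`) and contains no zero bytes.

```python
MASK32 = 0xFFFFFFFF

def not32(dest: int) -> int:
    return ~dest & MASK32          # invert every bit, keep 32 bits

def and32(dest: int, src: int) -> int:
    return dest & src & MASK32

def or32(dest: int, src: int) -> int:
    return (dest | src) & MASK32

def xor32(dest: int, src: int) -> int:
    return (dest ^ src) & MASK32

eax = 0xDEADBEEF
eax = xor32(eax, eax)              # "xor eax, eax": any value XOR itself is 0
```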
Loop Control Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
LOOP | Counting Loop | LOOP label | Decrement ECX by 1, jump to label if ECX is not 0, otherwise execute the statement after LOOP |
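A sketch of LOOP semantics in Python, under the simplifying assumption that the loop body never touches ECX. The body runs ECX times; starting with ECX = 0 wraps ECX around to 0xFFFFFFFF, so the real loop would run 2^32 times. `max_iterations` is a hypothetical cap that only keeps the sketch from running that long.

```python
MASK32 = 0xFFFFFFFF

def run_loop(ecx: int, max_iterations: int = 1000) -> int:
    """Count how many times a body ending in `loop label` executes."""
    executed = 0
    while executed < max_iterations:
        executed += 1               # the loop body runs once
        ecx = (ecx - 1) & MASK32    # LOOP: decrement ECX (flags are not changed)
        if ecx == 0:                # fall through to the next instruction at 0
            break
    return executed
```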
Transfer Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
JMP | Unconditional Jump | JMP label | Unconditionally jump to the label position |
CALL | Procedure Call | CALL label | Push the return address, then jump to label |
JE | Conditional Jump | JE label | Jump to label if ZF = 1 |
JNE | Conditional Jump | JNE label | Jump to label if ZF = 0 |
Differences in Assembly Between Linux and Windows#
The assembly syntax you see on Linux and on Windows is different. The difference comes from the toolchain rather than the operating system itself: Linux usually uses the gcc/g++ compiler, while Windows usually uses Microsoft's compiler cl (typically driven by the MSBuild build system). gcc and the GNU assembler use AT&T syntax by default, while Microsoft's tools use Intel syntax. (gcc can also emit Intel syntax with -masm=intel, and objdump can disassemble in Intel syntax with -M intel.)
Difference | Intel | AT&T |
---|---|---|
Referencing Register Names | eax | %eax |
Operand Order | mov dest, src | movl src, dest |
Register and Immediate Prefixes | mov ebx, 0xd00d | movl $0xd00d, %ebx |
Register Indirect Addressing | [eax] | (%eax) |
Operand Size | Given with dword ptr, word ptr, byte ptr when ambiguous (mov dx, word ptr [eax]) | Suffix letter on the opcode: “l” for 32-bit, “w” for 16-bit, “b” for 8-bit (movb %bl, %al) |
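To see the table's rules applied together, here is a deliberately tiny Intel-to-AT&T translator. It is a hypothetical helper, not a real tool: it only handles two-operand instructions whose operands are registers or immediates, and it takes the size suffix from the destination register.

```python
import re

# Register widths for the AT&T size suffix (a small subset of x86-64 registers).
SUFFIX = {}
for names, suffix in (
    ("rax rbx rcx rdx rsi rdi rbp rsp", "q"),
    ("eax ebx ecx edx esi edi ebp esp", "l"),
    ("ax bx cx dx si di bp sp", "w"),
    ("al bl cl dl ah bh ch dh", "b"),
):
    for name in names.split():
        SUFFIX[name] = suffix

def operand_to_att(op: str) -> str:
    """Registers get a % prefix, immediates get a $ prefix."""
    if op in SUFFIX:
        return "%" + op
    if re.fullmatch(r"-?(0x[0-9a-fA-F]+|\d+)", op):
        return "$" + op
    raise ValueError(f"unsupported operand: {op!r}")

def intel_to_att(line: str) -> str:
    """Translate 'op dest, src' (register/immediate operands only)."""
    mnemonic, rest = line.split(None, 1)
    dest, src = (part.strip() for part in rest.split(","))
    suffix = SUFFIX[dest]  # the destination register decides the operand size
    # AT&T reverses the operand order: source first, destination last.
    return f"{mnemonic}{suffix} {operand_to_att(src)}, {operand_to_att(dest)}"
```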
Addressing Modes#
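Direct addressing: the memory address is written as a constant, e.g. mov eax, [0x601040]
Memory (indirect) addressing: [ ] means "the memory at this address", e.g. mov eax, [rbx]; the general form is [base + index*scale + displacement], where scale is 1, 2, 4, or 8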
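The general memory-operand form can be computed directly. This hypothetical `effective_address` helper evaluates base + index*scale + displacement; the example values (an array at `0x404040`, index 3, 4-byte elements) are made up. The AT&T spelling of the same operand would be `8(%rbx,%rcx,4)`.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index*scale + disp, the general x86 memory operand."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & ((1 << 64) - 1)

# mov eax, [rbx + rcx*4 + 8] with rbx = 0x404040 (array base) and rcx = 3
addr = effective_address(base=0x404040, index=3, scale=4, disp=8)
```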
Overflow (Signed & Unsigned & Upward Overflow)#
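- Unsigned overflow: the result needs more bits than the destination has, so the high bits are lost (the CPU sets CF)
- Signed overflow: the result spills into the sign bit and flips the sign (the CPU sets OF)
Integer overflow is usually exploited together with other vulnerabilities, for example a length check that wraps around and then allows a buffer overflow.
My rule of thumb: signed overflow happens when the carry into the sign bit differs from the carry out of it, e.g. adding two positive numbers gives a negative result.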
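A Python sketch of how the CPU would set the flags for an 8-bit addition, to separate the two kinds of overflow. The signed-overflow test `(a ^ r) & (b ^ r) & 0x80` is equivalent to "carry into the sign bit differs from carry out of it": it fires when both inputs share a sign that the result does not have. `add8` is a hypothetical name.

```python
def add8(a: int, b: int):
    """Add two 8-bit values and compute ZF, SF, CF and OF as the CPU would."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    cf = int(full > 0xFF)                                   # unsigned overflow
    of = int(((a ^ result) & (b ^ result) & 0x80) != 0)     # signed overflow
    zf = int(result == 0)
    sf = result >> 7                                        # copy of the sign bit
    return result, {"ZF": zf, "SF": sf, "CF": cf, "OF": of}
```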
LINUX File Basics#
Protection Levels: 0-3
0 - Kernel
3 - User
Virtual memory: programs use virtual addresses, which the MMU translates into physical addresses. The system gives each user process its own virtual address space.
Big Endian and Little Endian#
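Big endian: the most significant byte is stored at the lowest address (matches the order in which humans write numbers)
Little endian: the least significant byte is stored at the lowest address (less intuitive to read, but convenient for the hardware, since the low byte sits at the same address whatever the operand size)
Strings are stored and printed from low address to high address.
Byte order is a property of the CPU architecture, not of the OS: x86/x86_64, the usual Linux targets, are little-endian; ARM can run in either byte order, although little-endian is by far the most common configuration.
When sending numbers such as addresses as raw bytes, remember that they must be laid out in little-endian order; pwntools helpers such as p64() and u64() handle the conversion.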
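A minimal sketch of little-endian packing with Python's standard `struct` module. The helpers are named `p64`/`u64` to mirror the pwntools functions, but they are written from scratch here and are not the pwntools implementation; the return address `0x401136` and the 0x18-byte padding are made-up example values.

```python
import struct

def p64(value: int) -> bytes:
    """Pack an unsigned 64-bit integer as 8 little-endian bytes."""
    return struct.pack("<Q", value)

def u64(data: bytes) -> int:
    """Unpack little-endian bytes; short leaks are padded with zero bytes."""
    return struct.unpack("<Q", data.ljust(8, b"\x00"))[0]

ret_addr = 0x401136
payload = b"A" * 0x18 + p64(ret_addr)   # padding, then the address low byte first
```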
File Descriptors#
Each file descriptor corresponds to an open file.
- 0: stdin
- 1: stdout
- 2: stderr
Typical data flow: stdin -> buf -> stdout. For example:
read(0, buf, size)
write(1, buf, size)
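The read/write pattern above, sketched with Python's `os.read`/`os.write`, which operate directly on integer file descriptors. In a real program `relay(0, 1, size)` would copy from stdin to stdout; here two pipes stand in for them so the example runs without a terminal. `relay` is a hypothetical name.

```python
import os

def relay(in_fd: int, out_fd: int, size: int) -> bytes:
    """read(in_fd, buf, size) followed by write(out_fd, buf, len(buf))."""
    buf = os.read(in_fd, size)   # may return fewer than `size` bytes
    os.write(out_fd, buf)        # small writes to a pipe complete in one call
    return buf

# Pipes stand in for stdin (fd 0) and stdout (fd 1).
in_r, in_w = os.pipe()
out_r, out_w = os.pipe()
os.write(in_w, b"hello\n")
copied = relay(in_r, out_w, 0x100)
echoed = os.read(out_r, 0x100)
for fd in (in_r, in_w, out_r, out_w):
    os.close(fd)
```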
Stack#
A restricted version of an array that can only be operated on at one end.
Data structure: Last In First Out (LIFO), same as function call order
Function execution order: main -> funA -> funB
Function completion order: funB -> funA -> main
Basic operations: push to stack, pop from stack
Function call instruction: call, return instruction: ret
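The operating system sets up a stack for each program (more precisely, for each thread), and each active function call gets its own stack frame on it.
On x86 Linux the stack grows from high addresses toward low addresses: the bottom of the stack is at the high end, and the top (pointed to by rsp) moves toward lower addresses as data is pushed.
Many algorithms, such as DFS, rely on a stack, either explicitly or implicitly through recursion, which uses the call stack.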
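Two small Python illustrations of LIFO behaviour using a list as the stack: the call/return order from above, and an iterative DFS that replaces recursion with an explicit stack. The function names and the example graph are made up.

```python
def call_order():
    """Return the order in which main -> funA -> funB finish."""
    stack, finished = [], []
    for name in ("main", "funA", "funB"):   # each call pushes a new frame
        stack.append(name)
    while stack:
        finished.append(stack.pop())        # the most recent call returns first
    return finished

def dfs(graph, start):
    """Iterative depth-first search with an explicit stack instead of recursion."""
    visited, stack = [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        # push neighbours in reverse so they are visited in their listed order
        stack.extend(reversed(graph.get(node, [])))
    return visited

example_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
```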
Calling Convention#
What is Calling Convention#
In the process of function calls, there are two participants: the caller and the callee.
Calling convention specifies how the caller and callee cooperate to implement function calls, including the following details:
- Where to store function parameters. Are they stored in registers? Or on the stack? In which registers? At which positions on the stack?
- The order of parameter passing. Are parameters pushed onto the stack from left to right, or from right to left?
- How return values are passed back to the caller. Are they stored in registers, or elsewhere?
- And so on.
So, why do we need a calling convention?
For example, if we write assembly without a shared standard, A might habitually place parameters on the stack, B might prefer registers, C might have yet another habit, and so on. Whenever A calls someone else's code, A must follow that person's conventions: calling B requires A to put the parameters in the registers B expects, and calling C requires yet another approach.
The calling convention is designed to solve these issues. It specifies the details of function calls so that everyone adheres to a common agreement, allowing us to call others' code without needing to make modifications.
Function Call Stack#
- Function Call: When a function is called, the program allocates a new stack frame for it on the call stack. The stack frame contains the function's parameters, local variables, return address, and other information.
- Parameter Passing: The caller passes the arguments to the called function. On 32-bit x86 (cdecl) they are all pushed onto the stack; on x86-64 the first six integer arguments go in registers and only the remaining ones are pushed. Arguments on the stack are then available inside the function.
- Execute Function: The called function begins execution, using the parameters and local variables from the stack frame. The execution process of the function may involve complex logic and calculations.
- Return Value Handling: When the function finishes, execution returns to the instruction right after the call in the caller. That location is given by the return address stored on the stack. If the function has a return value, it is normally passed back in the RAX register (EAX on 32-bit), not on the stack.
- Stack Frame Destruction: Once the call is complete, its stack frame is discarded by moving rsp back, which frees the space. The old data is not erased; it simply remains until later calls overwrite it.
Specific Function Call Process#
- pop
pop rax is equivalent to:
mov rax, [rsp]    ; copy the value at the top of the stack into rax
add rsp, 8        ; rsp moves 8 bytes toward higher addresses: the stack shrinks by one slot
- push
push rax is equivalent to:
sub rsp, 8        ; rsp moves 8 bytes toward lower addresses: the stack grows by one slot
mov [rsp], rax    ; store rax at the new top of the stack
- jmp
Unconditional jump; it does not involve a function call, saves no return address, and is used for loops and if-else.
For example, jmp 1234h behaves like (pseudocode, since rip cannot be the operand of mov):
mov rip, 1234h
- call
Function call; the return address must be saved.
For example, call 1234h behaves like (pseudocode):
push rip          ; rip already holds the address of the instruction after call
mov rip, 1234h
- ret
pop rip           ; pseudocode: pop the return address into rip
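A toy Python model of the four operations above, assuming 8-byte stack slots and a 5-byte `call rel32` instruction. `ToyStack` is hypothetical and only tracks rsp, rip, and a dictionary of stack slots; it shows that call saves the address of the next instruction and that ret restores both rip and rsp. The addresses are made up.

```python
class ToyStack:
    """A toy model of rsp/rip and 8-byte stack slots (not a real CPU emulator)."""

    def __init__(self, rsp: int, rip: int):
        self.rsp, self.rip = rsp, rip
        self.mem = {}                          # address -> 8-byte value

    def push(self, value: int):
        self.rsp -= 8                          # the stack grows toward lower addresses
        self.mem[self.rsp] = value

    def pop(self) -> int:
        value = self.mem[self.rsp]
        self.rsp += 8                          # the stack shrinks toward higher addresses
        return value

    def call(self, target: int, insn_len: int = 5):
        # rip points at the call itself here; the return address is the next instruction.
        self.push(self.rip + insn_len)
        self.rip = target

    def ret(self):
        self.rip = self.pop()

cpu = ToyStack(rsp=0x7fffffffe000, rip=0x401150)
cpu.call(0x401126)          # rip -> callee, return address 0x401155 on the stack
after_call = (cpu.rip, cpu.rsp, cpu.mem[cpu.rsp])
cpu.ret()                   # rip -> 0x401155, rsp back to its original value
```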
Example: main calls funB and funB calls funA. The stack frames change step by step as follows.
During the function call:
- Caller:
  - Set up the arguments (registers first, then the stack if there are more than six).
  - Execute call, which pushes the return address onto the stack and jumps to the callee.
- Callee (prologue):
  - Push rbp onto the stack to save the caller's base pointer.
  - Copy rsp into rbp (mov rbp, rsp), so that rbp points to the base of the current stack frame.
  - Subtract the required size from rsp to allocate space for local variables and temporary data.
  - Access parameters and local variables at fixed offsets from rbp (or from rsp, when the compiler omits the frame pointer).
When the function returns, the callee executes leave; ret:
- leave (equivalent to mov rsp, rbp; pop rbp):
  - mov rsp, rbp releases the local variables and temporary data.
  - pop rbp restores the caller's base pointer, leaving rsp pointing at the return address.
- ret:
  - Pops the return address from the stack into rip, so execution continues in the caller right after its call instruction.
Stack frame change diagram (high addresses at the top, the stack grows downward):
+--------------------------------+
| main function stack frame      |
+--------------------------------+
| return address into main       |
| saved rbp of main              |  <- rbp while funB runs
| funB local variables           |
+--------------------------------+
| return address into funB       |
| saved rbp of funB              |  <- rbp while funA runs
| funA local variables           |  <- rsp while funA runs
+--------------------------------+
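The frame layout in the diagram can be checked with a small toy model. Under the assumptions of 8-byte slots, made-up addresses, and hypothetical helper names, the sketch runs the call and prologue for funB and then funA, then undoes both with leave; ret. It confirms that the saved rbp sits directly below the return address and that each leave; ret restores the caller's rsp and rbp.

```python
class Frame:
    """Toy model of rsp, rbp, and 8-byte stack slots (not a real emulator)."""

    def __init__(self, rsp: int, rbp: int):
        self.rsp, self.rbp = rsp, rbp
        self.mem = {}                    # address -> saved 8-byte value

    def push(self, value: int):
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self) -> int:
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

def call_and_prologue(f: Frame, return_address: int, locals_size: int):
    f.push(return_address)     # call: push the return address
    f.push(f.rbp)              # push rbp: save the caller's frame pointer
    f.rbp = f.rsp              # mov rbp, rsp: rbp marks the new frame
    f.rsp -= locals_size       # sub rsp, N: reserve space for locals

def leave_ret(f: Frame) -> int:
    f.rsp = f.rbp              # leave (1): mov rsp, rbp drops the locals
    f.rbp = f.pop()            # leave (2): pop rbp restores the caller's rbp
    return f.pop()             # ret: pop the return address into rip

f = Frame(rsp=0x7fffffffe000, rbp=0x7fffffffe030)                # main's registers
call_and_prologue(f, return_address=0x401200, locals_size=0x20)  # main -> funB
funb_rsp, funb_rbp = f.rsp, f.rbp
call_and_prologue(f, return_address=0x401180, locals_size=0x10)  # funB -> funA
saved_rbp, saved_ret = f.mem[f.rbp], f.mem[f.rbp + 8]            # funA's frame header
rip_in_funb = leave_ret(f)                                       # funA returns
back_in_funb = (f.rsp, f.rbp)
rip_in_main = leave_ret(f)                                       # funB returns
```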
How to Pass Parameters#
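The return value is passed back in RAX.
The x86-64 calling convention on Linux (System V AMD64 ABI) is:
- The first six integer or pointer parameters are passed, from left to right, in RDI, RSI, RDX, RCX, R8, R9.
- If a function has more than six parameters, the remaining ones are pushed onto the stack from right to left, so the 7th parameter ends up closest to the top of the stack.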
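A hypothetical helper that applies these rules to a list of integer arguments. It assumes every argument fits in a general-purpose register (floating-point arguments, which go in XMM registers, and stack-alignment padding are ignored) and reports stack arguments as offsets from rsp at function entry, where [rsp] holds the return address.

```python
ARG_REGISTERS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

def place_arguments(args):
    """Return (registers, stack) for integer arguments at function entry.

    `stack` maps an offset from rsp to a value; [rsp] itself is the return
    address, so the 7th argument is at rsp+8, the 8th at rsp+16, and so on.
    """
    registers = dict(zip(ARG_REGISTERS, args))
    extra = args[len(ARG_REGISTERS):]
    stack = {8 * (i + 1): value for i, value in enumerate(extra)}
    return registers, stack

regs, stack = place_arguments([1, 2, 3, 4, 5, 6, 7, 8])
```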
System Calls#
syscall Instruction#
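Used to invoke kernel services (system calls). Each call is identified by its system call number, which can be looked up in the 64-bit Linux system call table.
Put the system call number in RAX and the arguments in RDI, RSI, RDX, R10, R8, R9 (R10 replaces RCX because syscall itself overwrites RCX), then execute syscall. The return value comes back in RAX.
Example: calling read(0, buf, size)
mov rax, 0      ; system call number 0 = read
mov rdi, 0      ; fd = 0 (stdin)
mov rsi, buf    ; destination buffer
mov rdx, size   ; number of bytes to read
syscall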
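A sketch of the register setup for syscall. The entries in `SYSCALL_NUMBERS` are a small excerpt of the x86-64 Linux table; the helper `syscall_registers` is hypothetical and only computes which value goes in which register.

```python
# A few x86-64 Linux system call numbers (from the 64-bit syscall table).
SYSCALL_NUMBERS = {"read": 0, "write": 1, "open": 2, "close": 3,
                   "mmap": 9, "mprotect": 10, "execve": 59, "exit": 60}

# syscall takes its 4th argument in r10, not rcx (the instruction clobbers rcx).
SYSCALL_ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")

def syscall_registers(name, *args):
    """Return the register values to set before executing `syscall`."""
    if len(args) > len(SYSCALL_ARG_REGISTERS):
        raise ValueError("at most 6 syscall arguments")
    regs = {"rax": SYSCALL_NUMBERS[name]}
    regs.update(zip(SYSCALL_ARG_REGISTERS, args))
    return regs
```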
ELF File Structure#
ELF File Format#
ELF (Executable and Linkable Format) is the binary executable file format in Linux.
ELF Header#
The command readelf -h prints the ELF file header. The header contains the program's entry point (Entry point address) plus the location, entry size, and count of the segment table (program headers) and the section table (section headers): Start of program headers and Start of section headers give the file offsets of those two tables.
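A minimal parser for the fields `readelf -h` reports about the entry point and the two tables, using Python's `struct`. It assumes a 64-bit little-endian ELF file (the usual x86_64 case) and follows the standard ELF64 header layout; `parse_elf_header` is a hypothetical name, and the demo header below is made up. For a real file you could call it on the first 64 bytes, e.g. `parse_elf_header(open("test", "rb").read(64))`.

```python
import struct

# ELF64 header: e_ident[16], type, machine, version, entry, phoff, shoff,
# flags, ehsize, phentsize, phnum, shentsize, shnum, shstrndx (64 bytes).
ELF64_HEADER = struct.Struct("<16sHHIQQQI6H")

def parse_elf_header(data: bytes) -> dict:
    """Read the entry point and the locations of the segment and section tables."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (_ident, e_type, e_machine, _version, entry, phoff, shoff, _flags,
     _ehsize, _phentsize, phnum, _shentsize, shnum, _shstrndx) = \
        ELF64_HEADER.unpack_from(data)
    return {"type": e_type, "machine": e_machine, "entry": entry,
            "phoff": phoff, "phnum": phnum, "shoff": shoff, "shnum": shnum}

# Made-up header: ET_DYN (3), x86-64 (0x3E), 13 segments, 31 sections.
demo = ELF64_HEADER.pack(b"\x7fELF" + bytes([2, 1, 1]) + bytes(9),
                         3, 0x3E, 1, 0x1060, 64, 13976, 0, 64, 56, 13, 64, 31, 30)
info = parse_elf_header(demo)
```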
Section Header Table#
Use the command readelf -S to read the section headers of a binary ELF file. The example program test has 31 sections. Assembly source is organized into sections such as .text (code) and .data (initialized data); because assembly instructions map directly to machine code, this section layout is preserved when the program is assembled into a binary.
readelf -S test
Program Header Table#
When an ELF program is executed (loaded into memory), the loader builds the process memory image from the program's segment table. Use the command readelf -l to read the segment information of a binary ELF file. The program test has 13 segments; since there are fewer segments than sections (31), several sections must be mapped into the same segment.
Sections are grouped by permission: for example, read-only sections end up in one segment, readable and writable sections in another, and so on.
readelf -l test
Linking View/Execution View#
Segment and Section are two different perspectives on the same ELF file. This is referred to as different views in ELF.
From the perspective of Section, the ELF file is the Linking View.
From the perspective of Segment, it is the Execution View.
The word "segment" is often used loosely for both. When discussing ELF loading, it means a Segment; in most other contexts people actually mean a Section.
libc#
glibc: the GNU C Library, GNU's implementation of the C standard library, which became the standard C library on Linux.
It is shipped as libc.so (usually libc.so.6), an ELF shared library that, unusually, can also be executed directly. The dynamic library supplied with pwn challenges is usually this libc.so file.
Almost all programs in Linux depend on libc, so the functions in libc are crucial.
Lazy Binding Mechanism#
Static Compilation vs Dynamic Compilation#
A dynamically linked executable needs its shared libraries at run time and calls functions from them during execution. The advantages are a smaller executable, faster linking, and lower resource usage, because one copy of a library can be shared. The disadvantages are that even a very simple program that uses only one or two library functions still depends on a relatively large library, and if the required runtime library is not installed on another computer, the executable cannot run there.
With static linking, the linker copies the needed code from the static libraries (.a archives, e.g. libc.a) into the executable, so it can run without any shared libraries. The advantages and disadvantages of static linking are therefore the mirror image of dynamic linking.
Lazy Binding#
Lazy binding is motivated by the fact that, under dynamic linking, a program's modules contain a large number of calls to external functions, many of which may never be executed in a given run.
Lazy binding postpones the binding of function addresses until the first call to that function, thus avoiding the dynamic linker from processing a large number of function reference relocations during loading.
The implementation of lazy binding uses two special data structures: the Global Offset Table (GOT) and the Procedure Linkage Table (PLT).
Global Offset Table (GOT)#
With lazy binding, a library function's real address is written into the GOT only after the function has been called for the first time; until then, its GOT entry points back into the PLT.
The global offset table exists as separate sections in the ELF file, named .got and .got.plt. The .got section stores the addresses of external variable references, while the .got.plt section stores the addresses of external function references; lazy binding mainly works through .got.plt. The basic structure of the .got.plt table is shown in the figure below:
The first three entries of .got.plt hold special addresses:
- GOT[0]: the address of the .dynamic section, from which the dynamic linker extracts dynamic-linking information;
- GOT[1]: a pointer to the current module's link_map structure, which identifies the module to the dynamic linker;
- GOT[2]: the address of the dynamic linker's _dl_runtime_resolve function, which resolves the actual addresses of shared library functions.
Procedure Linkage Table (PLT)#
To implement lazy binding, when calling a function from an external module, the program does not directly jump through the GOT but instead jumps through a specific entry stored in the PLT table. For all external functions, there will be a corresponding entry in the PLT table, where each entry contains 16 bytes of code used to call a specific function. The general structure of the procedure linkage table is as follows:
Besides the PLT entries that the linker creates for each external function, the procedure linkage table contains a special entry, PLT[0], which jumps to the dynamic linker to perform the actual symbol resolution and relocation:
PLT and GOT#
Regardless of how many times the external function is called, the program actually calls the PLT table, which is composed of a series of assembly instructions.
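A toy Python model of lazy binding, assuming a single module and ignoring the real calling mechanics. `LazyBinder` is hypothetical: a missing GOT entry stands in for the real initial GOT value (which points back into the PLT stub), and `_resolve` stands in for `_dl_runtime_resolve`. The last lines show why a writable GOT matters for GOT hijacking.

```python
class LazyBinder:
    """Toy model of PLT/GOT lazy binding for a single module (illustrative only)."""

    def __init__(self, library):
        self.library = library      # symbol name -> "real" function in the library
        self.got = {}               # symbol name -> resolved target
        self.resolutions = 0        # how many times the resolver had to run

    def call(self, name, *args):
        """Every call site calls the PLT stub, never the GOT directly."""
        return self._plt_stub(name)(*args)

    def _plt_stub(self, name):
        # PLT[n]: jump through the GOT slot if it has already been filled in ...
        if name in self.got:
            return self.got[name]
        # ... otherwise continue to PLT[0], which hands over to the resolver.
        return self._resolve(name)

    def _resolve(self, name):
        # Stand-in for _dl_runtime_resolve: look up the symbol, patch the GOT slot.
        self.resolutions += 1
        self.got[name] = self.library[name]
        return self.got[name]

binder = LazyBinder({"puts": lambda s: len(s) + 1})   # fake puts returning a count
first = binder.call("puts", "hi")          # first call: the resolver runs
second = binder.call("puts", "again")      # later calls go straight through the GOT
binder.got["puts"] = lambda s: "hijacked"  # GOT hijacking: overwrite the slot
third = binder.call("puts", "x")
```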
So why go through the PLT as an intermediate step instead of jumping straight through the GOT?
It's like having many relatives; you need to visit them every week, so you write down their addresses in a notebook. When you want to visit, you check the notebook. This notebook is like a PLT table, where each address jumps to the corresponding GOT table address (your relatives' homes).
If one day you find it troublesome to run around, you invite all your relatives to live at your house, and now you only need to visit the corresponding room. The notebook becomes useless, and you throw it away. This is when you directly access the GOT table without the PLT table.
Which takes up less space: a notebook, or a house full of relatives?
This is one reason for the PLT: it uses memory more efficiently.
Another reason is security and sharing: the code (.text and .plt) can stay read-only and be shared between processes, because only the data in the GOT, never the code, has to be patched at run time.
LINUX Security Protection Mechanisms#
CANARY#
Canary is a protection against stack overflow attacks. On x86_64 Linux, the function prologue copies a random 8-byte value, the canary, from fs:0x28 and places it on the stack right after the stack frame is created (directly below the saved rbp). Its lowest byte is always \x00, which stops string functions from reading or copying past it. An attacker who overflows a local buffer to overwrite the saved rbp or the return address above it must overwrite the canary first. Before the function returns, it compares the stack copy with the original value; if they differ, it calls __stack_chk_fail and the program aborts instead of returning, which defeats the overflow.
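A simplified model of the canary check, assuming the frame layout [buffer][canary][saved rbp][return address] with no padding between buffer and canary (real compilers may add some). All function names and addresses are hypothetical.

```python
import os
import struct

def new_canary() -> bytes:
    """8 random bytes whose first (lowest-address) byte is 0x00."""
    return b"\x00" + os.urandom(7)

def build_frame(buf_size: int, canary: bytes, saved_rbp: int, ret: int) -> bytearray:
    """Lay out [buffer][canary][saved rbp][return address] from low to high address."""
    return bytearray(bytes(buf_size) + canary + struct.pack("<QQ", saved_rbp, ret))

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy into the buffer with no length check, like read(0, buf, too_big)."""
    frame[:len(data)] = data

def canary_intact(frame: bytearray, buf_size: int, canary: bytes) -> bool:
    """The epilogue check: compare the stack copy with the original value."""
    return bytes(frame[buf_size:buf_size + 8]) == canary

canary = new_canary()
safe = build_frame(16, canary, 0x7fffffffe030, 0x401200)
unchecked_copy(safe, b"A" * 16)         # fits in the buffer
smashed = build_frame(16, canary, 0x7fffffffe030, 0x401200)
unchecked_copy(smashed, b"A" * 40)      # runs over the canary, saved rbp, and ret
```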
Bypass Methods:
- Modify the canary.
- Leak the canary.
Canary Bypass#
- Format string leak
  - Read the canary's value through a format string vulnerability, then write it back unchanged during the overflow.
- Canary brute force (for programs that use fork)
  - fork duplicates the process, so every child has the same memory layout and the same canary. The canary can be guessed one byte at a time: if the child crashes, the guess for that byte was wrong; if it keeps running normally, move on to the next byte until the whole canary is known.
- Stack smashing leak (SSP leak)
  - Deliberately trigger the canary check so that the __stack_chk_fail error message leaks data (on older glibc versions the message prints argv[0], which can be overwritten with a pointer to the data to leak).
- Hijack __stack_chk_fail
  - Overwrite the GOT entry of __stack_chk_fail (possible when the GOT is writable, i.e. without Full RELRO). When the overflow then corrupts the canary, the failed check calls __stack_chk_fail, which now jumps to the address we want to execute.
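A sketch of why the fork brute force is cheap: guessing one byte at a time costs at most 256 tries per byte, at most 7 * 256 in total since the lowest byte is known to be \x00, instead of 2^56 for the whole value. The oracle `child_survives` is hypothetical; in a real attack it would send a partial overwrite to a forked child and check whether it crashed. Overwriting only a prefix leaves the remaining canary bytes untouched, which is what makes a correct prefix survive.

```python
import os

SECRET = b"\x00" + os.urandom(7)   # stands in for the canary shared by every child

def child_survives(guess_prefix: bytes) -> bool:
    """Hypothetical oracle: overwrite only the first len(guess_prefix) canary bytes
    and report whether the forked child returned normally instead of crashing."""
    return SECRET.startswith(guess_prefix)

def brute_force_canary(oracle, length: int = 8):
    known, attempts = b"\x00", 0           # the lowest byte is always 0x00
    while len(known) < length:
        for value in range(256):
            attempts += 1
            candidate = known + bytes([value])
            if oracle(candidate):           # child survived: this byte is right
                known = candidate
                break
        else:
            raise RuntimeError("no byte value worked; is the oracle right?")
    return known, attempts

recovered, attempts = brute_force_canary(child_survives)
```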
NX#
NX (No-eXecute) marks data memory as non-executable. Once enabled, writable regions such as the heap, the stack, and the .bss segment cannot be executed, so shellcode placed there cannot run directly.
Bypass Method:
Use the mprotect function to make a region executable again. NX also does not affect exploitation methods that reuse existing code or pointers, such as ROP or GOT hijacking.
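mprotect requires a page-aligned start address, which is a common stumbling block when using it to bypass NX. This hypothetical helper rounds a target range out to whole 4 KiB pages and uses PROT_READ | PROT_WRITE | PROT_EXEC = 7 (the Linux values 1, 2, 4); the example addresses are made up.

```python
PAGE_SIZE = 0x1000
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4   # values from <sys/mman.h> on Linux

def mprotect_args(addr: int, length: int, prot: int = PROT_READ | PROT_WRITE | PROT_EXEC):
    """Round (addr, length) out to whole pages, as mprotect() expects."""
    start = addr & ~(PAGE_SIZE - 1)                            # round down
    end = (addr + length + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)   # round up
    return start, end - start, prot
```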
PIE and ASLR#
What is ASLR?#
ASLR (Address Space Layout Randomization) is a Linux kernel feature that takes effect when a program (ELF) is loaded into memory. It protects against buffer overflow attacks by randomizing load addresses, so that an attacker cannot know in advance where the attack code or target data is located.
Enable/Disable ASLR#
Check the current system's ASLR status:
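sudo cat /proc/sys/kernel/randomize_va_space
To change it (for example, to disable ASLR while debugging), write the desired level as root:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space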
ASLR has three security levels:
- 0: ASLR is off
- 1: Randomizes the stack base, shared libraries (.so), and the mmap base
- 2: In addition to 1, randomizes the heap base (the brk area where chunks are allocated)
What is PIE?#
PIE (Position-Independent Executable) is a gcc compile-time option for programs (ELF). It removes fixed addresses for the code segment (.text), the data segment (.data), and the uninitialized global variable segment (.bss): with PIE enabled and ASLR turned on, the load address changes every time the program runs. Gadget addresses found with tools like ROPgadget are then only offsets from the base address, so they cannot be used directly until the base address has been leaked.
Enable PIE#
Compile with -fPIE and link with -pie when using gcc (many distributions' gcc already builds PIE by default; -no-pie turns it off).
Once PIE is enabled, it randomizes the loading addresses of the code segment (.text), initialized data segment (.data), and uninitialized data segment (.bss).
PIE Bypass#
Programs are loaded at page granularity (4 KiB = 0x1000), so the last three hex digits of the base address are always 0. As a result, the last three hex digits of any address inside the binary equal the last three hex digits of its offset in the file, which we can read statically. This is how PIE can be bypassed: although we do not know the complete address, we know its last three hex digits, so we can take an address already on the stack and modify only its last two bytes (the last four hex digits). Only the 4 bits above the known 12 are random, so such a guess succeeds with probability 1/16.
Thus, the core idea of bypassing PIE is partial writing (partial address writing).
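A sketch of the partial overwrite arithmetic. The load base `0x55555555a000` and the offsets `0x1234` (an address already on the stack) and `0x11a9` (the target) are made-up values, and the helper names are hypothetical. Trying all 16 values of the unknown nibble shows that exactly one gives the target address, so each blind attempt succeeds with probability 1/16.

```python
PAGE_MASK = 0xFFF

def page_offset(addr: int) -> int:
    """Low 12 bits: identical in the file offset and in the randomized address."""
    return addr & PAGE_MASK

def partial_overwrite(stack_value: int, target_offset: int, nibble: int) -> int:
    """Replace only the low 2 bytes of an address that is already on the stack.

    Bits 0-11 come from the target's known offset; bits 12-15 depend on the
    random load base, so they have to be guessed (`nibble`, 0-15).
    """
    low16 = ((nibble & 0xF) << 12) | page_offset(target_offset)
    return (stack_value & ~0xFFFF) | low16

# The attacker does not know `base`; it is used here only to check the guesses.
base = 0x55555555a000              # made-up randomized base (page aligned)
on_stack = base + 0x1234           # made-up address already on the stack
target = base + 0x11a9             # made-up target we want to reach
hits = [n for n in range(16) if partial_overwrite(on_stack, 0x11a9, n) == target]
```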
RELRO#
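RELRO (RELocation Read-Only) is a technique that protects the data sections the dynamic linker fills in, such as the GOT. With Partial RELRO, .got is made read-only after relocation while .got.plt stays writable for lazy binding; with Full RELRO (linker options -z relro -z now), all symbols are resolved at load time and the whole GOT becomes read-only, which blocks GOT-overwrite attacks.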