PWN Notes: Basics - 言心吾

Assembly Basics#

Mastery Level: Understand assembly, analyze gadgets, step through debugging to understand register states

Reference Article: x86_64 Assembly Part 1: AT&T Assembly Syntax_x86_64 Assembly at&t-CSDN Blog

Registers#

32-bit x86 CPUs have 8 commonly used general-purpose registers: EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP

The CPU reads and writes registers first, then exchanges data with caches and memory through them. Accessing a register by name is the fastest form of access, which is why registers are sometimes called the level-zero cache.

Access speeds from high to low are: registers > Level 1 cache > Level 2 cache > Level 3 cache > memory > hard disk

General Registers and Their Uses#

The aforementioned 8 registers each have specific purposes. Taking 32-bit CPU as an example, the roles of these registers are summarized in the table below:

Register	Meaning	Purpose	Contains Registers
EAX	Accumulator Register	Commonly used for multiplication, division, and function return values	AX(AH, AL)
EBX	Base Register	Often used as a pointer for memory data, or as a base to access memory	BX(BH, BL)
ECX	Counter Register	Commonly used as a counter in string and loop operations	CX(CH, CL)
EDX	Data Register	Commonly used for multiplication, division, and I/O pointers	DX(DH, DL)
ESI	Source Index Register	Commonly used as a pointer for memory data and source strings	SI
EDI	Destination Index Register	Commonly used as a pointer for memory data and destination strings	DI
ESP	Stack Pointer Register	Points to the top of the stack; by convention not used for general arithmetic or data transfer (it is adjusted only to manage the stack)	SP
EBP	Base Pointer Register	Points to the base of the current stack frame and is commonly used as the base for accessing data on the stack; often receives a copy of ESP; by convention not used for general arithmetic or data transfer	BP

In the table above, each register is followed by the names of its sub-registers. On x86_64, rax, eax, ax, ah, and al all name the same register but cover different bit ranges.

Below is the correspondence for 64-bit registers:

|63..32|31..16|15-8|7-0|
              |AH. |AL.|
              |AX......|
       |EAX............|
|RAX...................|
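The aliasing in the diagram above can be checked with plain bit masks. The following is a minimal sketch; the helper name sub_registers and the sample value are made up for illustration.

```python
def sub_registers(rax: int) -> dict:
    """Split a 64-bit RAX value into the narrower views that alias it."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep only 64 bits
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,       # bits 31..0
        "ax": rax & 0xFFFF,            # bits 15..0
        "ah": (rax >> 8) & 0xFF,       # bits 15..8
        "al": rax & 0xFF,              # bits 7..0
    }


views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name} = {value:#x}")
```

This only models how the names overlap; it does not model side effects such as a write to eax zeroing the upper 32 bits of rax.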

Instruction Pointer Register#

The Instruction Pointer Register (RIP) contains the logical address of the next instruction to be executed.

Typically, after an instruction is fetched, RIP advances to point to the next instruction. x86_64 instructions are variable-length (1 to 15 bytes), so RIP advances by the length of the instruction just fetched rather than by a fixed amount.

However, RIP does not always simply advance; there are exceptions, such as the call and ret instructions. The call instruction pushes the current content of RIP (the address of the instruction after the call) onto the stack and transfers control to the target function; the ret instruction performs a pop operation, popping the previously pushed 8-byte return address back into RIP.

Flag Register (EFLAGS)#

Assembly Language Instructions#

Common assembly instructions: mov, je, jmp, call, add, sub, inc, dec, and, or

Data Transfer Instructions#

Instruction	Name	Example	Remarks
MOV	Transfer	MOV dest, src	Move data from src to dest
PUSH	Push	PUSH src	Push source operand src onto the stack
POP	Pop	POP dest	Pop data from the top of the stack into dest

Arithmetic Operation Instructions#

Instruction	Name	Example	Remarks
ADD	Addition	ADD dest, src	Add src to dest
SUB	Subtraction	SUB dest, src	Subtract src from dest
INC	Increment	INC dest	Increment dest by 1
DEC	Decrement	DEC dest	Decrement dest by 1

Logical Operation Instructions#

Instruction	Name	Example	Remarks
NOT	Negation	NOT dest	Bitwise negation of operand dest
AND	And	AND dest, src	Perform AND operation on dest and src, store in dest
OR	Or	OR dest, src	Perform OR operation on dest and src, store in dest
XOR	Exclusive Or	XOR dest, src	Perform XOR operation on dest and src, store in dest

Loop Control Instructions#

Instruction	Name	Example	Remarks
LOOP	Counting Loop	LOOP label	Decrement ECX by 1, jump to label if ECX is not 0, otherwise execute the statement after LOOP

Transfer Instructions#

Instruction	Name	Example	Remarks
JMP	Unconditional Jump	JMP label	Unconditionally jump to the label position
CALL	Procedure Call	CALL label	Directly call label
JE	Conditional Jump	JE label	Jump to label if zf = 1
JNE	Conditional Jump	JNE label	Jump to label if zf = 0

Differences in Assembly Between Linux and Windows#

The assembly syntax commonly seen on Linux and Windows is different. The difference comes from the toolchain rather than from the operating system itself: on Linux the gcc/g++ compiler is generally used, while on Windows Microsoft's cl (the MSVC compiler) is used. gcc emits AT&T assembly syntax by default, while MSVC uses Intel assembly syntax.

Difference	Intel	AT&T
Referencing Register Names	eax	%eax
Operand Assignment Order	mov dest, src	movl src, dest
Prefixes for Registers and Immediates	mov ebx, 0xd00d	movl $0xd00d, %ebx
Register Indirect Addressing	[eax]	(%eax)
Data Type Size	Size specifier dword ptr, word ptr, or byte ptr on the memory operand (mov dx, word ptr [eax])	Suffix letter added to the opcode: “l” for 32-bit, “w” for 16-bit, “b” for 8-bit (movb %bl, %al)

Addressing Modes#

Direct Addressing

Memory Addressing: [ ]

Overflow (Signed & Unsigned & Upward Overflow)#

Insufficient storage bits
Overflow into the sign bit

Integer overflow used in conjunction with other vulnerabilities

In my opinion, signed overflow is best understood as a carry into the sign bit: the result no longer fits in the value bits, so the sign flips.
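Python integers never overflow, so fixed-width wraparound has to be emulated with masks. The sketch below assumes 32-bit C-style integers; to_unsigned and to_signed are illustrative helper names.

```python
def to_unsigned(value: int, bits: int = 32) -> int:
    """Wrap value into the unsigned range of the given width."""
    return value & ((1 << bits) - 1)


def to_signed(value: int, bits: int = 32) -> int:
    """Reinterpret value as a two's-complement signed integer of the given width."""
    value = to_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


INT_MAX = 0x7FFFFFFF
UINT_MAX = 0xFFFFFFFF

signed_wrap = to_signed(INT_MAX + 1)       # carry into the sign bit: becomes negative
unsigned_wrap = to_unsigned(UINT_MAX + 1)  # not enough storage bits: wraps to 0
neg_as_size = to_unsigned(-1)              # -1 reinterpreted as a huge unsigned length

print(signed_wrap, unsigned_wrap, neg_as_size)
```

The last case is the one that usually matters in pwn: a negative length that passes a signed check and is then treated as a very large unsigned size by a copy routine.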

LINUX File Basics#

Protection Levels: 0-3

0 - Kernel

3 - User

Virtual Memory: The addresses a process sees, which the MMU translates into physical memory addresses. The system allocates a separate virtual address space to each user process.

Big Endian and Little Endian#

Big Endian: High-order data -> Low-order computer address (more in line with human reading habits)

Little Endian: Low-order data -> Low-order computer address (counterintuitive but more in line with storage logic and operation rules)

Computer outputs strings: from low address to high address

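Byte order is a property of the CPU architecture rather than of Linux: x86 and x86_64 are little-endian, while ARM supports both byte orders and usually runs little-endian.

When sending numbers as raw bytes, be mindful of the byte order; a little-endian target stores the low-order byte at the lowest address, and pwntools can be used for the conversion (for example p32 and p64).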
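The byte orders can be compared using only the standard library. This sketch uses struct instead of pwntools so it runs without extra packages; the address 0x401136 is a made-up example value.

```python
import struct

value = 0xDEADBEEF
little = struct.pack("<I", value)   # low-order byte first
big = struct.pack(">I", value)      # high-order byte first

addr = 0x401136                              # hypothetical target address
payload_addr = struct.pack("<Q", addr)       # 8 bytes, same result as pwntools p64(addr)
recovered = struct.unpack("<Q", payload_addr)[0]   # same result as pwntools u64(...)

print(little.hex(), big.hex(), payload_addr.hex(), hex(recovered))
```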

File Descriptors#

Each file descriptor corresponds to an open file.

0: stdin
1: stdout
2: stderr

stdin->buf->stdout

For example:

read(0, buf, size)

write(1, buf, size)
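The same read/write pattern can be tried from Python, whose os.read and os.write are thin wrappers over the system calls. This sketch uses a pipe in place of stdin so that it runs without user input; that substitution is an assumption made for the example.

```python
import os

r_fd, w_fd = os.pipe()            # two new descriptors for the pipe ends
os.write(w_fd, b"hello pwn\n")    # like write(fd, buf, size)
buf = os.read(r_fd, 64)           # like read(fd, buf, size): returns up to 64 bytes
os.close(r_fd)
os.close(w_fd)

os.write(1, buf)                  # descriptor 1 is stdout
```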

Stack#

A stripped-down version of an array, can only be operated at one end.

Data structure: Last In First Out (LIFO), same as function call order

Function execution order: main -> funA -> funB

Function completion order: funB -> funA -> main

Basic operations: push to stack, pop from stack

Function call instruction: call, return instruction: ret

The operating system sets up a stack for each program, and each function call in the program gets its own stack frame.

In Linux, the stack grows from high addresses (bottom of the stack) to low addresses (top of the stack).

Many algorithms, such as DFS, utilize the stack and are implemented recursively.

Calling Convention#

What is Calling Convention#

In the process of function calls, there are two participants: the caller and the callee.

Calling convention specifies how the caller and callee cooperate to implement function calls, including the following details:

Where to store function parameters. Are they stored in registers? Or on the stack? In which registers? At which positions on the stack?
The order of parameter passing. Are parameters pushed onto the stack from left to right, or from right to left?
How return values are passed back to the caller. Are they stored in registers, or elsewhere?
And so on.

So, why do we need a calling convention?

For example, if we write code in assembly language without a unified standard to follow, then A might habitually place parameters on the stack, B might prefer registers, C has another habit, and so on. When A tries to call someone else's code, they must adhere to the other person's conventions. For instance, calling B requires A to place parameters in the registers specified by B; calling C requires yet another approach.

The calling convention is designed to solve these issues. It specifies the details of function calls so that everyone adheres to a common agreement, allowing us to call others' code without needing to make modifications.

Function Call Stack#

Function Call: When a function is called, the program allocates a new stack frame for it on the call stack. The stack frame contains the function's parameters, local variables, return address, and other information.
Parameter Passing: During the function call, parameters are passed to the called function according to the calling convention: via push operations onto the stack on 32-bit x86, and mainly via registers on x86-64. Parameters placed on the stack are stored in the stack frame for use within the function.
Execute Function: The called function begins execution, using the parameters and local variables from the stack frame. The execution process of the function may involve complex logic and calculations.
Return Value Handling: When the function execution is complete, the program returns to the code location that called the function. This location is specified by the return address in the stack frame. If the function has a return value, it is passed back to the caller in a register (RAX on x86-64) rather than on the stack.
Stack Frame Destruction: Once the function call is complete, its corresponding stack frame is popped from the call stack and destroyed, releasing the memory resources it occupied.

Specific Function Call Process#

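The snippets below are conceptual equivalents; rip cannot actually be written with mov.

pop

The effect of pop rax:

mov rax, [rsp]; // Copy the value at the top of the stack into the register

add rsp, 8; // Move the stack pointer 8 bytes toward higher addresses, shrinking the stack

push

The effect of push rax:

sub rsp, 8; // Move the stack pointer 8 bytes toward lower addresses, growing the stack

mov [rsp], rax; // Place the register's value at the new top of the stack

jmp

Immediate jump, does not involve function calls, used for loops, if-else

For example, the effect of jmp 1234h:

mov rip, 1234h;

call

Function call, requires saving the return address

For example, the effect of call 1234h:

push rip; // rip already points to the instruction after the call

mov rip, 1234h;

ret

Function return, restores the saved return address

The effect of ret:

pop rip;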

Example: main calls funB, funB calls funA, step-by-step analysis of stack frame changes:

During the function call process:

Calling Function:
- Push rip onto the stack as the return address. (call)
Called Function:
- Push rbp onto the stack, saving the caller's base pointer.
- Assign the value of rsp to rbp, making rbp point to the bottom of the current stack frame.
- Allocate stack space for local variables and temporary data, reducing rsp by the appropriate size.
- Use rbp as a base pointer to access function parameters and local variables.

When the function returns: leave; ret;

Called Function:
- Release the local variables and temporary data by restoring rsp from rbp (leave: mov rsp, rbp).
- Restore the caller's rbp from the stack (leave: pop rbp).
- Pop the return address from the stack into rip (ret).
Calling Function:
- Execution resumes at the return address, the instruction after the call.

Stack Frame Change Diagram:

+----------------------------+
| main function stack frame   |
+----------------------------+
| Return Address              |
| rbp (Base Pointer of main)  |
+----------------------------+
| funB calling function frame  |
+----------------------------+
| Return Address              |
| rbp (Base Pointer of funB)  |
+----------------------------+
| funA called function frame   |
+----------------------------+
| rbp (Base Pointer of funA)  |
| Local Variables              |
+----------------------------+
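The call, prologue, leave, and ret steps described above can be traced with a toy model. This is only a sketch: ToyCPU, its starting register values, and the addresses 0x401136 and 0x401200 are all made up for illustration, and memory is modelled as a dictionary of 8-byte slots.

```python
class ToyCPU:
    """Tiny model of rsp/rbp/rip and a stack of 8-byte slots (not a real emulator)."""

    def __init__(self, rsp=0x7FFF0000, rbp=0x7FFF0100):
        self.mem = {}                 # address -> 8-byte value
        self.rsp, self.rbp, self.rip = rsp, rbp, 0

    def push(self, value):
        self.rsp -= 8                 # the stack grows toward lower addresses
        self.mem[self.rsp] = value

    def pop(self):
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

    def call(self, target, return_addr):
        self.push(return_addr)        # save the address of the next instruction
        self.rip = target

    def prologue(self, local_size):
        self.push(self.rbp)           # push rbp
        self.rbp = self.rsp           # mov rbp, rsp
        self.rsp -= local_size        # sub rsp, local_size

    def leave(self):
        self.rsp = self.rbp           # mov rsp, rbp
        self.rbp = self.pop()         # pop rbp

    def ret(self):
        self.rip = self.pop()         # pop rip


cpu = ToyCPU()
start_rsp, start_rbp = cpu.rsp, cpu.rbp

cpu.call(target=0x401136, return_addr=0x401200)
cpu.prologue(local_size=0x20)
saved_rbp = cpu.mem[cpu.rbp]          # [rbp]     holds the caller's rbp
return_addr = cpu.mem[cpu.rbp + 8]    # [rbp + 8] holds the return address

cpu.leave()
cpu.ret()
print(hex(saved_rbp), hex(return_addr), hex(cpu.rip), hex(cpu.rsp))
```

The layout it exposes, with the saved rbp at [rbp] and the return address at [rbp + 8], is the reason a stack overflow that runs past the local variables reaches the return address.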

How to Pass Parameters#

The return value is given to RAX.

The calling convention for x86-64 functions on Linux (the System V AMD64 ABI) is:

The first six integer or pointer parameters are passed from left to right in RDI, RSI, RDX, RCX, R8, R9.
If a function has more than 6 parameters, the remaining ones are pushed onto the stack from right to left.
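The register order can be expressed as a small lookup. The sketch below assumes every argument is an integer or pointer (floating-point arguments use other registers and are not modelled); place_arguments is an illustrative name.

```python
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def place_arguments(args):
    """Return (register assignments, values in the order they are pushed)."""
    in_registers = dict(zip(ARG_REGISTERS, args))
    extra = list(args[len(ARG_REGISTERS):])
    push_order = extra[::-1]          # the rightmost extra argument is pushed first
    return in_registers, push_order


regs, pushes = place_arguments([1, 2, 3, 4, 5, 6, 7, 8])
print(regs)
print(pushes)
```

When building a ROP chain this is why gadgets such as pop rdi; ret are needed: the first argument has to end up in RDI, not on the stack.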

System Calls#

syscall Instruction#

Used to call system functions, specifying the system call number (which can be found in the 64-bit Linux system call table).

The system call number is placed in the RAX register; then set up the parameters (in RDI, RSI, RDX, R10, R8, R9, in that order) and execute syscall.

Example: calling read(0, buf, size)

mov rax, 0;
mov rdi, 0;
mov rsi, buf;
mov rdx, size;
syscall;

ELF File Structure#

ELF File Format#

ELF (Executable and Linkable Format) is the binary executable file format in Linux.

ELF Header#

The command readelf -h can read the ELF file header. The ELF header includes the program's entry point (Entry Point Address), segment information, and section information. The Start of program headers and Start of section headers in the ELF header can locate the positions of the segment table and section table in the file.
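The fields that readelf -h prints can also be read programmatically. The following sketch parses only a 64-bit little-endian ELF header and runs against a synthetic header built in the script; the entry point and offsets in that fake header are made-up values, and parse_elf64_header is an illustrative name.

```python
import struct

# e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf64_header(data: bytes) -> dict:
    fields = ELF64_HEADER.unpack_from(data)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    return {
        "entry": fields[4],    # Entry point address
        "phoff": fields[5],    # Start of program headers
        "shoff": fields[6],    # Start of section headers
        "phnum": fields[10],   # Number of program headers (segments)
        "shnum": fields[12],   # Number of section headers (sections)
    }


# Synthetic header for demonstration (values are hypothetical).
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
fake = ELF64_HEADER.pack(ident, 3, 62, 1, 0x1060, 64, 0x3728, 0, 64, 56, 13, 64, 31, 30)
header = parse_elf64_header(fake)
print(header)
```

For a real binary, passing the first 64 bytes of the file to parse_elf64_header should give values that agree with the readelf -h output.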

Section Header Table#

Use the command readelf -S to read the section information (sections) of the binary ELF file. The program test has a total of 31 sections. Assembly language is written according to sections, such as the .text section and .data section. Assembly code corresponds one-to-one with machine code, and the section information is retained when the assembly program is converted to binary code.

readelf -S test

Program Header Table#

When an ELF program is executed (loaded into memory), the loader creates a process memory image based on the program's segment table. Use the command readelf -l to read the segment information (segments) of the binary ELF file. The program test has a total of 13 segments; since the number of segments is smaller than the number of sections (31), multiple sections may map to the same segment.

Based on the permissions of the sections: readable and writable sections are mapped into one segment, read-only sections are mapped into another segment, and so on.

readelf -l test

Linking View/Execution View#

Segment and Section are two different perspectives on the same ELF file. This is referred to as different views in ELF.

From the perspective of Section, the ELF file is the Linking View.
From the perspective of Segment, it is the Execution View.
In these notes, "segment" means a Segment (program header entry) when discussing ELF loading; in other contexts the word is used loosely and refers to a Section.

libc#

glibc: GNU C Library, glibc itself is the C standard library under GNU, which gradually became the standard C library for Linux.

Its file name is libc.so (for example libc.so.6), and it is essentially an ELF file that can be executed independently. The dynamic link library encountered in pwn challenges is typically the libc.so file.

Almost all programs in Linux depend on libc, so the functions in libc are crucial.

Lazy Binding Mechanism#

Static Compilation vs Dynamic Compilation#

Dynamically compiled (dynamically linked) executables require the accompanying dynamic link libraries. During execution, they call functions from the corresponding dynamic link library. The advantages are that this reduces the size of the executable file, speeds up building, and saves system resources. The disadvantages are that even a very simple program that uses only one or two functions from the library still needs the relatively large library to be present; if the corresponding runtime library is not installed on another computer, the dynamically linked executable cannot run there.

Static compilation (static linking) means that the linker extracts the necessary parts from the corresponding static library (.a) while building the executable and links them into the executable file, allowing it to run without relying on dynamic link libraries. Thus, the advantages and disadvantages of static compilation are the mirror image of those of dynamic compilation.

Lazy Binding#

Using lazy binding is based on the premise that under dynamic linking, the modules loaded by the program contain a large number of function calls.

Lazy binding postpones the binding of a function's address until the first call to that function, so the dynamic linker does not have to process a large number of function reference relocations at load time.

The implementation of lazy binding uses two special data structures: the Global Offset Table (GOT) and the Procedure Linkage Table (PLT).

Global Offset Table (GOT)#

The address of the library function is only saved in the GOT table after it is called for the first time.

The global offset table exists as an independent section in the ELF file and contains two types, with the corresponding section names being .got and .got.plt. The .got section stores the addresses of all external variable references; the .got.plt section stores the addresses of all external function references, primarily using the .got.plt table for lazy binding. The basic structure of the .got.plt table is shown in the figure below:

Among them, the first three items of .got.plt store special address references:

GOT[0]: Stores the address of the .dynamic section, which the dynamic linker uses to extract dynamic linking-related information;
GOT[1]: Stores the ID of the current module (a pointer to its link_map structure);
GOT[2]: Stores the address pointing to the dynamic linker’s _dl_runtime_resolve function, which is used to resolve the actual symbol addresses of shared library functions.

Procedure Linkage Table (PLT)#

To implement lazy binding, when calling a function from an external module, the program does not directly jump through the GOT but instead jumps through a specific entry stored in the PLT table. For all external functions, there will be a corresponding entry in the PLT table, where each entry contains 16 bytes of code used to call a specific function. The general structure of the procedure linkage table is as follows:

The procedure linkage table contains not only the PLT entries created specifically for the external functions called by the compiler but also a special entry corresponding to PLT[0], which is used to jump to the dynamic linker for actual symbol resolution and relocation work:

PLT and GOT#

Regardless of how many times the external function is called, the program actually calls the PLT table, which is composed of a series of assembly instructions.
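The resolve-on-first-call behaviour can be modelled in a few lines. This is a toy model only: LazyBinder, its dictionaries, and the use of Python's len in place of strlen are stand-ins for the PLT stub, the .got.plt slots, and libc, not a description of the real machine code.

```python
class LazyBinder:
    """Toy model: a GOT slot is empty until the first call through the PLT patches it."""

    def __init__(self, library):
        self.library = library        # stands in for libc: name -> callable
        self.got = {}                 # stands in for .got.plt: name -> resolved callable
        self.resolve_count = 0

    def _resolve(self, name):
        # Stands in for _dl_runtime_resolve: look the symbol up and patch the GOT.
        self.resolve_count += 1
        self.got[name] = self.library[name]
        return self.got[name]

    def call_via_plt(self, name, *args):
        # The PLT stub jumps through the GOT slot; an unresolved slot leads to the resolver.
        target = self.got.get(name) or self._resolve(name)
        return target(*args)


binder = LazyBinder({"strlen": len})
first = binder.call_via_plt("strlen", b"pwn")       # triggers resolution
second = binder.call_via_plt("strlen", b"basics")   # uses the patched GOT slot
print(first, second, binder.resolve_count)
```

Because later calls trust whatever the GOT slot contains, overwriting that slot redirects every subsequent call, which is the basis of GOT hijacking.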

So, one might wonder: why is there a PLT, a transition, instead of going directly to the GOT?

It's like having many relatives; you need to visit them every week, so you write down their addresses in a notebook. When you want to visit, you check the notebook. This notebook is like a PLT table, where each address jumps to the corresponding GOT table address (your relatives' homes).

If one day you find it troublesome to run around, you invite all your relatives to live at your house, and now you only need to visit the corresponding room. The notebook becomes useless, and you throw it away. This is when you directly access the GOT table without the PLT table.

Do you think a notebook takes up less space, or a house full of relatives takes up less space?

This is one reason for the existence of the PLT table: to utilize memory more efficiently.

Another reason is to increase security.

LINUX Security Protection Mechanisms#

Detailed Explanation of GCC Security Compilation Options (NX(DEP), RELRO, PIE(ASLR), CANARY, FORTIFY)_gcc pie-CSDN Blog

CANARY#

Canary is a protective measure against stack overflow attacks. Its basic principle is to copy an 8-byte random value (the canary), whose lowest byte is \x00, from memory at fs:0x28 and place it on the stack immediately after the stack frame is created (right below the saved rbp). When an attacker attempts to overwrite the saved rbp or the return address above it through a buffer overflow, they will inevitably overwrite the canary first; before the function returns, it checks whether the canary still matches the original value. If not, __stack_chk_fail is called and the program aborts, thus preventing the buffer overflow attack.

Bypass Methods:

Modify the canary.
Leak the canary.

Canary Bypass#

Format String Bypass Canary
- Read the value of the canary through a format string.
Canary Brute Force (for programs with fork function)
- The fork function effectively self-replicates; each forked child has the same memory layout as the parent, so the canary values are also the same. We can brute force it byte by byte; if the child crashes, that byte is incorrect; if it runs normally, we can proceed to the next byte until we find the complete canary.
Stack Smashing (deliberately trigger canary_ssp leak)
Hijack __stack_chk_fail
- Modify the address of the __stack_chk_fail function in the GOT table. After the stack overflow, execute this function, but since its address has been modified, the program will jump to the address we want to execute.
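The brute-force approach for forking programs can be simulated offline by guessing one byte at a time. In this sketch the "server" is a Python closure that reports whether a guessed prefix matches, which stands in for observing whether a forked child crashes; make_oracle and brute_force_canary are illustrative names.

```python
import os


def make_oracle(canary: bytes):
    """Simulate a forking server: True means the child survived the overwrite."""
    def survives(guess_prefix: bytes) -> bool:
        return canary[:len(guess_prefix)] == guess_prefix
    return survives


def brute_force_canary(survives, length=8):
    known = b"\x00"                   # the lowest byte of the canary is always 0x00
    attempts = 0
    while len(known) < length:
        for candidate in range(256):
            attempts += 1
            if survives(known + bytes([candidate])):
                known += bytes([candidate])
                break
        else:
            raise RuntimeError("no candidate survived; the oracle is unreliable")
    return known, attempts


secret = b"\x00" + os.urandom(7)      # random canary for the simulation
recovered, attempts = brute_force_canary(make_oracle(secret))
print(recovered.hex(), attempts)
```

Guessing per byte needs at most 7 × 256 = 1792 attempts, compared with 2^56 for guessing the seven unknown bytes at once.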

NX#

Data on the stack has no execution permission (not executable). Once enabled, writable segments such as heap, stack, and bss segments in the program cannot be executed.

Bypass Method:

Use the mprotect function to make a segment executable again. NX also does not affect ROP or GOT table hijacking, since those techniques reuse code that is already executable.

PIE and ASLR#

What is ASLR?#

ASLR is a feature option of the Linux operating system that applies when programs (ELF) are loaded into memory for execution. It is a security protection technology against buffer overflow attacks that randomizes the loading address to prevent attackers from directly locating the attack code position, thus preventing overflow attacks.

Enable/Disable ASLR#

Check the current system's ASLR status:

sudo cat /proc/sys/kernel/randomize_va_space

ASLR has three security levels:

0: ASLR is off
1: Randomizes stack base address (stack), shared libraries (.so libraries), mmap base address
2: On top of 1, adds randomization of heap base address (chunk)

What is PIE?#

PIE is a feature option of the gcc compiler that applies during the compilation of programs (ELF). It is a protection technology against fixed addresses for code segments (.text), data segments (.data), and uninitialized global variable segments (.bss). If a program has PIE protection enabled, the loading address changes each time the program is loaded, so the gadget addresses reported by tools like ROPgadget are only offsets and cannot be used directly until the base address has been leaked.

Enable PIE#

Add the parameters -fPIE -pie when compiling with gcc (-fPIE generates position-independent code, -pie links it as a position-independent executable).

Once PIE is enabled and ASLR is on, the loading addresses of the code segment (.text), initialized data segment (.data), and uninitialized data segment (.bss) are randomized.

PIE Bypass#

A program is loaded in units of memory pages (4 KiB, i.e. 0x1000 bytes), so the last three hexadecimal digits of its base address are always 0. The last three hex digits of any address inside the binary therefore equal the last three hex digits of its offset in the file, which we know. Although we do not know the complete address, we can take an existing address on the stack and overwrite only its last two bytes (four hex digits): three of those digits are known, and the fourth must be guessed, which succeeds with probability 1/16.

Thus, the core idea of bypassing PIE is partial writing (partial address writing).
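The arithmetic behind a two-byte partial write can be sketched as follows. The base address and the offsets 0x1234 and 0x1189 are made-up example values, and the function names are illustrative.

```python
PAGE_MASK = 0xFFF


def partial_overwrite_candidates(target_offset: int):
    """All 16 two-byte little-endian values that might land on target_offset."""
    low12 = target_offset & PAGE_MASK          # known: unaffected by the random base
    return [((nibble << 12) | low12).to_bytes(2, "little") for nibble in range(16)]


def apply_partial_overwrite(original_addr: int, two_bytes: bytes) -> int:
    """Replace only the lowest two bytes of an address already on the stack."""
    return (original_addr & ~0xFFFF) | int.from_bytes(two_bytes, "little")


base = 0x55D3C4A2B000                 # hypothetical randomized base (page-aligned)
ret_on_stack = base + 0x1234          # hypothetical return address found on the stack
target = base + 0x1189                # hypothetical function we want to reach

candidates = partial_overwrite_candidates(0x1189)
hits = [c for c in candidates if apply_partial_overwrite(ret_on_stack, c) == target]
print(len(candidates), [h.hex() for h in hits])
```

Exactly one of the 16 candidates is correct for a given run, so in practice the exploit is retried until the guessed digit matches.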

RELRO#

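RELRO (RELocation Read-Only) is a technology used to enhance the protection of binary data sections by making relocation-related sections read-only once the dynamic linker has processed them; it is unrelated to stack address randomization. With Partial RELRO the .got.plt remains writable, while with Full RELRO all symbols are resolved at load time and the entire GOT is made read-only, which prevents GOT hijacking.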