PLT and GOT Workflow in Dynamic Linking
PLT and GOT Workflow in Dynamic Linking
About This Document
In article "Static Linking and Dynamic Linking", the advantages and disadvantages of the two linking modes are introduced. One of the disadvantages of dynamic linking is that the program execution is slightly slower than that in static linking.
The reason is that an executable program in dynamic linking needs to perform indirect jumps using a global offset table (GOT) to access variables and functions across modules. As a result, the program running slows down.
Another reason is that the dynamic linking process is completed when the program is running. That is, before the program execution, the dynamic linker searches for and loads dynamic shared objects (DSOs) required by the program, and then completes symbol relocation, which slows down the program startup.
In this case, a solution called lazy binding emerges. The core idea of lazy binding is that symbol relocation of all function calls across modules is not completed when the program is started. Address binding (symbol lookup and relocation) is performed only when the target program needs to invoke functions of other modules.
To achieve this objective, the ELF file adopts the procedure linkage table (PLT) structure, which contains some exquisite instruction sequences, which are described in the following sections.
General Logic
Before looking into the details of the PLT, we can think about how to complete the task from a top-down perspective. Assume that the target program needs to invoke the foo() function in liba.so. When the function is invoked for the first time, the dynamic linker needs a lookup function for searching for the address of the foo() function to complete the binding.
So what information does this lookup function need? Typically, the module where the binding behavior occurs (for example, in the main module of the target program) and the function to be bound (for example, the foo() function) are required. In glibc, the lookup function is _dl_runtime_resolve(). This process is described in pseudocode as follows:
void DSOFunction@plt()
{
if (DSOFunction@got[index] ! = RELOCATED) {
// If the function is invoked for the first time, the GOT does not contain the address of the function.
The lookup function obtains the address of the invoked function based on the module ID and the ID of the invoked function, and fill it in the corresponding entry in the GOT.
DSOFunction@got[index] = RELOCATED;
}
else{
// The function address already exists in the GOT. Jump to the function address directly.
jmp *DSOFunction@got[index];
}
}
This segment of pseudocode corresponds to an out-of-module PLT entry. After sorting out the pseudocode, we can obtain the contents of the PLT entry in assembly language as follows:
foo@plt
jmp *(foo@got)
push n
push moduleID
jmp _dl_runtime_resolve
The first instruction is to jump to the GOT entry of the foo() function. If the GOT entry has been bound, the correct function address can be directly jumped to. If this function is invoked for the first time, the content in the GOT entry is the address in the second instruction push n. In this step, the if judgment in the pseudocode is implemented.
The second instruction is to push the foo() function ID into the stack. The ID is the subscript of the foo() function in the relocation table. The third instruction is to push the module ID into the stack, and the fourth instruction is to jump to the lookup function _dl_runtime_resolve(). After performing a series of searches, _dl_runtime_resolve() fills the absolute address of the foo() function in the corresponding entry in the GOT, and then transfers the control flow to the foo() function.
After the foo() function address is successfully bound, invoking the PLT entry of the foo() function is achieved by jumping to the correct address through the GOT entry. The above is the general logic of GOT and PLT. The following discusses the specific workflow.
Workflow
The ELF file divides a GOT into two parts: .GOT and .GOT.PLT. .GOT is used to store global variables, and .GOT.PLT is used to store reference addresses of functions in DSOs. Note that the PLT is located in the code segment of an executable program and is readable but not writable, and the GOT is located in the data segment of an executable program and is readable and writable. In addition, the first three entries of .GOT.PLT have specific meanings:
- The first entry stores the address of the .dynamic segment, which describes the information about dynamic links of the current module.
- The second entry stores the ID of the current module.
- The third entry stores the address of the _dl_runtime_resolve function.
These are the beauty of the linker. It extracts the common content of each PLT entry to form a reusable public instruction sequence.
Therefore, the assembly code structure of the PLT is as follows:
plt[0]
push *got[1]
push *got[2]
···
foo@plt
jmp *(foo@got)
push n
jmp plt[0]
After understanding the general logic, see the following example. It is the PLT and GOT of a simple helloworld executable file.
Disassembly of section .plt:
00000000004003f0 <.plt>:
4003f0: ff 35 12 0c 20 00 pushq 0x200c12(%rip) # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
4003f6: ff 25 14 0c 20 00 jmpq *0x200c14(%rip) # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
4003fc: 0f 1f 40 00 nopl 0x0(%rax)
0000000000400400 <puts@plt>:
400400: ff 25 12 0c 20 00 jmpq *0x200c12(%rip) # 601018 <puts@GLIBC_2.2.5>
400406: 68 00 00 00 00 pushq $0x0
40040b: e9 e0 ff ff ff jmpq 4003f0 <.plt>
0000000000400410 <__libc_start_main@plt>:
400410: ff 25 0a 0c 20 00 jmpq *0x200c0a(%rip) # 601020 <__libc_start_main@GLIBC_2.2.5>
400416: 68 01 00 00 00 pushq $0x1
40041b: e9 d0 ff ff ff jmpq 4003f0 <.plt>
...
Disassembly of section .got:
0000000000600ff8 <.got>:
...
Disassembly of section .got.plt:
0000000000601000 <_GLOBAL_OFFSET_TABLE_>:
...
By referring to the preceding code after disassembly, we can see that the PLT has three entries:
- .plt[0]: Pushes the second entry in the GOT (that is, .GOT.PLT[1]) into the stack and then jumps to .GOT.PLT[2]. The last nop instruction does nothing.
- .plt[1]: Jumps to .GOT.PLT[3], pushes 0x0 into the stack, and then jumps back to .plt[0].
- .plt[2]: Jumps to .GOT.PLT[4], pushes 0x1 into the stack, and then jumps back to .plt[0].
As shown in the code, the logic is the same as that we have analyzed above. After the related functions are executed, the addresses of the invoked functions are filled in the GOT, and the program control flow is transferred to the invoked functions to run the program.
Summary
In short, the PLT and GOT workflow is as follows:
- During compilation, compilation relocation is performed to reference a required function outside the module to the corresponding PLT entry, and the address in the second instruction in the PLT entry is used to initialize the corresponding GOT entry.
- When the program is started, the dynamic linker obtains the permission from the kernel and updates the module ID and address of the _dl_runtime_resolve function in the GOT.
- When a function outside the module is invoked, the program jumps to the corresponding PLT entry, and then jumps to the corresponding GOT entry.
- If the GOT entry is a function address, the function is directly invoked, and the program control flow is transferred to the invoked function.
- If the GOT entry is invoked for the first time, the program jumps to the second instruction of the PLT entry. Then, the function ID and module ID are pushed into the stack and input to _dl_runtime_resolve as parameters. After _dl_runtime_resolve is executed, the address of the invoked function is filled in the corresponding GOT entry, the program control flow is then transferred to the invoked function.
Dynamic linking improves the startup speed of programs through lazy binding and the PLT structure, accelerating executable program running. The above is the general logic and workflow of PLT and GOT. I hope this article is helpful.