
8. Linkers and Dynamic Linking

by 사향낭 2022. 11. 14.
  • When a process is running, what does its memory look like? A collection of regions called sections (or segments). Basic memory layout for Linux and other Unix systems:
    • Code (or "text" in Unix terminology): starts at location 0
    • Data: starts immediately above code, grows upward
    • Stack: starts at highest address, grows downward
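
As a rough illustration of the layout above, the following sketch works out where each region sits in a toy address space. The 64 KiB address space and all the sizes are made-up numbers, and `layout` is a hypothetical helper, not anything an OS provides.

```python
# Toy model of the layout described above: code at 0, data right above
# code (growing up), stack at the top of the address space (growing down).
# ADDRESS_SPACE_TOP and all sizes are assumptions chosen for illustration.

ADDRESS_SPACE_TOP = 0x10000   # hypothetical 64 KiB address space


def layout(code_size, data_size, stack_size):
    """Return {region: (start, end)} with end exclusive."""
    code = (0, code_size)                           # code starts at location 0
    data = (code[1], code[1] + data_size)           # immediately above code
    stack = (ADDRESS_SPACE_TOP - stack_size, ADDRESS_SPACE_TOP)
    if data[1] > stack[0]:
        raise MemoryError("data and stack collided")
    return {"code": code, "data": data, "stack": stack}


regions = layout(code_size=0x1000, data_size=0x0800, stack_size=0x0400)
for name, (start, end) in regions.items():
    print(f"{name:5s} 0x{start:04x}-0x{end - 1:04x}")

# Growing the data region moves its end upward; growing the stack moves
# its start downward. The free gap between them shrinks from both sides.
gap = regions["stack"][0] - regions["data"][1]
print(f"free gap: 0x{gap:x} bytes")
```

Because data grows up and the stack grows down, the two regions share one free gap and neither needs a fixed maximum size in advance.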

  • System components that take part in managing a process's memory:
    • Compiler and assembler:
      • Generate one object file for each source code file containing information for that source file.
      • Information is incomplete, since each source file generally references some things defined in other source files.
    • Linker:
      • Combines all of the object files for one program into a single executable file.
      • Linker output is complete and self-sufficient.
    • Operating system:
      • Loads executable files into memory.
      • Allows several different processes to share memory at once.
      • Provides facilities for processes to get more memory after they've started running.
    • Run-time library:
      • Works together with OS to provide dynamic allocation routines, such as malloc and free in C.
  • Linkers (or Linkage Editors, ld in Unix, LINK on Windows): combine many separate pieces of a program and reorganize storage allocation. Typically invoked invisibly by compilers.
  • Three functions of a linker:
    • Combine all the pieces of a program.
    • Figure out a new memory organization so that all the pieces fit together (combine like sections).
    • Touch up addresses so that the program can run under the new memory organization.
  • Result: a runnable program stored in a new object file called an executable.
  • Problems linker must solve:
    • Assembler doesn't know where the things it's assembling will eventually go in memory
      • Assume that each section starts at address zero, let linker re-arrange.
    • Assembler doesn't know addresses of external objects when assembling files separately. E.g. where is the printf routine?
    • Assembler just puts zero in the object file for each unresolved address
  • Each object file consists of:
    • Sections, each holding a distinct kind of information.
      • Typical sections: code ("text") and data.
      • For each section, object file contains size and assumed starting address of the section, plus initial contents, if any
    • Symbol table: name and current location of each procedure or variable (except stack variables)
    • Relocation records: information about addresses referenced in this object file that the linker must adjust once it knows the final memory allocation.
    • Additional information for debugging (e.g. map from line numbers in the source file to location in the code section.)
  • Linker executes in three passes:
    • Pass 1: read in section sizes, compute final memory layout.
    • Pass 2: read in all symbols, create complete symbol table in memory.
    • Pass 3: read in section and relocation information, update addresses, write out new file.
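
The object-file structure and the three passes can be sketched with a toy linker. This is a minimal model under stated assumptions, not how ld is implemented: `ObjectFile`, `link`, the word-addressed "memory", and the string opcodes are all hypothetical, and everything is kept in memory instead of being read from files.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical in-memory stand-in for an object file."""
    name: str
    sections: dict                  # section name -> list of words
    symbols: dict                   # symbol -> (section, offset in section)
    relocs: list = field(default_factory=list)   # (section, offset, symbol)


def link(objs, section_order=("text", "data")):
    # Pass 1: read section sizes and compute the final layout.
    # Like sections are combined: all text first, then all data.
    base = {}                       # (object name, section) -> final address
    addr = 0
    for sec in section_order:
        for obj in objs:
            base[(obj.name, sec)] = addr
            addr += len(obj.sections.get(sec, []))

    # Pass 2: read all symbols and build one table of final addresses.
    symtab = {}
    for obj in objs:
        for sym, (sec, off) in obj.symbols.items():
            if sym in symtab:
                raise ValueError(f"duplicate symbol: {sym}")
            symtab[sym] = base[(obj.name, sec)] + off

    # Pass 3: copy section contents, patch each relocation, emit the image.
    image = [0] * addr
    for obj in objs:
        for sec, words in obj.sections.items():
            start = base[(obj.name, sec)]
            image[start:start + len(words)] = words
        for sec, off, sym in obj.relocs:
            if sym not in symtab:
                raise ValueError(f"undefined symbol: {sym}")
            image[base[(obj.name, sec)] + off] = symtab[sym]
    return image, symtab


# Each section assumes it starts at address zero, and the "assembler"
# left a 0 wherever it did not know an address.
main_o = ObjectFile(
    name="main.o",
    sections={"text": ["call", 0, "load", 0, "ret"], "data": [42]},
    symbols={"main": ("text", 0), "counter": ("data", 0)},
    relocs=[("text", 1, "printf"), ("text", 3, "counter")],
)
libc_o = ObjectFile(
    name="libc.o",
    sections={"text": ["push", "ret"], "data": []},
    symbols={"printf": ("text", 0)},
)

image, symtab = link([main_o, libc_o])
print(symtab)   # {'main': 0, 'counter': 7, 'printf': 5}
print(image)    # ['call', 5, 'load', 7, 'ret', 'push', 'ret', 42]
```

The two zeros in the text section of main.o are replaced by 5 (where printf ended up) and 7 (where counter ended up), which is the "touch up addresses" step in miniature.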

 

Dynamic Linking

  • Originally all programs were linked statically, as described above:
    • Each program complete
    • All references resolved
  • Since the late 1980s, most systems have supported shared libraries and dynamic linking:
    • For common library packages, only keep a single copy in memory, shared by all processes.
    • Don't know where library is loaded until runtime; must resolve references dynamically, when program runs.
  • One way of implementing dynamic linking: jump table.
    • If any of the files being linked are shared libraries, the linker doesn't actually include the shared library code in the final program. Instead, it includes three things that implement dynamic linking:
      • Jump table: an array in which each entry is a single machine instruction containing an unconditional branch (jump).
        • For each function in a shared library used by the program, there is one entry in the jump table that will jump to the beginning of that function.
      • Shared library metadata: for each shared library used by the program, the names of the functions needed from that library, and corresponding locations in the jump table.
      • Dynamic loader: small library package invoked at startup to fill in the jump table.
    • For relocation records referring to functions in the shared library, the linker substitutes the address of the jump table entry: when the function is invoked, the caller will "call" the jump table entry, which redirects the call to the real function.
    • Initially, all jump table entries jump to zero (unresolved).
    • When the program starts up, the dynamic loader is invoked:
      • It invokes the OS mmap functions to load each shared library into memory.
      • It reads symbol tables from libraries
      • It fills in the jump table with the correct address for each function in a shared library (info is in symbol table).
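
The jump-table scheme above can be simulated in a few lines. This is only a sketch under stated assumptions: `memory`, `call`, `JumpTable`, `toy_mmap`, `dynamic_loader`, and the library name `libtoy` are all hypothetical, Python callables stand in for machine code, and no real mmap call is made.

```python
memory = {}                        # address -> callable (stands in for code)


def call(addr, *args):
    """Stand-in for transferring control to the code at addr."""
    if addr not in memory:
        raise RuntimeError(f"jump to unmapped address {addr:#x}")
    return memory[addr](*args)


class JumpTable:
    def __init__(self, size):
        self.targets = [0] * size  # initially every entry jumps to zero

    def call_entry(self, slot, *args):
        # The caller "calls" the entry; the entry jumps to the real function.
        return call(self.targets[slot], *args)


def toy_mmap(library, base):
    """Pretend to map a library at base; return its symbol table."""
    symtab = {}
    for name, (offset, code) in library.items():
        memory[base + offset] = code
        symtab[name] = base + offset
    return symtab


def dynamic_loader(jump_table, metadata, libraries, load_base):
    # metadata: library name -> {function name: jump table slot}
    for lib_name, needed in metadata.items():
        symtab = toy_mmap(libraries[lib_name], load_base[lib_name])
        for func, slot in needed.items():
            jump_table.targets[slot] = symtab[func]


# A shared library: function name -> (offset inside the library, code).
libraries = {"libtoy": {"square": (0x10, lambda x: x * x),
                        "cube": (0x40, lambda x: x ** 3)}}
# What the static linker left in the executable.
metadata = {"libtoy": {"square": 0, "cube": 1}}
jump_table = JumpTable(size=2)

try:
    jump_table.call_entry(0, 7)    # entry still jumps to zero
    resolved_before_load = True
except RuntimeError:
    resolved_before_load = False

# The load address is only chosen at run time.
dynamic_loader(jump_table, metadata, libraries, load_base={"libtoy": 0x7000})
print(jump_table.call_entry(0, 7))   # 49
print(jump_table.call_entry(1, 2))   # 8
```

The calling code never changes: it always calls slot 0 or slot 1. Only the table entries depend on where the library was loaded, so passing a different `load_base` would work without touching the caller.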

 

Summary

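  • On Linux and other Unix systems, a process in memory consists of code, data (directly above the code, growing upward), and a stack (growing downward). These regions are called sections.
  • The compiler and assembler generate an object file for each source file, but the information in it is incomplete.
    (A source file may reference things defined in other source files.)
  • The linker combines all of the object files into a single executable program, which is complete in itself.
  • The OS loads executable programs into memory, lets several processes share memory at once, and provides facilities for processes to get more memory after they have started running.
  • The run-time library works together with the OS to provide dynamic allocation routines such as malloc and free.
  • When (for example) printf is used, the assembler does not know where that routine lives, so the linker has to resolve the reference.
  • Each object file contains sections (code, data), a symbol table (the name and current location of each procedure and variable), relocation records (addresses referenced in the object file that the linker must adjust once it knows the final memory allocation), and additional debugging information (such as a map from source line numbers to locations in the code section).
  • In short, the linker reads the object files and computes the memory layout, reads all symbols to build a complete symbol table, then reads section and relocation information, updates addresses, and writes the result out as a new file (the executable).
  • Shared libraries and dynamic linking have been supported since the late 1980s. With a shared library, processes share a single copy that is already in memory instead of each loading its own copy.
  • Because the location where a library is loaded is not known until runtime, references to it must be resolved dynamically.
  • One way to do this is a dynamic loader: a small library package invoked at startup to fill in the jump table.
  • When the dynamic loader runs, it first calls the OS mmap functions to load the shared libraries into memory. It then reads the symbol tables from the libraries and fills in the jump table with the correct address for each function in a shared library.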

 

This is hard ;;
