Shared Library Support

This page contains information about the implementation of shared library support on MINIX3.

A Git repository is hosted on GitHub, but work on it has stalled. Jorn van Engelen is currently (January 2012) working in another repository, adding support to some servers for handling the PT_INTERP program segment of ELF binaries (see below).

Introduction

Shared libraries are linked to the executable at run time, in contrast to static libraries, which are linked at build time. MINIX3 currently supports only static linking. Adding support for shared libraries requires changes both in how programs are compiled and linked and in how the operating system loads an executable. Luckily, both Clang and GCC support this type of linking out of the box, mostly through the underlying binutils. Shared libraries, which are stored as shared objects (.so) in the file system, must also be written in particular ways, for example as position-independent code (PIC) that is re-entrant; the new C library from NetBSD meets those constraints. The packages from pkgsrc are known to be able to produce shared objects and executables. Most of the work will therefore be in MINIX3 itself, with little to no work needed on Clang and GCC.

Three things need to be solved to get shared library support working on MINIX3:

Porting the ld.so runtime linker

The runtime linker (or RTLD) is a piece of code that runs before anything else in an executable is called. In its simplest form, the RTLD loads the required shared libraries and resolves the symbols in the executable. It does this by replacing the placeholder pointers in the executable with the addresses of the corresponding symbols in the shared libraries.
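The pointer replacement described above can be illustrated with a toy model. This is only a sketch: the struct export and struct import_slot types and the resolve_imports function are hypothetical names made up for illustration, and a real RTLD walks ELF relocation and symbol tables rather than plain arrays of names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified structures (not part of MINIX or NetBSD). */
struct export {            /* a symbol provided by a shared library */
    const char *name;
    void *addr;
};

struct import_slot {       /* a placeholder pointer in the executable */
    const char *name;
    void *addr;            /* NULL until the symbol is resolved */
};

/* Fill every unresolved slot whose name matches an exported symbol.
 * Returns the number of slots that are still unresolved afterwards. */
size_t resolve_imports(struct import_slot *slots, size_t nslots,
                       const struct export *exports, size_t nexports)
{
    size_t unresolved = 0;

    assert(slots != NULL || nslots == 0);
    assert(exports != NULL || nexports == 0);

    for (size_t i = 0; i < nslots; i++) {
        if (slots[i].addr != NULL)
            continue;                       /* already bound */
        for (size_t j = 0; j < nexports; j++) {
            if (strcmp(slots[i].name, exports[j].name) == 0) {
                slots[i].addr = exports[j].addr;
                break;
            }
        }
        if (slots[i].addr == NULL)
            unresolved++;                   /* no library provides it */
    }
    return unresolved;
}
```

A non-zero return value corresponds to the situation where a real runtime linker would refuse to start the program because of an undefined symbol.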

The RTLD is also a shared object, usually located at /libexec/ld-elf.so, or /lib/ld-linux.so in Linux. The location does not matter much, as long as the various pieces, including the compile-time linker (for example GNU's ld) and the binary itself, know where it is located. The compile-time linker inserts this location into the PT_INTERP program segment of the ELF executable. This allows the kernel to load the correct RTLD for the executable.
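As a rough illustration of how a loader could find the RTLD path, the sketch below scans the program headers of an ELF32 image held in memory and returns the string stored in the PT_INTERP segment. The function name find_interp and the locally declared structs are assumptions made for this example (they mirror the layout in the ELF specification), and the code assumes the host byte order matches the file, as it would on i386. It is not the MINIX or NetBSD implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_INTERP_TYPE 3u   /* value of PT_INTERP in the ELF specification */

/* Minimal ELF32 layouts, declared locally so the sketch does not depend
 * on a system <elf.h>. Field order follows the ELF specification. */
struct elf32_ehdr {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct elf32_phdr {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/* Return a pointer to the interpreter path inside the image, or NULL if
 * the image is not ELF, is malformed, or has no PT_INTERP segment. */
const char *find_interp(const unsigned char *image, size_t size)
{
    struct elf32_ehdr eh;

    assert(image != NULL);
    if (size < sizeof(eh))
        return NULL;
    memcpy(&eh, image, sizeof(eh));
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0)
        return NULL;
    if (eh.e_phentsize < sizeof(struct elf32_phdr))
        return NULL;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        struct elf32_phdr ph;
        uint64_t off = (uint64_t)eh.e_phoff + (uint64_t)i * eh.e_phentsize;

        if (off > size || size - off < sizeof(ph))
            return NULL;                    /* header table out of bounds */
        memcpy(&ph, image + (size_t)off, sizeof(ph));
        if (ph.p_type != PT_INTERP_TYPE)
            continue;

        /* The path must lie inside the image and be NUL-terminated. */
        if (ph.p_filesz == 0 || ph.p_offset > size ||
            size - ph.p_offset < ph.p_filesz)
            return NULL;
        if (image[(size_t)ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)(image + ph.p_offset);
    }
    return NULL;                            /* statically linked */
}
```

A NULL result for a well-formed image means the executable is statically linked, which is the only case MINIX3 handles at the moment.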

Status

The RTLD can probably be ported from NetBSD's ld.elf_so. Some changes will be required, for example in how memory is allocated (the RTLD usually uses mmap(2), which does not exist in MINIX yet).

No work has been done yet on the RTLD.

OS support for the PT_INTERP program segment

When execve is called, MINIX3 loads the new .text and .data segments into the virtual address space of the process, replacing the old segments. Then it points the program counter to the entry point and marks the process as runnable again.

However, if the executable is a dynamic executable (i.e. it uses shared libraries), the RTLD needs to be loaded before the process can be made runnable. The PT_INTERP program segment of the ELF executable contains the location of the RTLD. The program counter is pointed to the RTLD's entry point instead of the executable's. When the process is run, the RTLD first resolves the symbols in the executable and then calls the executable's entry point. (The RTLD can also choose to resolve some symbols lazily at run time, which means that a pointer is replaced when, and only if, the symbol is first called. This is in contrast to resolving all symbols before running main.)
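The lazy resolution mentioned in the parenthesis can be modelled in a few lines. This is a toy sketch only: add_slot, add_stub, lookup_add, lib_add and resolve_count are hypothetical names, and a real RTLD patches entries in the executable's procedure linkage and global offset tables rather than a C function pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*add_fn)(int, int);

/* Stand-in for a function that lives in a shared library. */
int lib_add(int a, int b) { return a + b; }

/* Counts how often the toy "RTLD" had to resolve the symbol. */
int resolve_count = 0;

/* Stand-in for the symbol lookup a real RTLD would perform. */
add_fn lookup_add(void)
{
    resolve_count++;
    return lib_add;
}

int add_stub(int a, int b);     /* forward declaration for the slot */

/* The program always calls through this slot; it starts at the stub. */
add_fn add_slot = add_stub;

/* The first call lands here: resolve, patch the slot, forward the call. */
int add_stub(int a, int b)
{
    add_fn target = lookup_add();

    assert(target != NULL);     /* a real RTLD would abort on failure */
    add_slot = target;          /* later calls bypass the stub */
    return target(a, b);
}
```

Calling add_slot(2, 3) twice performs the lookup only once; a symbol that is never called is never looked up at all, which is the saving lazy binding aims for.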

Status

Support for the PT_INTERP program segment needs to be added to some servers (PM, VFS and VM). Although the NetBSD source (/usr/src/sys/kern/exec_elf32.c) provides a good starting point, a lot of work remains to be done.

Jorn van Engelen is currently (January 2012) working on this on GitHub.

Porting utilities from NetBSD

The ldconfig and ldd utilities need to be ported from NetBSD. ldconfig is used to add directories to the list of paths that the RTLD searches for shared libraries. ldd lists the shared libraries that an executable requires.

Status

The utilities can probably be ported from NetBSD without much difficulty.

No work has been done yet on these utilities.

Appendix: IPC for an execve call

The image below shows the communication diagram of a successful execve call. The new executable is assumed to be statically linked.

http://ftp.quzart.com/minix/wiki-images/execve.png

Appendix: MINIX3's virtual to physical address translation

MINIX3 used to rely on segmentation alone for virtual to physical address translation. Paging support arrived with version 3.1.4. MINIX is in the process of phasing out segmentation so that only paging will be used in the future. For now we are in the odd situation of having both segmentation and paging involved in address translation. Tanenbaum's "Operating Systems: Design and Implementation (3rd edition)" gives a good explanation of how this works in section 4.6.2. It is important to note that this 'combined' translation applies the segment translation first and then the page translation.

Some processors use separate instruction and data memories (in contrast to 'common I&D', where instructions and data share the same address space). Instruction and data memory can even be on different physical chips (instructions often in a ROM chip and data in a RAM chip). This means that the address argument of a jump instruction (jp) refers to an address in the instruction memory (and instructions are only executed from the instruction memory), while the address argument of a mov instruction refers to an address in the data memory. Important observation: if one were to read a word from the instruction memory at address 0x0, it does not have to be the same as the word at address 0x0 in the data memory!

Intel's x86 architecture uses only one memory, but with segmentation x86 is able to mimic separate instruction and data memories. Instruction and data memory can each be a segment in x86 (they can be the same segment or different segments). (Intel calls the instruction segment the code segment.) If the instruction segment is different from the data segment, instruction address 0x0 (e.g. jmp 0x0) can translate to a different physical address than data address 0x0 (e.g. mov %eax, 0x0). For example, if only segmentation is used, instruction address 0x0 could translate to physical address 0x1000 and data address 0x0 to physical address 0x2000. If paging is enabled in addition to segmentation, 0x1000 and 0x2000 are in turn translated by the paging mechanism to the final physical addresses.

MINIX3 uses segments to set read/write/execute permissions on sections of the virtual address space of a process. The code segment is readable and executable; the data segment is readable and writable. So that an instruction address and the same data address refer to the same memory, both the code and the data segment start at virtual address 0x0. The code segment, however, is shorter than the data segment: it is just long enough to cover the highest instruction address. Below is a schematic representation of the process's virtual address space and how it relates to the segments:

vaddr: 0              [code][heap]   [stack][mmap]
       [ code segment (r-x)]
       [                data segment (rw-)            ]
       [        rwx        ][           rw-           ]

A process which ran a test ELF executable got the following addresses. The only thing the executable does is print 'Hello, World!' and print the addition of two numbers.

                vaddr begin   vaddr end     segment base
text section    0x08048000    0x08070000    -
code segment    0x00000000    0x08070000    0x19000000
data section    0x08070000    0x08074000    -
data segment    0x00000000    0xE6FFF000    0x19000000

The segment bases of the code and data segments are the same, meaning that instruction address 0x0 and data address 0x0 map to the same piece of memory. Virtual address 0x1000 will thus translate to 0x19001000, regardless of whether it is used as an instruction or a data address. After this translation, paging is used to map 0x19001000 to the physical address.

What would happen if we were to access a virtual address that is out of bounds? For example: str = malloc(..); free(str); str[0]='\0'. Let's say malloc(..) returns the address 0x0000BEEF. The program would get the SIGSEGV signal on executing str[0]='\0' because it is trying to write to a bad address. VM will also print the following message:

VM: pagefault: SEGSEGV 146292 bad addr <lin:0x1900BEEF>; err 0x4 nopage read

The address might seem strange at first, but remember that the data segment base is 0x19000000. 0x0000BEEF is added to the data segment base, after which the resulting address 0x1900BEEF is handed to the paging stage of the translation. VM reports that the virtual page containing the address 0x1900BEEF has no page frame (the virtual address does not map to a physical address), so the program cannot write to that address, which causes the segfault.

MinixWiki: Shlib (last edited 2012-01-12 12:50:04 by JornVanEngelen)