developersguide:vminternals

developersguide:vminternals [2014/11/11 14:52]
127.0.0.1 external edit
developersguide:vminternals [2014/11/28 13:24] (current)
lionelsambuc
 +
 +====== VM internals ======
 +
 +===== General =====
 +
 +In order to encapsulate VM functionality, PM is split into process
 +management and memory management. The memory management task is called
 +VM and implements the memory part of the fork, exec, exit, etc. calls;
 +PM calls it synchronously when those calls are made. This has
 +made PM architecture-independent.
 +
 +A typical interaction between userland, PM and VM looks like this, for the
 +''fork()'' call:
 +
 +{{ fork.png }}
 +
 +===== VM server =====
 +
 +VM manages memory (keeping track of used and unused memory, assigning
 +memory to processes, freeing it, and so on).
 +
 +There is a clear split between architecture-dependent and architecture-independent
 +code in VM. For i386 and ARM, there are Intel page tables that reside in
 +VM's address space. VM maps these into its own address space, by editing
 +its own page table, so it can edit them directly.
 +
 +==== Data structures ====
 +
 +The most important data structures in VM describe the memory a process
 +is using in detail. Page tables are written purely from these data
 +structures. They are owned and manipulated by functions in region.c. They
 +are:
 +
 +=== Regions ===
 +
 +A region, described by a 'struct vir_region' or region_t, is a contiguous
 +range of virtual address space that has a particular type and some
 +parameters. Its type determines its properties and behaviour (see
 +'memory types' later). It needn't have any real memory instantiated in it
 +(yet). At any time, a region has a fixed size in virtual address space,
 +though some types can be resized on request (typically by the brk() call).
 +Virtual regions have a starting address and a length, both page-aligned.
 +
 +Virtual regions have a fixed-size array of pointers to physical regions
 +in them. Every entry represents a page-sized memory block. If an entry is non-NULL, that
 +block is instantiated and the entry points to a phys_region describing the physical
 +block of memory.
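The per-page pointer array can be pictured with a minimal C sketch. All names here (`vir_region_sketch`, `phys_region_stub`, `region_lookup`) and the 4096-byte page size are assumptions made for illustration; the real definitions in VM's sources may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical stand-in for a physical region descriptor. */
struct phys_region_stub { int unused; };

/* Hypothetical, simplified view of a virtual region. */
struct vir_region_sketch {
    uintptr_t vaddr;                  /* page-aligned start address */
    size_t length;                    /* page-aligned length in bytes */
    struct phys_region_stub **pages;  /* one slot per page; NULL = not instantiated */
};

/* Return the slot covering 'addr', or NULL if 'addr' lies outside the
 * region or the page has no physical memory behind it yet. */
struct phys_region_stub *
region_lookup(const struct vir_region_sketch *vr, uintptr_t addr)
{
    assert(vr->vaddr % PAGE_SIZE == 0 && vr->length % PAGE_SIZE == 0);
    if (addr < vr->vaddr || addr - vr->vaddr >= vr->length)
        return NULL;
    return vr->pages[(addr - vr->vaddr) / PAGE_SIZE];
}
```

A caller would treat a NULL result inside the region as "mapped logically, but not yet physically".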
 +
 +=== Physical regions ===
 +
 +Physical regions, described by a 'struct phys_region', exist to reference
 +physical blocks. A physical block describes a physical page of memory. An
 +extra level of indirection is needed because it is necessary to reference
 +the same page of physical memory more than once, and to keep a reference
 +count for it to know (efficiently) whether a page is referenced zero times, once, or
 +more than once. 'Blocks' here can be used interchangeably with pages.
 +
 +=== Physical blocks ===
 +
 +A physical block, described by a 'struct phys_block', describes a single
 +page of physical memory. It holds the page's address and a reference count.
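The indirection between physical regions and physical blocks can be illustrated with a small reference-counting sketch. The struct layouts and the helper names `sketch_ref` and `sketch_unref` are hypothetical; they only mirror the relationship described above and are not VM's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified physical block: one page plus a refcount. */
struct phys_block_sketch {
    unsigned long phys;   /* physical address of the page */
    unsigned refcount;    /* number of physical regions referencing it */
};

/* Hypothetical physical region: one reference to a physical block. */
struct phys_region_sketch {
    struct phys_block_sketch *pb;
};

/* Create a new reference to an existing block. Returns NULL on allocation failure. */
struct phys_region_sketch *sketch_ref(struct phys_block_sketch *pb)
{
    struct phys_region_sketch *pr = malloc(sizeof(*pr));
    if (pr == NULL)
        return NULL;
    pr->pb = pb;
    pb->refcount++;
    return pr;
}

/* Drop a reference. Returns 1 if the block is now unreferenced
 * (so its page could be freed), 0 otherwise. */
int sketch_unref(struct phys_region_sketch *pr)
{
    struct phys_block_sketch *pb = pr->pb;
    assert(pb->refcount > 0);
    free(pr);
    return --pb->refcount == 0;
}
```

With this shape, checking whether a page is shared reduces to comparing `refcount` with 1.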
 +
 +=== Memory types ===
 +
 +Each memory type is described by the data structure struct mem_type in
 +memtype.h. Memory types are instantiated in mem_*.c source files and declared
 +in glo.h (mem_type_*). This neatly abstracts the different behaviour of
 +different memory types when it comes to forking, pagefaulting, and so on,
 +making the higher-level data structures and the code that manipulates them quite
 +generic.
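One common way to get this kind of abstraction in C is a table of function pointers per type. The sketch below assumes that shape; the member names, the two example types and the helper functions are invented for illustration, and the real struct mem_type in memtype.h may have different members.

```c
#include <assert.h>
#include <stddef.h>

struct region_sketch;   /* forward declaration */

/* Hypothetical operation table, one instance per memory type. */
struct mem_type_sketch {
    const char *name;
    int (*pagefault)(struct region_sketch *r, unsigned page, int write);
    int (*resize)(struct region_sketch *r, unsigned new_pages); /* NULL: not resizable */
};

struct region_sketch {
    const struct mem_type_sketch *type;
    unsigned pages;    /* current size in pages */
    unsigned faults;   /* number of faults handled, for demonstration */
};

static int sketch_fault(struct region_sketch *r, unsigned page, int write)
{
    (void)write;
    if (page >= r->pages)
        return -1;     /* fault outside the region */
    r->faults++;
    return 0;
}

static int sketch_anon_resize(struct region_sketch *r, unsigned new_pages)
{
    r->pages = new_pages;
    return 0;
}

/* Two made-up types: one resizable, one not. */
const struct mem_type_sketch sketch_type_anon  = { "anonymous", sketch_fault, sketch_anon_resize };
const struct mem_type_sketch sketch_type_fixed = { "fixed", sketch_fault, NULL };

/* Generic code only goes through the type's table. */
int region_resize(struct region_sketch *r, unsigned new_pages)
{
    assert(r->type != NULL);
    if (r->type->resize == NULL)
        return -1;
    return r->type->resize(r, new_pages);
}

int region_pagefault(struct region_sketch *r, unsigned page, int write)
{
    assert(r->type != NULL && r->type->pagefault != NULL);
    return r->type->pagefault(r, page, write);
}
```

The generic functions never test which type they are dealing with; adding a type means adding one more table.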
 +
 +=== Cache ===
 +
 +The in-VM disk block cache data structures and the code to manipulate them
 +are contained in cache.c. Each cache block is page-sized and is uniquely
 +identified by a (device, device offset) pair. It furthermore has (inode,
 +inode offset) as extra information, but VM does not guarantee this pair to be
 +unique, nor is it guaranteed to be present. The inode number might
 +be VMC_NO_INODE, meaning that the disk block isn't part of inode data
 +or its inode number isn't known (e.g. because it ended up in the cache
 +through a block device and not through a file).
 +
 +The block contents are a 'physical block' pointer, and being in the cache
 +counts as a reference in its refcount.
 +
 +Blocks are indexed by two hash tables: the first by the (device, device offset)
 +pair, and the second by the (inode, inode offset) pair. A block is only present
 +in the second hash table if it is in an inode at all (inode != VMC_NO_INODE).
 +
 +Furthermore cache blocks are on an LRU chain to be used for eviction in
 +out-of-memory conditions.
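The double indexing can be sketched as two chained hash tables over the same block structures. Everything below is an assumption made for illustration: the names, the hash function, the table size, and `SKETCH_NO_INODE` as a stand-in for VMC_NO_INODE (whose real value is not given here). The LRU chain is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define HASH_SIZE 64u
#define SKETCH_NO_INODE 0u   /* stand-in for VMC_NO_INODE (value assumed) */

struct cache_block_sketch {
    uint32_t dev;  uint64_t dev_off;      /* unique key */
    uint32_t ino;  uint64_t ino_off;      /* optional extra key */
    struct cache_block_sketch *next_dev;  /* chain in the device table */
    struct cache_block_sketch *next_ino;  /* chain in the inode table */
};

static struct cache_block_sketch *by_dev[HASH_SIZE];
static struct cache_block_sketch *by_ino[HASH_SIZE];

static unsigned hash_pair(uint32_t id, uint64_t off)
{
    return (unsigned)((id * 31u + off / PAGE_SIZE) % HASH_SIZE);
}

/* Insert into the device table always, and into the inode table
 * only when an inode number is known. */
void cache_add(struct cache_block_sketch *b)
{
    unsigned h = hash_pair(b->dev, b->dev_off);
    assert(b->dev_off % PAGE_SIZE == 0);
    b->next_dev = by_dev[h];
    by_dev[h] = b;
    b->next_ino = NULL;
    if (b->ino != SKETCH_NO_INODE) {
        h = hash_pair(b->ino, b->ino_off);
        b->next_ino = by_ino[h];
        by_ino[h] = b;
    }
}

struct cache_block_sketch *cache_find_dev(uint32_t dev, uint64_t dev_off)
{
    struct cache_block_sketch *b = by_dev[hash_pair(dev, dev_off)];
    while (b != NULL && !(b->dev == dev && b->dev_off == dev_off))
        b = b->next_dev;
    return b;
}

/* Returns the first match; the (inode, inode offset) pair need not be unique. */
struct cache_block_sketch *cache_find_ino(uint32_t ino, uint64_t ino_off)
{
    struct cache_block_sketch *b;
    if (ino == SKETCH_NO_INODE)
        return NULL;
    b = by_ino[hash_pair(ino, ino_off)];
    while (b != NULL && !(b->ino == ino && b->ino_off == ino_off))
        b = b->next_ino;
    return b;
}
```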
 +
 +==== Typical call structure ====
 +
 +Calls into VM are received from 3 main sources: userland, PM and the kernel.
 +In all cases, a typical flow of control is
 +
 +      * Receive message in main.c
 +      * Do call-specific work in call-specific file, e.g. mmap.c, cache.c
 +      * This manipulates high-level data structures by invoking functions in region.c
 +      * This updates the process pagetable by invoking functions in pagetable.c
 +
 +An example is mmap, when just used to allocate memory:
 +
 +{{ mmap.png }}
 +
 +A more complicated example is where mmap is used to map in a file. VM must know
 +the corresponding device and inode number, so it looks up the FD by calling
 +VFS asynchronously:
 +
 +{{ mmap-file.png }}
 + 
 +==== Handling absent memory: pagefaults, memory handling: calls from the kernel ====
 +
 +There are two major cases in which memory is needed that can't be used
 +directly:
 +
 +      * memory in a range that is mapped logically, but not physically (currently that is on-demand anonymous memory)
 +      * memory that is mapped physically, but readonly as it's mapped in more than once (shared between processes that have forked), and so can't be written to directly.
 +
 +VM makes sure the page is mapped readonly in the second case. There is
 +no page table entry in the first case.
 +
 +There are two major situations in which either of these cases can arise:
 +
 +  * a process uses the memory itself (page fault)
 +  * the kernel wants to use that memory
 +
 +In both cases the 'call' is generated by the kernel and arrives in VM
 +through a 'kernel signal' in the form of a message.
 +
 +The kernel must check for these cases whenever it wants to touch memory,
 +e.g. in IPC but also when copying memory to/from processes in kernel
 +context. If the kernel detects such a case, it stores the event, notifies VM,
 +doesn't reply to the requester yet, and continues its event loop. VM then
 +handles the situation (specifically, mapping in a copy of the page, or an
 +entirely new page, as the case may be) and sends a message to the kernel.
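The two outcomes mentioned above (an entirely new page, or a private copy of a shared page) can be sketched in user-space C. This is only a model under stated assumptions: `page_sketch` and `handle_write_fault` are invented names, heap buffers stand in for physical pages, and no page table is touched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical model of a physical page with a reference count. */
struct page_sketch {
    unsigned char *data;  /* stands in for the physical page contents */
    unsigned refcount;
};

/* Resolve a write fault on the page slot '*slot'.
 * - empty slot: install an entirely new, zeroed page
 * - shared page (refcount > 1): install a private copy
 * - sole owner: nothing to allocate, the page can just be made writable
 * Returns 0 on success, -1 if out of memory. */
int handle_write_fault(struct page_sketch **slot)
{
    struct page_sketch *old = *slot;
    struct page_sketch *fresh;

    if (old != NULL && old->refcount == 1)
        return 0;

    fresh = malloc(sizeof(*fresh));
    if (fresh == NULL)
        return -1;
    fresh->data = calloc(1, PAGE_SIZE);
    if (fresh->data == NULL) {
        free(fresh);
        return -1;
    }
    fresh->refcount = 1;

    if (old != NULL) {
        assert(old->refcount > 1);
        memcpy(fresh->data, old->data, PAGE_SIZE);  /* copy-on-write */
        old->refcount--;                            /* this slot no longer shares it */
    }
    *slot = fresh;
    return 0;
}
```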
 +
 +Pagefaults are memory-type specific. A pagefault in anonymous memory
 +might look like this:
 +
 +{{ pagefault.png }}
 +
 +If a pagefault is in a file-mapped region, the cache is queried for the
 +presence of the right block. If it isn't there, a request to VFS will
 +have to happen asynchronously for the block to appear in the cache.
 +Once VFS indicates the request is complete, the pagefault code is simply
 +re-invoked the same way.
 +
 +{{ pagefault-file.png }}
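The "query the cache, ask VFS, then re-invoke the same fault code" pattern can be modelled as a tiny state machine. The arrays and function names below are purely hypothetical and simulate the asynchronous VFS request with a flag; no real message passing is involved.

```c
#include <assert.h>

#define NBLOCKS 8  /* arbitrary size for this sketch */

enum fault_status { FAULT_RESOLVED, FAULT_WAITING };

static int in_cache[NBLOCKS];   /* 1 if the block is present in the cache */
static int requested[NBLOCKS];  /* 1 if a (simulated) VFS request is outstanding */

/* Fault path for a file-mapped page backed by cache block 'block'. */
enum fault_status file_fault(unsigned block)
{
    assert(block < NBLOCKS);
    if (in_cache[block])
        return FAULT_RESOLVED;  /* block is cached and can be mapped in */
    requested[block] = 1;       /* stands in for the asynchronous request to VFS */
    return FAULT_WAITING;       /* the faulting process stays blocked meanwhile */
}

/* Called when the simulated VFS request completes: the block is now
 * cached, and the very same fault path is simply run again. */
enum fault_status vfs_done(unsigned block)
{
    assert(block < NBLOCKS && requested[block]);
    requested[block] = 0;
    in_cache[block] = 1;
    return file_fault(block);
}
```

Re-running the unchanged fault path after completion keeps the miss case from needing separate mapping logic.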
 +
 +===== Physical / contiguous memory =====
 +
 +Many areas in the system, inside and outside the kernel, assume memory
 +that is contiguous in the virtual address space is also contiguous in
 +physical memory, but this assumption is no longer true. Therefore all
 +instances of umap calls in the kernel had to be checked to see
 +
 +    * whether an extra lookup had to be done to get the real physical address
 +    * whether that code assumes the memory is physically contiguous, or even that the memory is present
 +
 +Processes that need physically contiguous memory have to
 +ask for it specifically. A warning in the kernel is printed if an old umap function
 +is called. A new umap segment (VM_D as opposed to D) was added that
 +does a physically-contiguous check but doesn't print a warning (the
 +VM_D is meant to indicate that the caller is aware that memory isn't
 +automatically contiguous physically, and that if it wants it to be,
 +it has made arrangements for that itself, e.g. by using alloc_contig()).
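What a physically-contiguous check amounts to can be shown with a short sketch. It assumes the caller already has the physical address behind each virtual page of a buffer; the function name and the array representation are made up for illustration and are not the kernel's umap code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* phys_of_page[i] is the physical address backing the i-th virtual page
 * of a virtually contiguous buffer. Returns 1 if those pages are also
 * adjacent in physical memory, 0 otherwise. */
int is_phys_contiguous(const uint64_t *phys_of_page, size_t npages)
{
    size_t i;
    for (i = 1; i < npages; i++) {
        assert(phys_of_page[i] % PAGE_SIZE == 0);
        if (phys_of_page[i] != phys_of_page[i - 1] + PAGE_SIZE)
            return 0;
    }
    return 1;
}
```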
 +
 +===== Drivers =====
 +
 +Drivers have been updated to
 +
 +    * Request contiguous memory if necessary (DMA)
 +    * Request it below 16MB physical memory (DMA; lance and floppy) or below 1MB physical memory (BIOS driver)
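The address limits in the list above boil down to a range check on the physical buffer. The helper below is a hypothetical illustration of that check, not a function taken from the drivers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMIT_16MB (16u * 1024u * 1024u)  /* DMA limit mentioned for lance and floppy */
#define LIMIT_1MB  (1u * 1024u * 1024u)   /* limit mentioned for the BIOS driver */

/* Returns 1 if the physical range [phys, phys + len) lies entirely
 * below 'limit', 0 otherwise. Written to avoid overflow in phys + len. */
int range_below(uint64_t phys, size_t len, uint64_t limit)
{
    assert(len > 0);
    return len <= limit && phys <= limit - len;
}
```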
  
developersguide/vminternals.txt · Last modified: 2014/11/28 13:24 by lionelsambuc