Differences

This shows you the differences between two versions of the page.

--- developersguide:vminternals [2014/11/11 14:52]
127.0.0.1 external edit
+++ developersguide:vminternals [2014/11/28 13:24] (current)
lionelsambuc
@@ Line 1: / Line 1: @@
+====== VM internals ======
+===== General =====
+In order to encapsulate VM functionality, PM is split into process
+management and memory management. The memory management task is called
+VM and implements the memory part of the fork, exec, exit, etc. calls;
+PM calls it synchronously whenever those calls are made. This has
+made PM architecture independent.
+A typical interaction between userland, PM and VM looks like this, for the
+''fork()'' call:
+{{ fork.png }}
+===== VM server =====
+VM manages memory (keeping track of used and unused memory, assigning
+memory to processes, freeing it, and so on).
+There is a clear split between architecture dependent and independent
+code in VM. For i386 and ARM, there are Intel page tables that reside in
+VM's address space. VM maps these into its own address space, by editing
+its own page table, so it can edit them directly.
+==== Data structures ====
+The most important data structures in VM describe the memory a process
+is using in detail. Page tables are written purely from these data
+structures. They are owned and manipulated by functions in region.c. They
+are:
+=== Regions ===
+A region, described by a 'struct vir_region' or region_t, is a contiguous
+range of virtual address space that has a particular type and some
+parameters. Its type determines its properties and behaviour (see
+'memory types' later). It needn't have any real memory instantiated in it
+(yet). At any time, a region is of a fixed size in virtual address space.
+Some types can be resized on request (typically by the brk() call),
+though. Virtual regions have a starting address and a length, both page-aligned.
+Virtual regions have a fixed-sized array of pointers to physical regions
+in them. Every entry represents a page-sized memory block. If non-NULL, that
+block is instantiated and points to a phys_region, describing the physical
+block of memory.
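The region layout described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: ''sk_region'', ''sk_region_new()'', ''sk_region_slot()'' and the 4096-byte page size are hypothetical names and values chosen for the sketch, not the actual definitions in region.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_PAGE_SIZE 4096UL          /* assumed page size for the sketch */

struct sk_phys_region;               /* opaque here; see 'Physical regions' */

/* Hypothetical, simplified stand-in for 'struct vir_region'. */
struct sk_region {
    unsigned long vaddr;             /* page-aligned start address */
    unsigned long length;            /* page-aligned length in bytes */
    struct sk_phys_region **pages;   /* one slot per page; NULL = not instantiated */
};

/* Allocate a region with every page slot empty; returns NULL on failure. */
struct sk_region *sk_region_new(unsigned long vaddr, unsigned long length)
{
    struct sk_region *r;

    assert(length > 0);
    assert(vaddr % SK_PAGE_SIZE == 0 && length % SK_PAGE_SIZE == 0);

    r = malloc(sizeof(*r));
    if (r == NULL)
        return NULL;
    r->pages = calloc(length / SK_PAGE_SIZE, sizeof(*r->pages));
    if (r->pages == NULL) {
        free(r);
        return NULL;
    }
    r->vaddr = vaddr;
    r->length = length;
    return r;
}

/* Index of the page slot covering addr, or -1 if addr is outside the region. */
long sk_region_slot(const struct sk_region *r, unsigned long addr)
{
    if (addr < r->vaddr || addr - r->vaddr >= r->length)
        return -1;
    return (long)((addr - r->vaddr) / SK_PAGE_SIZE);
}
```

A freshly created region has only NULL slots, which matches the statement that a region need not have any real memory instantiated in it yet.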
+=== Physical regions ===
+Physical regions, described by a 'struct phys_region', exist to reference
+physical blocks. A physical block describes a single physical page of memory. An
+extra level of indirection is needed because the same page of physical
+memory may have to be referenced more than once, with a reference
+count kept so that VM can (efficiently) tell whether a page is referenced zero
+times, once or more than once. 'Blocks' here can be used interchangeably with pages.
+=== Physical blocks ===
+A physical block, described by a 'struct phys_block', describes a single
+page of physical memory. It holds the physical address and a reference count.
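The indirection and reference counting can be illustrated as follows. This is a minimal sketch, assuming only the two fields named in the text; the ''sk_'' types and functions are hypothetical and are not the real 'struct phys_region' and 'struct phys_block'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for 'struct phys_block': one physical page. */
struct sk_phys_block {
    unsigned long phys_addr;         /* physical address of the page */
    unsigned refcount;               /* number of references to the page */
};

/* Hypothetical stand-in for 'struct phys_region': one reference to a block. */
struct sk_phys_region {
    struct sk_phys_block *block;
};

/* Make pr reference pb and count the reference. */
void sk_phys_ref(struct sk_phys_region *pr, struct sk_phys_block *pb)
{
    assert(pr->block == NULL);
    pr->block = pb;
    pb->refcount++;
}

/* Drop pr's reference; returns 1 if the page is now unreferenced. */
int sk_phys_unref(struct sk_phys_region *pr)
{
    struct sk_phys_block *pb = pr->block;

    assert(pb != NULL && pb->refcount > 0);
    pr->block = NULL;
    return --pb->refcount == 0;
}

/* A page referenced more than once must not be written in place. */
int sk_phys_is_shared(const struct sk_phys_block *pb)
{
    return pb->refcount > 1;
}
```

With two phys_regions pointing at one phys_block, as after a fork, ''sk_phys_is_shared()'' reports the page as shared until one side drops its reference.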
+=== Memory types ===
+Each memory type is described by the data structure struct mem_type in
+memtype.h. They are instantiated in mem_*.c source files and declared
+in glo.h (mem_type_*). This neatly abstracts the different behaviour of
+different memory types when it comes to forking, pagefaulting, and so on,
+making the higher level data structures and code to manipulate them quite
+generic.
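One common way to get this kind of abstraction in C is a structure of function pointers, with one instance per type. The sketch below assumes that shape; the member names, the two example types and their behaviour are hypothetical and do not reproduce the actual contents of memtype.h.

```c
#include <assert.h>
#include <stddef.h>

enum sk_fault_result { SK_FAULT_NEW_PAGE, SK_FAULT_SHARE_PAGE };

/* Hypothetical, reduced stand-in for 'struct mem_type'. */
struct sk_mem_type {
    const char *name;
    enum sk_fault_result (*pagefault)(int write);
    int (*copy_on_fork)(void);       /* nonzero: child gets a private view */
};

/* Illustrative behaviour for an anonymous-memory type. */
static enum sk_fault_result sk_anon_fault(int write)
{
    (void)write;
    return SK_FAULT_NEW_PAGE;
}
static int sk_anon_fork(void) { return 1; }

/* Illustrative behaviour for a shared-memory type. */
static enum sk_fault_result sk_shared_fault(int write)
{
    (void)write;
    return SK_FAULT_SHARE_PAGE;
}
static int sk_shared_fork(void) { return 0; }

const struct sk_mem_type sk_mem_type_anon =
    { "anonymous", sk_anon_fault, sk_anon_fork };
const struct sk_mem_type sk_mem_type_shared =
    { "shared", sk_shared_fault, sk_shared_fork };

/* Generic code: it only sees the table, never the concrete type. */
enum sk_fault_result sk_handle_fault(const struct sk_mem_type *t, int write)
{
    assert(t != NULL && t->pagefault != NULL);
    return t->pagefault(write);
}
```

The generic caller stays the same when a new type is added; only a new table instance is needed.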
+=== Cache ===
+The data structures of the in-VM disk block cache, and the code to manipulate
+them, are contained in cache.c. Each cache block is page-sized and is uniquely
+identified by a (device, device offset) pair. It furthermore has (inode,
+inode offset) as extra information, but VM does not guarantee this to be
+unique, nor is it guaranteed to be present. The inode number might
+be VMC_NO_INODE, meaning the disk block isn't part of inode data
+or its inode number isn't known (e.g. because it ended up in the cache
+through a block device and not through a file).
+The block contents are a 'Physical block' pointer, and being in the cache
+counts as a 'reference' in its refcount.
+Blocks are indexed by two hash tables: one keyed by the (device, device offset)
+pair and one keyed by the (inode, inode offset) pair. A block is only present
+in the second hash table if it belongs to an inode at all (inode != VMC_NO_INODE).
+Furthermore, cache blocks are on an LRU chain that is used for eviction in
+out-of-memory conditions.
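The double indexing can be sketched as two chained hash tables over the same block objects. Everything below is an assumption made for illustration: the ''sk_'' names, the bucket count, the hash function and the value of ''SK_NO_INODE'' (a stand-in for VMC_NO_INODE, whose real value is not assumed here). The LRU chain is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NO_INODE 0u               /* stand-in for VMC_NO_INODE */
#define SK_BUCKETS  64u

struct sk_cache_block {
    uint32_t dev;                    /* unique key, together with dev_off */
    uint64_t dev_off;
    uint32_t ino;                    /* extra, optional information */
    uint64_t ino_off;
    struct sk_cache_block *next_dev; /* chain in the device hash table */
    struct sk_cache_block *next_ino; /* chain in the inode hash table */
};

static struct sk_cache_block *sk_by_dev[SK_BUCKETS];
static struct sk_cache_block *sk_by_ino[SK_BUCKETS];

static unsigned sk_hash(uint32_t id, uint64_t off)
{
    return (unsigned)((id * 31u + (uint32_t)(off >> 12)) % SK_BUCKETS);
}

/* Always index by device; index by inode only if the inode is known. */
void sk_cache_add(struct sk_cache_block *b)
{
    unsigned h;

    assert(b != NULL);
    h = sk_hash(b->dev, b->dev_off);
    b->next_dev = sk_by_dev[h];
    sk_by_dev[h] = b;

    b->next_ino = NULL;
    if (b->ino != SK_NO_INODE) {
        h = sk_hash(b->ino, b->ino_off);
        b->next_ino = sk_by_ino[h];
        sk_by_ino[h] = b;
    }
}

struct sk_cache_block *sk_find_by_dev(uint32_t dev, uint64_t dev_off)
{
    struct sk_cache_block *b = sk_by_dev[sk_hash(dev, dev_off)];

    while (b != NULL && !(b->dev == dev && b->dev_off == dev_off))
        b = b->next_dev;
    return b;
}

/* Returns the first match; the inode pair is not guaranteed to be unique. */
struct sk_cache_block *sk_find_by_ino(uint32_t ino, uint64_t ino_off)
{
    struct sk_cache_block *b;

    if (ino == SK_NO_INODE)
        return NULL;
    b = sk_by_ino[sk_hash(ino, ino_off)];
    while (b != NULL && !(b->ino == ino && b->ino_off == ino_off))
        b = b->next_ino;
    return b;
}
```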
+==== Typical call structure ====
+Calls into VM are received from three main sources: userland, PM and the kernel.
+In all cases, a typical flow of control is:
+      * Receive message in main.c
+      * Do call-specific work in call-specific file, e.g. mmap.c, cache.c
+      * This manipulates high-level data structures by invoking functions in region.c
+      * This updates the process pagetable by invoking functions in pagetable.c
+An example is mmap, when just used to allocate memory:
+{{ mmap.png }}
+A more complicated example is where mmap is used to map in a file. VM must know
+the corresponding device and inode number, and does a lookup on the FD by calling
+VFS asynchronously to do so:
+{{ mmap-file.png }}
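The layered flow of control listed in this section (message receipt, call-specific work, region layer, page table layer) can be sketched as a small dispatch table. The message layout, the call numbers and every ''sk_'' function are hypothetical and only illustrate the layering; they are not the real interfaces of main.c, mmap.c, region.c or pagetable.c.

```c
#include <assert.h>
#include <stddef.h>

enum { SK_CALL_MMAP, SK_NCALLS };    /* hypothetical call numbers */

struct sk_message {
    int type;                        /* which call is requested */
    unsigned long arg;               /* e.g. a virtual address */
};

static int sk_pt_updates;            /* counts page table updates */

/* Page table layer: would write the process page table. */
static int sk_pt_map(unsigned long vaddr)
{
    (void)vaddr;
    sk_pt_updates++;
    return 0;
}

/* Region layer: updates the high-level data, then the page table. */
static int sk_region_map(unsigned long vaddr)
{
    return sk_pt_map(vaddr);
}

/* Call-specific layer: the work for one particular call. */
static int sk_do_mmap(const struct sk_message *m)
{
    return sk_region_map(m->arg);
}

typedef int (*sk_handler)(const struct sk_message *);

static const sk_handler sk_handlers[SK_NCALLS] = { sk_do_mmap };

/* Top layer: receive a message and dispatch on its type. */
int sk_dispatch(const struct sk_message *m)
{
    assert(m != NULL);
    if (m->type < 0 || m->type >= SK_NCALLS)
        return -1;                   /* unknown call */
    return sk_handlers[m->type](m);
}
```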
+==== Handling absent memory: pagefaults, memory handling: calls from the kernel ====
+There are two major cases in which memory is needed that can't be used
+directly:
+      * memory in a range that is mapped logically, but not physically (currently that is on-demand anonymous memory)
+      * memory that is mapped physically, but readonly as it's mapped in more than once (shared between processes that have forked), and so can't be written to directly.
+VM makes sure the page is mapped readonly in the second case. There is
+no page table entry in the first case.
+There are two major situations in which either of these cases can arise:
+  * a process uses the memory itself (page fault)
+  * the kernel wants to use that memory
+In both cases the 'call' is generated by the kernel and arrives in VM
+through a 'kernel signal' in the form of a message.
+The kernel must check for these cases whenever it wants to touch memory;
+e.g. in IPC but also in copying memory to/from processes in kernel
+context. If the kernel detects this, it stores this event, notifies VM,
+doesn't reply to the requester yet, and continues its event loop. VM then
+handles the situation (specifically, mapping in a copy of the page, or an
+entirely new page, as the case may be) and sends a message to the kernel.
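The two cases can be told apart from very little state. The following sketch assumes a per-page record with just a "present" flag and a reference count; ''sk_page'', ''sk_resolve()'' and the action names are hypothetical and only mirror the decision described above, not VM's actual code.

```c
#include <assert.h>
#include <stddef.h>

enum sk_action { SK_NOTHING, SK_MAP_NEW_PAGE, SK_MAP_COPY };

struct sk_page {
    int present;                     /* is there a page table entry? */
    unsigned refcount;               /* references to the physical page */
};

/* Decide what has to happen before the access can proceed. */
enum sk_action sk_resolve(const struct sk_page *p, int write)
{
    assert(p != NULL);
    if (!p->present)
        return SK_MAP_NEW_PAGE;      /* mapped logically, not physically */
    if (write && p->refcount > 1)
        return SK_MAP_COPY;          /* shared read-only page: copy first */
    return SK_NOTHING;               /* access can go ahead as is */
}
```

Reads of a shared page need no action in this sketch because the page is already mapped read-only; only a write forces a private copy.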
+Pagefaults are memory-type specific. A pagefault in anonymous memory
+might look like this:
+{{ pagefault.png }}
+If a pagefault is in a file-mapped region, the cache is queried for the
+presence of the right block. If it isn't there, a request to VFS will
+have to happen asynchronously for the block to appear in the cache.
+Once VFS indicates the request is complete, the pagefault code is simply
+re-invoked the same way.
+{{ pagefault-file.png }}
+===== Physical / contiguous memory =====
+Many areas in the system, inside and outside the kernel, assume memory
+that is contiguous in the virtual address space is also contiguous in
+physical memory, but this assumption is no longer true. Therefore all
+instances of umap calls in the kernel had to be checked to see
+    * whether an extra lookup had to be done to get the real physical address
+    * whether that code assumes the memory is physically contiguous, or even that the memory is present at all
+Processes that need physically contiguous memory specifically have to
+ask for it. A warning in the kernel is printed if an old umap function
+is called. A new umap segment (VM_D as opposed to D) was added that
+does a physically-contiguous check, but doesn't print a warning (the
+VM_D is meant to indicate that the caller is aware that memory isn't
+automatically contiguous physically, and that if it wants it to be,
+it has made arrangements for that itself, e.g. by using alloc_contig()).
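What a physically-contiguous check amounts to can be shown on a list of per-page physical addresses. This is a sketch only: ''sk_phys_contiguous()'' is a hypothetical helper, and how the kernel actually obtains the physical addresses is not shown.

```c
#include <assert.h>
#include <stddef.h>

/*
 * phys[i] is the physical address backing the i-th virtual page of a range.
 * Returns 1 if the pages are also consecutive in physical memory.
 */
int sk_phys_contiguous(const unsigned long *phys, size_t npages,
                       unsigned long page_size)
{
    size_t i;

    assert(phys != NULL || npages == 0);
    for (i = 1; i < npages; i++) {
        if (phys[i] != phys[i - 1] + page_size)
            return 0;                /* gap or reordering found */
    }
    return 1;
}
```

A range that is contiguous virtually can fail this check once pages are allocated individually, which is why callers that need contiguity have to request it explicitly.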
+===== Drivers =====
+Drivers have been updated to
+    * Request contiguous memory if necessary (DMA)
+    * Request it below 16MB physical memory (DMA; lance and floppy) or below 1MB physical memory (BIOS driver)
