releases:3.2.0:developersguide:pipefs [2014/11/14 16:26] (current)
lionelsambuc created
 +====== The Pipe File System ======
 +
 +This page documents the MINIX Pipe File System (PipeFS) server. It takes a top-down approach, giving the big picture first and then drilling down into the details. This document is current as of revision [[https://gforge.cs.vu.nl/gf/project/minix/scmsvn/?action=browse&path=%2Ftrunk%2Fsrc%2F&pathrev=7819|7819]].
 +
 +===== Overview =====
 +
 +The PipeFS server implements [[http://en.wikipedia.org/wiki/Anonymous_pipe|anonymous pipes]] created via the [[http://www.minix3.org/manpages/man2/pipe.2.html|pipe(2)]] system call as well as [[http://en.wikipedia.org/wiki/Unix_domain_socket|Unix Domain Sockets]] created via the socket(2) and socketpair(2) system calls. The PipeFS source code is located in [[https://gforge.cs.vu.nl/gf/project/minix/scmsvn/?action=browse&path=%2Ftrunk%2Fsrc%2Fservers%2Fpfs%2F&pathrev=7819|src/servers/pfs]]. Various files from that directory will be referenced throughout this document.
 +
 +===== main (see main.c) =====
 +
 +The pfs server begins by initializing the [[.:sef|System Event Framework]] (SEF): it registers the fresh-start and restart callbacks along with a signal handler. Control then enters the while loop in main(), where each message is received, acted upon, and answered with a reply message. The loop repeats until the signal handler has handled a SIGTERM and the file system is not busy (i.e. when there are no inodes in use -- see super.c).
 +
 +==== The main while Loop ====
 +
 +In the while loop inside main(), a message is received with a call to get_work(). In get_work(), the server blocks on sef_receive() until a message is received. If the message isn't from the VFS, it is ignored and sef_receive() is called again, repeating until a VFS message arrives.
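 +
The filtering done by get_work() can be sketched roughly as follows. This is only an illustration: fake_receive(), the inbox array, struct msg, and the value of VFS_PROC_NR are stand-ins invented for this example, and unlike the real sef_receive() the stand-in returns an error when its queue runs dry instead of blocking.

```c
#include <assert.h>
#include <stddef.h>

#define VFS_PROC_NR 1            /* hypothetical endpoint number for the VFS */

struct msg { int m_source; int m_type; };   /* simplified message */

/* Stand-in for the kernel's message queue: a fixed array drained front to
 * back. The real sef_receive() blocks instead of running dry. */
static const struct msg *inbox;
static size_t inbox_len, inbox_pos;

int fake_receive(struct msg *m)
{
    if (inbox_pos >= inbox_len)
        return -1;
    *m = inbox[inbox_pos++];
    return 0;
}

/* Keep receiving until a message from the VFS shows up. */
int get_work(struct msg *m)
{
    assert(m != NULL);
    for (;;) {
        if (fake_receive(m) != 0)
            return -1;           /* sketch only: queue drained */
        if (m->m_source == VFS_PROC_NR)
            return 0;            /* accept; anything else is dropped */
    }
}
```

With an inbox holding messages from sources 5, 7, 1, and 9, the first get_work() call skips the first two and returns the message from source 1.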
 +
 +The message's m_type is used as an index into an array of function pointers. If the message is a file system request, then the fs_call_vec array is used. If it is a character device request, then the dev_call_vec array is used. The arrays are defined in table.c and provide callbacks for several operations. The functions are called with pointers to the incoming request message and the outgoing response message.
 +
 +The index into the arrays is validated to ensure that it is between 0 and NREQS-1. If it is out of range, the error field in the reply message is set to EINVAL, otherwise the function is invoked.
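 +
A minimal sketch of this table-driven dispatch, assuming a simplified message struct and a four-entry table. The names struct msg, handler_t, call_vec, do_sync, and dispatch are invented for the example, and the host's positive EINVAL is used rather than MINIX's own error convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NREQS 4                  /* hypothetical table size */

struct msg { int m_type; int result; };     /* simplified message */

typedef int (*handler_t)(const struct msg *in, struct msg *out);

int no_sys(const struct msg *in, struct msg *out)
{
    (void)in; (void)out;
    return EINVAL;               /* slot in range, but no operation defined */
}

int do_sync(const struct msg *in, struct msg *out)
{
    (void)in; (void)out;
    return 0;                    /* nothing to flush: always OK */
}

/* Unused slots point at no_sys so every in-range index is callable. */
static const handler_t call_vec[NREQS] = { no_sys, do_sync, no_sys, no_sys };

/* Validate the index first, then call through the table. */
void dispatch(const struct msg *in, struct msg *out)
{
    assert(in != NULL && out != NULL);

    if (in->m_type < 0 || in->m_type >= NREQS)
        out->result = EINVAL;
    else
        out->result = call_vec[in->m_type](in, out);
}
```

Filling unused slots with no_sys keeps the range check and the "unsupported request" case separate, which matches the behavior described for no_sys() below.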
 +
 +A reply message containing the result of the operation is sent to the source of the original message with a call to reply(). reply() simply calls send() and prints an error if it fails.
 +
 +===== fs_call_vec (see table.c) =====
 +
 +fs_call_vec has pointers to the following functions: no_sys, fs_newnode, fs_putnode, fs_ftrunc, fs_stat, fs_sync, and fs_readwrite. One of those functions is called for every valid file system message received from the VFS. The functions implement pipes as a file system.
 +
 +==== no_sys (see utility.c) ====
 +
 +When a message is sent to the pfs server and the m_type field in the incoming message doesn't index one of the file system or device operations defined for pfs but is still within the range of valid values (0 to NREQS-1), the no_sys() function is invoked. It prints an invalid call message and returns EINVAL.
 +
 +==== fs_newnode (see open.c) ====
 +
 +This function calls alloc_inode() in inode.c to allocate an inode with 0 links using the mode and device specified in the incoming message. The incoming message is created in the do_pipe() function in pipe.c in the VFS server. Since PipeFS is a 'virtual' file system, it uses the NO_DEV device. Only PipeFS will get this type of request.
 +
 +==== fs_putnode (see inode.c) ====
 +
 +This function decreases the reference count of a given inode by the amount requested in the incoming message. It calls find_inode() with the inode number (REQ_INODE_NR) from the incoming message to locate the inode. The requested amount is then validated to ensure that it is a positive number no larger than the current reference count. The inode's reference count is decreased by one less than the requested amount. Why one less? Because put_inode() is called immediately afterwards, and it performs the final decrement. put_inode()'s purpose is to free an inode when it is no longer in use: it decreases the reference count by one and then checks whether it is 0. If the inode's reference count becomes 0, put_block() in buffer.c is called to decrease the block's reference count as well.
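 +
The "one less" arithmetic can be illustrated with a small sketch. The struct below is a stripped-down stand-in for the real struct inode, the freed flag is a hypothetical marker replacing the actual release of the inode and its block, and putnode() is an invented name for the core of fs_putnode() without the message handling.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct inode {
    int i_count;                 /* reference counter */
    int freed;                   /* hypothetical marker: set when released */
};

/* Drop one reference; release the inode when the last one goes away. */
void put_inode(struct inode *rip)
{
    assert(rip != NULL && rip->i_count > 0);
    if (--rip->i_count == 0)
        rip->freed = 1;          /* stand-in for freeing inode and block */
}

/* Drop `count` references in one request. */
int putnode(struct inode *rip, int count)
{
    assert(rip != NULL);
    if (count <= 0 || count > rip->i_count)
        return EINVAL;

    rip->i_count -= count - 1;   /* leave the last decrement to put_inode() */
    put_inode(rip);
    return 0;
}
```

Routing the final decrement through put_inode() means the "count reached zero" cleanup lives in exactly one place.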
 +
 +==== fs_ftrunc (see link.c) ====
 +
 +This function truncates an inode. The function calls find_inode() with the inode number (REQ_INODE_NR) from the incoming message to locate the desired inode. Then it calls truncate_inode(), which only succeeds if the new size is 0; for any other size it returns EINVAL.
 +
 +==== fs_stat (see stadir.c) ====
 +
 +This function queries the status of an inode. The function calls find_inode() with the inode number (REQ_INODE_NR) from the incoming message to locate the desired inode. Then it calls get_inode() to mark the inode in use (increment the reference counter). stat_inode() is called to gather the status information. The status information is then copied to the memory grant specified by REQ_GRANT in the incoming message. Finally put_inode() is called to release the inode (decrement the reference counter).
 +
 +==== fs_sync (see misc.c) ====
 +
 +There is nothing for PipeFS to do for the sync() system call since PipeFS isn't backed by a physical device, so this function simply returns OK.
 +
 +==== fs_readwrite (see read.c) ====
 +
 +This function is used for both reading and writing. It knows which operation to perform based on the m_type in the incoming message. The function calls find_inode() with the inode number (REQ_INODE_NR) from the incoming message to locate the desired inode. A sanity check is done to ensure that a write won't exceed the size of the pipe buffer. Then get_inode() and get_block() are called to mark the inode and its block in use (i.e. increment their reference counters). Then bytes are copied between the block buffer and user space using sys_safecopyto() (for reading) or sys_safecopyfrom() (for writing). Then the position is updated and flags are set to update the inode's ATIME, CTIME, and/or MTIME. Finally put_inode() and put_block() are called to decrement their reference counters.
 +
 +===== block Management (see buffer.c and buf.h) =====
 +
 +A 'block' in PipeFS is an instance of struct buf. It holds the data for a pipe. A block is created by new_block() and added to a doubly-linked list. Instead of head and tail, the variables are named front and rear. rear points to the most recently created block and front points to the least recently created block. A helper function get_block() searches the list for the block that corresponds to a given device and inode. If get_block() doesn't find the block, new_block() is called to create one. Blocks have reference counters. Calls to get_block() increment the counter, and calls to put_block() decrement the counter. When the reference count hits zero in put_block(), put_block() removes the block from the doubly-linked list and frees it with free().
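 +
The list handling described above might look roughly like this. It is a sketch under assumptions: the field names, the int types for device and inode number, and the function signatures are chosen for the example and are not copied from buffer.c or buf.h.

```c
#include <assert.h>
#include <stdlib.h>

struct buf {
    int b_dev, b_num;            /* device and inode number served */
    int b_count;                 /* reference counter */
    struct buf *b_next, *b_prev;
};

static struct buf *front = NULL; /* least recently created block */
static struct buf *rear = NULL;  /* most recently created block */

static struct buf *find_block(int dev, int num)
{
    struct buf *bp;
    for (bp = front; bp != NULL; bp = bp->b_next)
        if (bp->b_dev == dev && bp->b_num == num)
            return bp;
    return NULL;
}

/* Create a block and append it at the rear of the list. */
struct buf *new_block(int dev, int num)
{
    struct buf *bp = calloc(1, sizeof(*bp));
    if (bp == NULL)
        return NULL;
    bp->b_dev = dev;
    bp->b_num = num;
    bp->b_prev = rear;
    if (rear != NULL) rear->b_next = bp; else front = bp;
    rear = bp;
    return bp;
}

/* Look the block up (creating it if needed) and take a reference. */
struct buf *get_block(int dev, int num)
{
    struct buf *bp = find_block(dev, num);
    if (bp == NULL)
        bp = new_block(dev, num);
    if (bp != NULL)
        bp->b_count++;
    return bp;
}

/* Drop a reference; unlink and free the block on the last one. */
void put_block(int dev, int num)
{
    struct buf *bp = find_block(dev, num);
    if (bp == NULL)
        return;
    assert(bp->b_count > 0);
    if (--bp->b_count > 0)
        return;
    if (bp->b_prev != NULL) bp->b_prev->b_next = bp->b_next; else front = bp->b_next;
    if (bp->b_next != NULL) bp->b_next->b_prev = bp->b_prev; else rear = bp->b_prev;
    free(bp);
}
```

Two get_block() calls for the same device and inode return the same block with a count of 2, and the block only leaves the list after two matching put_block() calls.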
 +
 +===== inode Management (see inode.c, inode.h, and super.c) =====
 +
 +An 'inode' in PipeFS is an instance of struct inode. PipeFS supports 255 inodes (NR_INODES is defined as 256 but the 0th inode is reserved). A bitmap called inodemap keeps track of which inode numbers are free and which are allocated. alloc_bit() grabs a free inode number from the bitmap and free_bit() returns an inode number to the free pool when it is no longer needed. The two functions are called from alloc_inode() and free_inode() respectively. There is an inode cache, which is initialized in init_inode_cache(). That function also allocates inode number 0 to prevent it from being allocated later. Why don't we want it allocated later? Because 0 is equal to NO_BIT, a constant that signifies that there are no bits available in the bitmap. Similar to blocks, there are get_inode() and put_inode() functions. In addition, there is a find_inode() function that looks in the inode cache for an inode.
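 +
The reason for reserving bit 0 is easier to see in code. The sketch below assumes a bitmap of unsigned words and simplified signatures (the real alloc_bit() and free_bit() may take different parameters); only NR_INODES, NO_BIT, inodemap, and the function names come from the text above.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NR_INODES 256
#define NO_BIT    0              /* "no bit available" doubles as bit 0 */
#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)
#define MAP_WORDS ((NR_INODES + WORD_BITS - 1) / WORD_BITS)

static unsigned inodemap[MAP_WORDS];

/* Claim the lowest clear bit; NO_BIT means the map is full. */
int alloc_bit(void)
{
    unsigned b;

    for (b = 0; b < NR_INODES; b++) {
        unsigned mask = 1u << (b % WORD_BITS);
        if ((inodemap[b / WORD_BITS] & mask) == 0) {
            inodemap[b / WORD_BITS] |= mask;
            return (int)b;
        }
    }
    return NO_BIT;
}

/* Return a bit to the pool; bit 0 stays reserved forever. */
void free_bit(int b)
{
    if (b <= NO_BIT || b >= NR_INODES)
        return;
    inodemap[(unsigned)b / WORD_BITS] &= ~(1u << ((unsigned)b % WORD_BITS));
}

/* Clear the map, then burn bit 0 so a later 0 can only mean "full". */
void init_inode_cache(void)
{
    int reserved;

    memset(inodemap, 0, sizeof(inodemap));
    reserved = alloc_bit();
    assert(reserved == 0);
    (void)reserved;
}
```

After init_inode_cache(), the first allocation yields 1, and a return value of 0 is unambiguous: it can only mean that all 255 usable numbers are taken.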
 +
 +===== dev_call_vec (see table.c) =====
 +
 +dev_call_vec has pointers to the following functions: no_sys, uds_cancel, uds_open, uds_close, uds_select, uds_status, uds_read, uds_write, and uds_ioctl. One of those functions is called for every valid device message received from the VFS. The functions implement unix domain sockets as the character device **/dev/uds**. The device has style //STYLE_CLONE// and is similar to **/dev/tcp** and **/dev/udp**. The send and receive buffers for unix domain sockets are just pipes. Each socket has one pipe associated with it (its read buffer), which its peers write to.
 +
 +==== uds_open (see dev_uds.c and uds.h) ====
 +
 +This function allocates a file descriptor in uds_fd_table for a new socket, initializes the descriptor, and creates a new inode on PipeFS by invoking fs_newnode(). The search for a free slot in uds_fd_table is linear (0 to NR_FDS-1). Using this method, things are kept simple and the file descriptors end up towards the beginning of the array. This is a good property because when uds_connect() looks for a listening socket to connect to it starts its linear search from 0 too.
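 +
A sketch of the linear slot search, assuming a tiny table and a one-field descriptor. NR_FDS is given a hypothetical value of 8 here, UDS_INUSE and the helper names uds_alloc_slot() and uds_free_slot() are invented for the example, and the real descriptor carries much more state.

```c
#include <assert.h>

#define NR_FDS 8                 /* hypothetical table size */

enum { UDS_FREE = 0, UDS_INUSE = 1 };   /* UDS_INUSE is an assumed name */

struct uds_fd { int state; };

static struct uds_fd uds_fd_table[NR_FDS];

/* Scan from slot 0 upwards and claim the first free entry. */
int uds_alloc_slot(void)
{
    int i;

    for (i = 0; i < NR_FDS; i++) {
        if (uds_fd_table[i].state == UDS_FREE) {
            uds_fd_table[i].state = UDS_INUSE;
            return i;
        }
    }
    return -1;                   /* table full */
}

/* Mark a slot as available for re-use. */
void uds_free_slot(int i)
{
    assert(i >= 0 && i < NR_FDS);
    uds_fd_table[i].state = UDS_FREE;
}
```

Because the scan always starts at 0, a freed low slot is handed out again before any higher slot, which keeps live descriptors packed near the start of the array.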
 +
 +==== uds_close (see dev_uds.c and uds.h) ====
 +
 +This function disconnects a socket, removes the inode associated with the socket, clears the descriptor information in uds_fd_table, and marks the entry in uds_fd_table as UDS_FREE (available for re-use).
 +
 +==== uds_select (see dev_uds.c and uds.h) ====
 +
 +This function implements select(2), which determines whether data is available to read, whether a new peer can be accepted with accept(), and/or whether a write is possible.
 +
 +==== uds_perform_read (see dev_uds.c and uds.h) ====
 +
 +This function checks if a read is possible. If the read is possible, it performs the read. If a read isn't possible (for example, when no data is in the pipe), then the calling process is SUSPENDed. There is a 'pretend' parameter that, when set to 1, checks if a read is possible without actually performing the read.
 +
 +==== uds_perform_write (see dev_uds.c and uds.h) ====
 +
 +This function checks if a write is possible. If the write is possible, it performs the write. If a write isn't possible (for example, when the pipe is full), then the calling process is SUSPENDed. There is a 'pretend' parameter that, when set to 1, checks if a write is possible without actually performing the write.
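 +
The 'pretend' idea used by both functions can be sketched on a toy in-memory pipe. Everything here is an assumption made for illustration: the struct pipe layout, the 16-byte capacity, the numeric value of SUSPEND, and the perform_read()/perform_write() signatures are not taken from dev_uds.c.

```c
#include <assert.h>
#include <string.h>

#define PIPE_BUF_SIZE 16         /* hypothetical capacity */
#define SUSPEND (-998)           /* hypothetical "caller must wait" code */

struct pipe { char data[PIPE_BUF_SIZE]; size_t len; };

/* Append n bytes, or only report whether that would work. */
int perform_write(struct pipe *p, const char *src, size_t n, int pretend)
{
    assert(p->len <= PIPE_BUF_SIZE);
    if (n > PIPE_BUF_SIZE - p->len)
        return SUSPEND;          /* not enough room */
    if (pretend)
        return (int)n;           /* check only: leave the pipe untouched */
    memcpy(p->data + p->len, src, n);
    p->len += n;
    return (int)n;
}

/* Take up to n bytes from the front, or only report how many are there. */
int perform_read(struct pipe *p, char *dst, size_t n, int pretend)
{
    assert(p->len <= PIPE_BUF_SIZE);
    if (p->len == 0)
        return SUSPEND;          /* nothing to read yet */
    if (n > p->len)
        n = p->len;
    if (pretend)
        return (int)n;           /* check only: leave the pipe untouched */
    memcpy(dst, p->data, n);
    memmove(p->data, p->data + n, p->len - n);
    p->len -= n;
    return (int)n;
}
```

Sharing one code path between the check and the operation means a select()-style query and the real read or write cannot disagree about when the caller would be suspended.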
 +
 +==== uds_read (see dev_uds.c and uds.h) ====
 +
 +This function calls uds_perform_read() to perform a read on the socket's pipe.
 +
 +==== uds_write (see dev_uds.c and uds.h) ====
 +
 +This function calls uds_perform_write() to perform a write on the peer's pipe or, in the case of SOCK_DGRAM, on the target pipe.
 +
 +==== uds_ioctl (see dev_uds.c, uds.c, and uds.h) ====
 +
 +This function handles ioctl(2) operations on unix domain sockets. Much of the sockets API is implemented using ioctl(2). This function is essentially one large switch statement that calls the functions in uds.c that implement the socket API.
 +
 +  * do_connect() - connect to a listening socket.
 +  * do_accept() - accept an incoming connection.
 +  * do_listen() - set the backlog_size and put the socket into the listening state.
 +  * do_socket() - set the type for this socket (i.e. SOCK_STREAM, SOCK_DGRAM, etc).
 +  * do_bind() - set the address for this socket.
 +  * do_getsockname() - get the address for this socket.
 +  * do_getpeername() - get the address for the socket's peer.
 +  * do_shutdown() - shutdown a socket for reading, writing, or both.
 +  * do_socketpair() - connect two sockets.
 +  * do_getsockopt_sotype() - get the type of socket (i.e. SOCK_STREAM, SOCK_DGRAM, etc).
 +  * do_sendto() - set the target address for sendto(2) calls.
 +  * do_recvfrom() - get the address of the peer who sent the message.
 +  * do_getsockopt_sndbuf() - get the send buffer size.
 +  * do_setsockopt_sndbuf() - set the send buffer size.
 +  * do_getsockopt_rcvbuf() - get the receive buffer size.
 +  * do_setsockopt_rcvbuf() - set the receive buffer size.
 +
 +If the IOCTL command doesn't match any cases, then ioctl() will return -1 with errno set to EBADIOCTL.
 +
 +==== uds_status (see dev_uds.c and uds.h) ====
 +
 +This function is used to check on processes that are blocked/suspended on read(), write(), accept(), connect(), or select(). If the desired operation can be performed, it gets performed and the process is revived.
 +
 +==== uds_cancel (see dev_uds.c and uds.h) ====
 +
 +This function handles cancelled system calls.
 +
 +===== Miscellaneous Functions =====
 +
 +==== update_times (see inode.c) ====
 +
 +This function updates an inode's ATIME, CTIME, and/or MTIME based on the flags set in the inode's i_update field. It gets the time from clock_time() in utility.c, updates the times that need updating, and then sets i_update to 0. Getting the system time is a relatively expensive operation, so to improve performance this function is only called when the times are actually needed (in fs_stat).
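 +
The deferred-update pattern can be sketched as below. The flag values and the reduced struct inode are assumptions for the example, and the current time is passed in as a parameter so the sketch stays testable, whereas the function described above obtains it from clock_time().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical flag values; only the names come from the text above. */
#define ATIME 0x1
#define CTIME 0x2
#define MTIME 0x4

struct inode {
    time_t i_atime, i_ctime, i_mtime;
    int i_update;                /* which timestamps are stale */
};

/* Apply all pending timestamp updates at once, then clear the flags. */
void update_times(struct inode *rip, time_t now)
{
    assert(rip != NULL);
    if (rip->i_update == 0)
        return;                  /* nothing pending */
    if (rip->i_update & ATIME) rip->i_atime = now;
    if (rip->i_update & CTIME) rip->i_ctime = now;
    if (rip->i_update & MTIME) rip->i_mtime = now;
    rip->i_update = 0;
}
```

Reads and writes only set bits in i_update, which is cheap; the clock is consulted once, when something actually asks for the timestamps.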
 +
 +===== Additional Resources =====
 +
 +  * [[.:vfsfsprotocol|VfsFsProtocol]]
 +  * [[http://www.minix3.org/doc/gerofi_thesis.pdf|MINIX VFS]]
  
releases/3.2.0/developersguide/pipefs.txt · Last modified: 2014/11/14 16:26 by lionelsambuc