Stale page

All the changes listed on this page, except for the dynamic updates and failure resilience support, have been merged into the VFS-FS protocol. The VFS-FS protocol documentation has been updated accordingly. Further changes to the protocol are being documented on that page – not here. This page is kept only to preserve the design of dynamic updates and failure resilience support.

The VFS-FS protocol V2

This page presents a new version of The VFS-FS protocol. This is a work in progress, so do not consider it final yet. In the future the changes to the protocol will be incorporated into the VFS-FS protocol page, and this page will be removed.

The known issues of the previous VFS-FS protocol have been solved. Moreover, support for Dynamic Updates and Failure Resilience for an FS has been added (although these features have not actually been implemented yet).

Failure Resilience

In order to provide failure recovery after an FS has crashed, all requests are part of a transaction. A transaction consists of:

  1. VFS sends a request to an FS
  2. the FS replies that it has done what was requested, but has not committed it yet
  3. VFS sends a COMMIT request
  4. the FS replies COMMITTED

An FS handles only one transaction at a time (VFS puts subsequent transactions on a queue).

A commit by the FS is an atomic operation. While a transaction is not yet committed, the FS stores the result of the request in step 1 in a temporary data structure. That is, it is not really part of the state of the FS, yet. If, after a crash of the FS, it turns out there was a partially executed transaction, the temporary data structure can be ignored in the recovery process as if the request hadn't happened at all.
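A minimal sketch of the staging idea described above, assuming a toy FS whose entire state is a single link count. The struct and function names are hypothetical and not taken from MFS. A real FS would also have to make the final copy atomic with respect to crashes, which a plain struct assignment does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy FS state; a real FS holds inodes, bitmaps and blocks here. */
struct fs_state {
    int link_count;
};

struct toy_fs {
    struct fs_state committed; /* the actual state of the FS */
    struct fs_state staged;    /* temporary result of the pending request */
    bool has_staged;           /* true while a transaction awaits COMMIT */
};

/* Steps 1 and 2: execute an unlink-like request into the staging area only. */
void fs_stage_unlink(struct toy_fs *fs)
{
    assert(fs != NULL);
    fs->staged = fs->committed;
    fs->staged.link_count--;
    fs->has_staged = true;
}

/* Steps 3 and 4: make the staged result part of the FS state. */
void fs_commit(struct toy_fs *fs)
{
    assert(fs != NULL);
    if (fs->has_staged) {
        fs->committed = fs->staged;
        fs->has_staged = false;
    }
}

/* Recovery after a crash: drop the staged result as if the request never happened. */
void fs_recover(struct toy_fs *fs)
{
    assert(fs != NULL);
    fs->has_staged = false;
}
```

Until fs_commit runs, the committed state is untouched, so fs_recover can discard a partially executed transaction without any undo logic.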

However, if an FS crashes right after it has committed the changes, but before it could deliver the COMMITTED message, restarting the transaction could produce wrong results (e.g., consider unlinking a file successfully and then retrying the unlink; the first time the FS returns OK and the second time it returns ENOENT). To solve this, each transaction has an ID that is encoded in the message using the 'type' field. An FS must record the transaction ID when it commits a transaction, so it can verify whether it has committed the transaction or not when VFS asks for it. If it turns out the request was already committed, it simply replies COMMITTED.
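The duplicate detection described above could be sketched as follows. This is only an illustration: the bookkeeping struct and the function name are hypothetical, and a real FS would have to keep the recorded ID somewhere that survives a crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-FS transaction bookkeeping. */
struct fs_txn {
    unsigned short last_committed_id; /* ID recorded at the last commit */
    bool any_committed;               /* false until the first commit */
    unsigned short pending_id;        /* ID of the staged, uncommitted request */
    bool has_pending;
};

/*
 * Handle a COMMIT for transaction 'id'.
 * Returns true if the FS may reply COMMITTED, false otherwise.
 */
bool fs_handle_commit(struct fs_txn *t, unsigned short id)
{
    assert(t != NULL);

    /* Retried COMMIT for a transaction that was already committed. */
    if (t->any_committed && t->last_committed_id == id)
        return true;

    /* First COMMIT for the pending transaction: commit and record the ID. */
    if (t->has_pending && t->pending_id == id) {
        t->last_committed_id = id;
        t->any_committed = true;
        t->has_pending = false;
        return true;
    }

    /* Unknown transaction ID. */
    return false;
}
```

With this check, a COMMIT that VFS repeats after a lost reply is answered from the recorded ID, so the unlink in the example is never executed a second time.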

Steps 3 and 4 of the transaction protocol can be omitted if a request is idempotent (for example, stat is a read request and can be issued multiple times and get the same result each time). To do this, the FS sets an 'auto-commit' flag in the reply. This flag is encoded in the 'type' field just like the transaction IDs. It is up to the FS to decide whether or not a request is idempotent. VFS will automatically send a COMMIT request when the reply from an FS indicates that the request is non-idempotent.
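Seen from VFS, the four steps and the auto-commit shortcut form a small state machine. The sketch below is one possible way to write it down; the enum names are hypothetical, and in the real protocol the reply kind would be derived from the 'type' field of the message.

```c
/* Hypothetical reply kinds as seen by VFS. */
enum fs_reply {
    FS_DONE_UNCOMMITTED, /* step 2: done, but not committed yet */
    FS_DONE_AUTOCOMMIT,  /* step 2 with the 'auto-commit' flag set */
    FS_COMMITTED,        /* step 4 */
    FS_FAILED            /* any error */
};

enum txn_state {
    TXN_REQUEST_SENT, /* step 1 done, waiting for the first reply */
    TXN_COMMIT_SENT,  /* step 3 done, waiting for COMMITTED */
    TXN_DONE,
    TXN_FAILED
};

/* Advance the VFS-side state of one transaction for a given FS reply. */
enum txn_state txn_step(enum txn_state state, enum fs_reply reply)
{
    switch (state) {
    case TXN_REQUEST_SENT:
        if (reply == FS_DONE_UNCOMMITTED)
            return TXN_COMMIT_SENT; /* VFS sends COMMIT next */
        if (reply == FS_DONE_AUTOCOMMIT)
            return TXN_DONE;        /* steps 3 and 4 are skipped */
        return TXN_FAILED;
    case TXN_COMMIT_SENT:
        return reply == FS_COMMITTED ? TXN_DONE : TXN_FAILED;
    default:
        return state;               /* finished transactions do not change */
    }
}
```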

When an FS crashes, it should have a way to recover a freshly started FS to the state previous to the crash. This can be done by using shared memory regions that remain resident even after the program that created the shared memory regions is no longer executing. After the crash, a new FS maps in the old memory region and possibly fixes errors if necessary. A newly started FS knows it has to recover state from a previous FS (as opposed to mount a new file system), because VFS will send a REQ_RECOVER message.

When a transaction keeps failing a number of times, the communication layer returns EAGAIN, enabling VFS to undo any changes to its internal state and report an error message to the user (program).
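The retry behaviour could look roughly like the sketch below. The retry limit, the callback type and both function names are assumptions made for illustration; the text does not specify how many attempts the communication layer makes. EAGAIN is taken from the standard errno.h here, whereas MINIX servers may use a different sign convention internally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ATTEMPTS 3 /* hypothetical retry limit */

/* One attempt at running a transaction; returns 0 on success. */
typedef int (*txn_attempt_fn)(void *ctx);

/* Retry a transaction a bounded number of times, then give up with EAGAIN. */
int run_transaction(txn_attempt_fn attempt, void *ctx)
{
    assert(attempt != NULL);
    for (int i = 0; i < MAX_ATTEMPTS; i++) {
        if (attempt(ctx) == 0)
            return 0;
    }
    return EAGAIN; /* caller undoes its internal state changes */
}

/* Example attempt: fails while the counter pointed to by ctx is positive. */
int flaky_attempt(void *ctx)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}
```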

The following macros encode the request result (r, signed short), transaction ID (i, unsigned short), and auto-commit flag (f, unsigned short). Note that the transaction ID is actually 15 bits wide (not 16) and can therefore carry values of 0 up to 32767.

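/* Extract the request result, transaction ID and auto-commit flag from t. */
#define    TGET_RESULT(t)          (((t) >> 16) & 0xFFFF)
#define    TGET_TRNS_ID(t)         (((t) >>  1) & 0x7FFF)
#define    TGET_AC(t)              (((t)      ) & 0x0001)

/* OR a field into t; the bits of that field in t must still be zero. */
#define    TSET_RESULT(t, r)       ((t) |= ((unsigned)(r) & 0xFFFFu) << 16)
#define    TSET_TRNS_ID(t, i)      ((t) |= ((i) & 0x7FFF) <<  1)
#define    TSET_AC(t, f)           ((t) |= ((f) & 0x0001)      )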
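A short usage sketch of this bit layout, assuming the 'type' field is 32 bits wide. The helper functions are hypothetical. The sketch packs a value from scratch instead of OR-ing into an existing one, and it shows that the extracted result has to be sign-extended by hand, because the extraction macro yields a value between 0 and 65535.

```c
#include <stdint.h>

#define TGET_RESULT(t)   (((t) >> 16) & 0xFFFF)
#define TGET_TRNS_ID(t)  (((t) >>  1) & 0x7FFF)
#define TGET_AC(t)       (((t)      ) & 0x0001)

/* Build a 'type' value from scratch (hypothetical helper). */
uint32_t type_pack(int16_t result, uint16_t id, unsigned int ac)
{
    return ((uint32_t)(uint16_t)result << 16)  /* bits 16..31: result */
         | ((uint32_t)(id & 0x7FFF) << 1)      /* bits 1..15: transaction ID */
         | (uint32_t)(ac & 0x0001u);           /* bit 0: auto-commit flag */
}

/* Map the unsigned 16-bit value from TGET_RESULT back to a signed result. */
int16_t type_result(uint32_t t)
{
    uint32_t u = TGET_RESULT(t);
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Thin wrappers around the extraction macros. */
uint16_t type_trns_id(uint32_t t) { return (uint16_t)TGET_TRNS_ID(t); }
unsigned int type_ac(uint32_t t)  { return (unsigned int)TGET_AC(t); }
```

For example, packing result -2, transaction ID 32767 and auto-commit flag 1 gives 0xFFFEFFFF, and type_result recovers -2 from it.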

Dynamic Updates

A dynamic update of an FS allows the administrator to install and run a new version of an FS without needing to reboot the computer or unmount and mount file systems; a running copy of the FS is replaced by a new version. This is achieved by sending the FS a REQ_RESTART message, which tells it to write its buffers to disk and exit. Subsequently, the new FS is started and is told to reload state from disk by sending it REQ_RELOAD. That is, it reads from disk the inodes that were in the cache before the update. This way it restores state.

It is advised to read “Dynamic Updates and Failure Resilience for the Minix File Server” by Thomas Veerman (see link at the bottom) to gain better understanding of the mechanisms behind Dynamic Updates and Failure Resilience.

Protocol messages

This specification reflects the protocol as it should be implemented, not how it is implemented by MFS. In particular, old and deprecated requests are not and should not be included.

The VFS-FS protocol is entirely POSIX-oriented. Any deviation from the requirements imposed by POSIX in this specification is unintentional except when mentioned explicitly. For convenience, links to the relevant Open Group function specifications and file access (ATIME), modification (MTIME) and change (CTIME) time update requirements are provided.

The reply codes in this document are advisory and mostly aimed at indicating additional restrictions needed for POSIX compliance. Not all of them may be applicable to every file server, and a file server may send other error codes where appropriate. Errors resulting from protocol validation checks (e.g. EROFS, sys_safecopy errors) are not included.

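The requests are ordered according to the following rough categorization:

  • Mounting and unmounting
  • Inode open and close functions
  • Inode use functions
  • Inode metadata retrieval and manipulation
  • Directory entry manipulation
  • Miscellaneous file system operations
  • Block I/O functions
  • Transaction functions
  • Dynamic Update functions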

In the tables below, we use the following color coding:

The field has its name changed.
Value has changed (e.g., new variable type, new spot in a message, different description). When the whole row has this color, it means this row was added to the request.
Nothing has changed.
This field has been dropped (or replaced by a new field).

Mounting and unmounting

REQ_READSUPER

Mount the file system.

Request fields

REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) for the label of the block device driver to use
REQ_PATH_LEN m9_s2 unsigned short length of the label
REQ_DEV m9_l5 dev_t device number of block device to mount
REQ_READONLY m6_c1 int flag indicating whether the file system is mounted read-only (1 = read-only, 0 = read-write)
REQ_ISROOT m6_c2 int flag indicating whether the file system is the system root file system (1 = yes, 0 = no)
REQ_FLAGS m9_s3 int REQ_RDONLY flag indicates whether the file system is mounted read-only or not (i.e., read and write). REQ_ISROOT flag indicates the file system is the root file system.

Reply fields

RES_INODE_NR m9_l1 ino_t upon success: inode number of the root inode
RES_MODE m9_s2 mode_t upon success: mode of the root inode
RES_FILE_SIZE m9_l2 off_t upon success: file size of the root inode
RES_FILE_SIZE_HI m9_l2 off_t upon success: file size of the root inode (upper 32 bits)
RES_FILE_SIZE_LO m9_l3 off_t upon success: file size of the root inode (lower 32 bits)
RES_DEV m9_l4 dev_t upon success: resulting file device number
RES_UID m9_s4 uid_t upon success: user ID of the root inode
RES_GID m9_s1 gid_t upon success: group ID of the root inode

Reply codes

EINVAL label too long
EINVAL unable to retrieve endpoint from DS using label
EINVAL opening device driver failed
EINVAL reading superblock failed
OK file system initialized and mounted

Notes

VFS assumes the root inode on the mounted FS is in use and will have a reference count of 1

REQ_UNMOUNT

Unmount the file system.

Request fields

  • none

Reply fields

  • none

Reply codes

OK file system unmounted

Notes

Analogous to how REQ_READSUPER opens the root inode, REQ_UNMOUNT puts (releases) the root inode. Previously, all inodes had to have a reference count of 0 before issuing this request.

Inode open and close functions

REQ_LOOKUP

Resolve a path string to an inode.

Request fields

REQ_GRANT2 m9_l1 cp_grant_id_t memory grant (READ) of the buffer containing supplemental group data
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READWRITE) of the buffer containing the pathname
REQ_PATH_LEN m9_s2 int length of the remaining part of the string to resolve
REQ_PATH_SIZE m9_l5 size_t total size of the buffer
REQ_L_PATH_OFF m9_l2 size_t starting offset of the string to resolve within the buffer
REQ_DIR_INO m9_l3 ino_t inode number of the starting directory
REQ_ROOT_INO m9_l4 ino_t inode number of the root directory of the caller, or 0 if not on this file system
REQ_FLAGS m9_s3 int PATH_RET_SYMLINK (do not resolve a symlink as the last path component), PATH_GET_UCRED (retrieve UID and GIDs from VFS instead of using REQ_UID and REQ_GID, because UID is member of multiple, supplemental, groups), or 0
REQ_UID m9_s4 uid_t user ID of the caller
REQ_GID m9_s1 gid_t group ID of the caller
REQ_UCRED_SIZE m9_s4 size_t total size of ucred structure

Reply fields

RES_INODE_NR m9_l1 ino_t upon success: resulting file inode number
RES_MODE m9_s2 mode_t upon success: resulting file mode
RES_FILE_SIZE m6_l2 off_t upon success: resulting file size
RES_FILE_SIZE_HI m9_l2 off_t upon success: resulting file size (upper 32 bits)
RES_FILE_SIZE_LO m9_l3 off_t upon success: resulting file size (lower 32 bits)
RES_DEV m9_l4 dev_t upon success: resulting file device number
RES_UID m9_s4 uid_t upon success: resulting file user ID
RES_GID m9_s1 gid_t upon success: resulting file group ID
RES_INODE_NR m9_l1 ino_t upon EENTERMOUNT: inode number of the mountpoint inode
RES_OFFSET m9_s2 int upon EENTERMOUNT and ELEAVEMOUNT and ESYMLINK: new starting offset of string within buffer
RES_SYMLOOP m9_s3 unsigned short upon EENTERMOUNT and ELEAVEMOUNT and ESYMLINK: number of symbolic links followed

Reply codes

ENAMETOOLONG provided path length exceeds what file server can handle
ENAMETOOLONG any of the path components is longer than the file system supports
ENOTDIR any of the intermediate path components is not a directory
EACCES the caller has no search access permission on any of the intermediate directories
ENFILE no inodes are available in memory
ELOOP more than SYMLOOP_MAX symlinks were encountered during the lookup
ENAMETOOLONG resulting path to copy back (including terminating '\0') does not fit in provided buffer
ENOENT one of the components does not exist
EENTERMOUNT a mountpoint was encountered
ELEAVEMOUNT “..” is followed from the file system root and the file system root is not the caller root inode
ESYMLINK an absolute symlink was encountered
EINVAL starting inode was a mountpoint and first path component is not “..”
OK inode successfully looked up and opened

Description

REQ_GRANT2 provides a grant to a ucred structure holding the user ID and (supplemental) group data that are to be used to check permissions during the lookup.

Notes

VFS assumes the opened inode on the FS is in use and will have a reference count +1 (i.e., 1 if just opened for the first time, x+1 if it was already opened).

REQ_CREATE

Create a regular file.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number of the containing directory for the new file
REQ_MODE m9_s3 mode_t mode for the file
REQ_UID m9_s4 uid_t user ID for the file
REQ_GID m9_s1 gid_t group ID for the file
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) for the last path component
REQ_PATH_LEN m9_s2 unsigned short length of the last path component

Reply fields

RES_INODE_NR m9_l1 ino_t upon success: inode number of created file
RES_MODE m9_s2 mode_t upon success: mode of created file
RES_FILE_SIZE m6_l2 off_t upon success: file size of created file
RES_FILE_SIZE_HI m9_l2 off_t upon success: file size of created file (upper 32 bits)
RES_FILE_SIZE_LO m9_l3 off_t upon success: file size of created file (lower 32 bits)
RES_UID m9_s4 uid_t upon success: user ID of created file
RES_GID m9_s1 gid_t upon success: group ID of created file
RES_DEV m9_l4 dev_t upon success: device node index
RES_INODE_INDEX m6_s2 unsigned short upon success: inode index to associate with this inode

Reply codes

ENAMETOOLONG the last path component is longer than the file system supports
EEXIST a directory entry with that name already exists
ENFILE no inodes are available
ENOSPC no space is left on the device
EFBIG the containing directory can not handle any more entries
OK regular file created and opened

Notes

VFS assumes the created inode on the FS is in use and will have a reference count of 1.

REQ_NEWNODE

Create an open, unlinked file.

Request fields

REQ_MODE m9_s3 mode_t mode for the inode
REQ_DEV m9_l5 dev_t device number for the inode
REQ_UID m9_s4 uid_t user ID for the inode
REQ_GID m9_s1 gid_t group ID for the inode

Reply fields

RES_INODE_NR m9_l1 ino_t upon success: inode number of the resulting inode
RES_MODE m9_s2 mode_t upon success: mode of the resulting inode
RES_FILE_SIZE m6_l2 off_t upon success: size of the resulting inode
RES_FILE_SIZE_HI m9_l2 off_t upon success: size of the resulting inode (upper 32 bits)
RES_FILE_SIZE_LO m9_l3 off_t upon success: size of the resulting inode (lower 32 bits)
RES_DEV m9_l4 dev_t upon success: device number of the resulting inode
RES_UID m9_s4 uid_t upon success: user ID of the resulting inode
RES_GID m9_s1 gid_t upon success: group ID of the resulting inode

Reply codes

ENFILE no inodes are available
OK temporary inode created and opened

REQ_PUTNODE

Decrease an open file's reference count.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number
REQ_COUNT m9_l2 ino_t number of references to drop

Reply fields

  • none

Reply codes

OK reference count decreased

Notes

VFS assumes the inode on the FS:
- is not in use when REQ_COUNT equals exactly the number of times the inode was opened according to the FS,
- is still in use when REQ_COUNT is less than the number of times the inode was opened according to the FS (e.g., sometimes VFS will (effectively) set the reference counter to 1 in order to prevent the counter from wrapping).
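The two cases above can be illustrated with a small reference-count helper. This is a sketch with hypothetical names, not the MFS implementation. The text does not say what happens when REQ_COUNT exceeds the current count, so the sketch treats that case as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory inode with the FS-side open count. */
struct inode_ref {
    unsigned int count; /* number of times the inode was opened */
};

/*
 * Drop 'count' references (the value of REQ_COUNT).
 * Returns 1 if the inode is still in use, 0 if it is no longer in use,
 * and -1 if 'count' exceeds the current reference count (an assumption,
 * this case is not covered by the protocol text).
 */
int inode_put(struct inode_ref *ino, unsigned int count)
{
    assert(ino != NULL);
    if (count > ino->count)
        return -1;
    ino->count -= count;
    return ino->count > 0 ? 1 : 0;
}
```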

Inode use functions

REQ_READ

Read from a file.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number
REQ_GRANT m9_l2 cp_grant_id_t memory grant (WRITE) to store the resulting data in
REQ_POS m2_i3 off_t seek position into the open file
REQ_SEEK_POS_HI m9_l3 off_t seek position into the open file (upper 32 bits)
REQ_SEEK_POS_LO m9_l4 off_t seek position into the open file (lower 32 bits)
REQ_NBYTES m9_l5 size_t number of bytes to read
REQ_FD_INODE_INDEX m2_s1 unsigned short inode index associated with this inode

Reply fields

RES_FD_POS m2_i1 off_t upon success: resulting file position
RES_SEEK_POS_HI m9_l3 off_t upon success: resulting file position (upper 32 bits)
RES_SEEK_POS_LO m9_l4 off_t upon success: resulting file position (lower 32 bits)
RES_NBYTES m9_l5 size_t upon success: number of bytes read

Reply codes

OK results successfully (partially) read, or EOF reached
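Several requests, including this one, carry a 64-bit file position as two 32-bit message fields (the _HI and _LO pairs). A minimal sketch of splitting and joining such a position, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the upper and lower 32 bits into one 64-bit position. */
uint64_t pos_join(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Split a 64-bit position into the two 32-bit message fields. */
void pos_split(uint64_t pos, uint32_t *hi, uint32_t *lo)
{
    assert(hi != NULL && lo != NULL);
    *hi = (uint32_t)(pos >> 32);
    *lo = (uint32_t)(pos & 0xFFFFFFFFu);
}
```

The cast to uint64_t before the shift matters: shifting a 32-bit value by 32 bits is undefined in C.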

REQ_WRITE

Write to a file.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) containing the data to write
REQ_FD_POS m2_i3 off_t seek position into the open file
REQ_SEEK_POS_HI m9_l3 off_t seek position into the open file (upper 32 bits)
REQ_SEEK_POS_LO m9_l4 off_t seek position into the open file (lower 32 bits)
REQ_NBYTES m9_l5 size_t number of bytes to write
REQ_FD_INODE_INDEX m2_s1 unsigned short inode index associated with this inode

Reply fields

RES_FD_POS m2_i1 off_t upon success: resulting file position
RES_SEEK_POS_HI m9_l3 off_t upon success: resulting file position (upper 32 bits)
RES_SEEK_POS_LO m9_l4 off_t upon success: resulting file position (lower 32 bits)
RES_NBYTES m9_l5 size_t upon success: number of bytes written

Reply codes

ENOSPC no space is left on the device
EFBIG the write would make the resulting file size too big
OK results successfully written

REQ_GETDENTS

Retrieve directory entries.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number of the directory
REQ_GRANT m9_l2 cp_grant_id_t memory grant (WRITE) to store resulting struct dirent entries and names in
REQ_MEM_SIZE m9_l5 size_t size of given memory grant
REQ_GDE_POS m2_l1 off_t seek position into the open file
REQ_SEEK_POS_HI m9_l3 off_t file position (upper 32 bits)
REQ_SEEK_POS_LO m9_l4 off_t file position (lower 32 bits)

Reply fields

RES_GDE_POS_CHANGE m2_l1 off_t upon success: the amount by which to adjust the seek position into the file
RES_SEEK_POS_HI m9_l3 off_t upon success: new seek position into the file (upper 32 bits)
RES_SEEK_POS_LO m9_l4 off_t upon success: new seek position into the file (lower 32 bits)
RES_NBYTES m9_l5 size_t upon success: the amount of resulting bytes stored, with 0 for EOF

Reply codes

ENOENT the given file position is not aligned to the internal data structures (file system specific)
EINVAL the given buffer is too small to store even one entry (including padding)
OK stored zero or more entries in the user's buffer

REQ_FTRUNC

Set size, or free space, of an open file.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number
REQ_TRC_START_HI m9_l2 off_t new file size or starting position (inclusive) of region to free (upper 32 bits)
REQ_TRC_START_LO m9_l3 off_t new file size or starting position (inclusive) of region to free (lower 32 bits)
REQ_TRC_END_HI m9_l4 off_t zero or ending position (exclusive) of region to free (upper 32 bits)
REQ_TRC_END_LO m9_l5 off_t zero or ending position (exclusive) of region to free (lower 32 bits)
REQ_FD_START m2_i2 off_t new file size or starting position (inclusive) of region to free
REQ_FD_END m2_i3 off_t zero or ending position (exclusive) of region to free

Reply fields

  • none

Reply codes

EINVAL an attempt is made to change the file size of a pipe to anything but zero
EFBIG the resulting file would be too big
OK file size changed and/or holes created
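The request fields above encode two different operations: when the end position is zero, the start value is the new file size; otherwise the half-open region from start (inclusive) to end (exclusive) is freed. A sketch of how an FS might decode this, with hypothetical type and function names and with the HI and LO halves assumed to be already combined into 64-bit values:

```c
#include <stdint.h>

enum trunc_op {
    TRUNC_SET_SIZE,   /* end == 0: 'start' is the new file size */
    TRUNC_FREE_REGION /* end != 0: free the bytes in [start, end) */
};

struct trunc_req {
    enum trunc_op op;
    uint64_t start;
    uint64_t end;
};

/* Decode the start and end positions of a REQ_FTRUNC request. */
struct trunc_req trunc_decode(uint64_t start, uint64_t end)
{
    struct trunc_req r;
    r.start = start;
    r.end = end;
    r.op = (end == 0) ? TRUNC_SET_SIZE : TRUNC_FREE_REGION;
    return r;
}
```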

REQ_INHIBREAD

Mark file as target of seek operation.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number

Reply fields

  • none

Reply codes

OK request processed successfully

Inode metadata retrieval and manipulation

REQ_STAT

Retrieve file status.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number
REQ_GRANT m9_l2 cp_grant_id_t memory grant (WRITE) to store resulting “struct stat” in

Reply fields

  • none

Reply codes

OK result stored in buffer

REQ_CHOWN

Change file ownership.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number
REQ_UID m6_s1 uid_t user ID of the caller
REQ_GID m6_c1 gid_t group ID of the caller
REQ_UID m9_s4 uid_t new user ID for the file
REQ_GID m9_s1 gid_t new group ID for the file

Reply fields

RES_MODE m9_s2 mode_t upon success: resulting inode mode

Reply codes

OK ownership changed

REQ_CHMOD

Change file mode.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number
REQ_MODE m9_s3 mode_t new mode for the file
REQ_UID m6_s1 uid_t user ID of the caller
REQ_GID m6_c1 gid_t group ID of the caller

Reply fields

RES_MODE m9_s2 mode_t upon success: resulting inode mode

Reply codes

OK mode changed

Notes

- The caller UID and GID are typically unused.
- While MFS changes the 06777 (octal) part of the mode, other file systems may choose to change S_ISVTX as well (07777).
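The difference between the two masks can be shown with a small helper that replaces only the masked bits of a mode. The type and function names are hypothetical; only the octal masks 06777 and 07777 come from the note above.

```c
/* Hypothetical stand-in for mode_t, wide enough for the file type bits. */
typedef unsigned int fs_mode_t;

#define CHMOD_MASK_MFS   06777u /* permission bits plus set-user-ID and set-group-ID */
#define CHMOD_MASK_STICKY 07777u /* the above plus S_ISVTX */

/* Replace the bits selected by 'mask' in old_mode with those from new_mode. */
fs_mode_t chmod_apply(fs_mode_t old_mode, fs_mode_t new_mode, fs_mode_t mask)
{
    return (old_mode & ~mask) | (new_mode & mask);
}
```

With the 06777 mask, a requested mode of 01755 on a regular file with mode 0100644 yields 0100755, so the sticky bit is ignored; with 07777 the result is 0101755.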

REQ_UTIME

Set file times.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number
REQ_ACTIME m9_l2 time_t new access time
REQ_MODTIME m9_l3 time_t new modification time

Reply fields

  • none

Reply codes

OK custom file times set

Directory entry manipulation

REQ_MKDIR

Create a directory.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number of the containing directory for the new file
REQ_MODE m9_s3 mode_t mode for the directory
REQ_UID m9_s4 uid_t user ID for the directory
REQ_GID m9_s1 gid_t group ID for the directory
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) for the last path component
REQ_PATH_LEN m9_s2 unsigned short length of the last path component

Reply fields

  • none

Reply codes

ENAMETOOLONG the last path component is longer than the file system supports
EEXIST a directory entry with that name already exists
ENFILE no inodes are available
ENOSPC no space is left on the device
EFBIG the containing directory can not handle any more entries
EMLINK the containing directory has the maximum number of links already
OK directory created

REQ_MKNOD

Create a special file.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number of the containing directory for the new file
REQ_MODE m9_s3 mode_t mode for the file
REQ_DEV m9_l5 dev_t device number
REQ_UID m9_s4 uid_t user ID for the file
REQ_GID m9_s1 gid_t group ID for the file
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) for the last path component
REQ_PATH_LEN m9_s2 short length of the last path component

Reply fields

  • none

Reply codes

ENAMETOOLONG the last path component is longer than the file system supports
EEXIST a directory entry with that name already exists
EINVAL the given file type is invalid or not supported
ENFILE no inodes are available
ENOSPC no space is left on the device
EFBIG the containing directory can not handle any more entries
OK special file created

REQ_LINK

Create a hard link to a file.

Request fields

REQ_INODE_NR m9_l1 ino_t link file inode number
REQ_DIR_INO m9_l3 ino_t inode number of the containing directory for the new link
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) for the last path component
REQ_PATH_LEN m9_s2 unsigned short length of the last path component

Reply fields

  • none

Reply codes

ENAMETOOLONG the last path component is longer than the file system supports
EEXIST a directory entry with that name already exists
EPERM the linked file is a directory
EMLINK the linked inode has the maximum number of links already
ENOSPC no space is left on the device
EFBIG the containing directory can not handle any more entries
OK new link created

REQ_UNLINK

Unlink a file.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number of the containing directory for the file
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) for last path component
REQ_PATH_LEN m9_s2 unsigned short length of the last path component

Reply fields

  • none

Reply codes

ENAMETOOLONG the last path component is longer than the file system supports
ENOENT no directory entry with that name exists
EPERM the given name refers to a directory
OK unlinked file

REQ_RMDIR

Remove an empty directory.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number of the containing directory for the file
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) for last path component
REQ_PATH_LEN m9_s2 unsigned short length of the last path component

Reply fields

  • none

Reply codes

ENAMETOOLONG the last path component is longer than the file system supports
ENOENT no directory entry with that name exists
ENOTDIR the given name does not refer to a directory.
ENOTEMPTY the given directory is not empty
EINVAL the given directory is “.” or “..”
EBUSY the given directory is the root directory of the file system
OK removed directory

REQ_RENAME

Rename a file or directory.

Request fields

REQ_REN_OLD_DIR m9_l3 ino_t inode number of containing directory for the old file
REQ_REN_NEW_DIR m9_l4 ino_t inode number of containing directory for the new file
REQ_REN_GRANT_OLD m9_l2 cp_grant_id_t memory grant (READ) for the old last path component
REQ_REN_LEN_OLD m9_s1 unsigned short length of the old last path component
REQ_REN_GRANT_NEW m9_l1 cp_grant_id_t memory grant (READ) for the new last path component
REQ_REN_LEN_NEW m9_s2 unsigned short length of the new last path component

Reply fields

  • none

Reply codes

ENAMETOOLONG the last path component of the old or new file is longer than the file system supports
ENOENT the old file does not exist
OK the old and new last path component and containing directory are the same
EBUSY the old file is a mountpoint directory
EINVAL an attempt is made to move a directory to within its own subtree
EINVAL the old or new last path component is “.” or “..”
EMLINK the old file is a directory and the new file doesn't exist but the new containing directory has the maximum number of links
ENOTDIR the old file is a directory and the new file exists but is not a directory
EISDIR the old file is not a directory and the new file exists but is a directory
ENOTEMPTY the new file is a directory but is not empty
EBUSY the new file is the root directory of the file system
ENOSPC no space is left on the device
EFBIG the new containing directory can not handle any more entries
OK file renamed

REQ_SLINK

Create a symbolic link.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number of the containing directory for the new file
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) for the link name's last path component
REQ_PATH_LEN m9_s2 unsigned short length of the link name's last path component
REQ_GRANT3 m9_l3 cp_grant_id_t memory grant (READ) for the link target (not including a trailing '\0')
REQ_MEM_SIZE m9_l5 size_t length of the link target (not including a trailing '\0')
REQ_UID m9_s4 uid_t user ID for the new symlink
REQ_GID m9_s1 gid_t group ID for the new symlink

Reply fields

  • none

Reply codes

ENAMETOOLONG the last path component is longer than the file system supports
EEXIST a directory entry with that name already exists
ENFILE no inodes are available
ENOSPC no space is left on the device
EFBIG the containing directory can not handle any more entries
ENAMETOOLONG the link target contains '\0' bytes
OK symbolic link created

REQ_RDLINK

Retrieve symbolic link target.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number
REQ_GRANT m9_l2 cp_grant_id_t memory grant (WRITE) for buffer to write result to
REQ_MEM_SIZE m9_l5 size_t size of buffer to write to

Reply fields

RES_NBYTES m9_l5 size_t upon success: number of bytes written

Reply codes

OK result stored in buffer

Miscellaneous file system operations

REQ_MOUNTPOINT

Mark an inode as mountpoint.

Request fields

REQ_INODE_NR m9_l1 ino_t inode number of file to use as mountpoint

Reply fields

  • none

Reply codes

EBUSY inode already in use as mountpoint
ENOTDIR given inode is not a directory
OK inode marked as mountpoint

REQ_FSTATFS

Retrieve file system status.

Request fields

REQ_GRANT m9_l2 cp_grant_id_t memory grant (WRITE) to store resulting “struct statfs” in

Reply fields

  • none

Reply codes

OK result stored in buffer

REQ_SYNC

Write any unwritten data to disk.

Request fields

  • none

Reply fields

  • none

Reply codes

OK request processed successfully

Block I/O functions

REQ_FLUSH

Flush cached data for an unmounted device.

Request fields

REQ_DEV m9_l5 dev_t device number

Reply fields

  • none

Reply codes

EBUSY the device is mounted
OK cache flushed and invalidated for this device

REQ_NEW_DRIVER

Set a new driver endpoint for a major device.

Request fields

REQ_DEV m9_l5 dev_t device number
REQ_DRIVER_E m9_l2 endpoint_t driver endpoint

Reply fields

  • none

Reply codes

OK request processed successfully

REQ_BREAD

Read from a block device directly.

Request fields

REQ_DEV2 m9_l1 dev_t device number
REQ_GRANT m9_l2 cp_grant_id_t memory grant (WRITE) to store the resulting data in
REQ_SEEK_POS_LO m9_l4 off_t low 32 bits of position
REQ_SEEK_POS_HI m9_l3 off_t high 32 bits of position
REQ_NBYTES m9_l5 size_t number of bytes to read

Reply fields

RES_SEEK_POS_LO m9_l4 off_t upon success and failure: low 32 bits of resulting position
RES_SEEK_POS_HI m9_l3 off_t upon success and failure: high 32 bits of resulting position
RES_NBYTES m9_l5 size_t upon success and failure: total number of bytes read

Reply codes

EIO I/O error reported by the device driver
OK results successfully (partially) read, or EOF reached

REQ_BWRITE

Write to a block device directly.

Request fields

REQ_DEV2 m9_l1 dev_t device number
REQ_GRANT m9_l2 cp_grant_id_t memory grant (READ) containing the data to write
REQ_SEEK_POS_LO m9_l4 off_t low 32 bits of position
REQ_SEEK_POS_HI m9_l3 off_t high 32 bits of position
REQ_NBYTES m9_l5 size_t number of bytes to write

Reply fields

RES_SEEK_POS_LO m9_l4 off_t upon success and failure: low 32 bits of resulting position
RES_SEEK_POS_HI m9_l3 off_t upon success and failure: high 32 bits of resulting position
RES_NBYTES m9_l5 size_t upon success and failure: total number of bytes written

Reply codes

EIO I/O error reported by the device driver
OK results successfully (partially) written, or EOF reached

Transaction functions

REQ_COMMIT

Commit a transaction (part of the transaction protocol).

Request fields

REQ_ID m9_s1 unsigned short Request ID

Reply fields

  • none

Reply codes

EINVAL Request ID could not be committed (e.g., REQ_ID != (current id - 1))
COMMITTED Request is committed

Description

This request tells the file server to commit a transaction. If VFS sends this request in response to a reply that the FS flagged 'auto-commit', or sends it more than once for a transaction that is already committed, the FS replies COMMITTED.

REQ_RECOVER

Recover state after a crash.

Request fields

REQ_OLD_E m9_l3 endpoint_t Endpoint of crashed (initial) FS

Reply fields

  • none

Reply codes

EIO Recovery process failed (e.g., due to data corruption)
OK Recovery process completed successfully

Description

The FS allocates and registers shared memory regions for the inode cache and buffer cache based on the endpoint (e.g., using keys such as <endpoint>_i and <endpoint>_b) with DS, after receiving a mount request. Upon receiving a recovery request, it maps in the inode cache and buffer cache of the crashed FS and runs a recovery procedure. Note that the endpoint is the endpoint of the initial FS, because the key in DS will never change. There are no naming schemes defined; it is up to the FS to pick suitable names.
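As an illustration of the example keys mentioned above, the sketch below builds a key of the form <endpoint>_i or <endpoint>_b. The function name is hypothetical, and printing the endpoint as a decimal integer is an assumption; as the text says, no naming scheme is defined, so an FS is free to choose differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build a DS key "<endpoint>_<suffix>" into buf.
 * 'suffix' would be 'i' for the inode cache or 'b' for the buffer cache.
 * Returns 0 on success, -1 if the key does not fit in the buffer.
 */
int ds_key(char *buf, size_t size, int endpoint, char suffix)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "%d_%c", endpoint, suffix);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Because the key is derived from the endpoint of the initial FS, a recovering FS can rebuild the same key from REQ_OLD_E and look up the shared memory regions of its crashed predecessor.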

Dynamic Update functions

REQ_RESTART

Tell the FS it is about to perform a dynamic update, so it can flush dirty data to disk.

Request fields

  • none

Reply fields

  • none

Reply codes

OK Dirty data is written to disk

Description

The FS does a sync to write the inode table to the block buffer and the block buffer to disk, then sends the above reply and exits.

REQ_RELOAD

Restore buffers by reading a number of inodes from disk, such that the state of the FS is the same as before the update.

Request fields

REQ_INODE_NR m9_l1 ino_t Inode number of file to use as mountpoint
REQ_GRANT m9_l2 cp_grant_id_t Memory grant (READ) containing a list of inodes that the FS has to reopen
REQ_MEM_SIZE m9_l5 size_t Size of the inode list
REQ_OLD_E m9_l3 endpoint_t Endpoint of FS before dynamic update

Reply fields

  • none

Reply codes

OK Reload completed successfully

Description

VFS holds a list of the inodes that it believes the FS has open. For VFS, that is what the state of the FS looks like. By reading those inodes from disk, state is restored.

Because the block buffer is stored in a shared memory region, the reloading process is sped up by mapping in that shared memory (using the endpoint to retrieve the shared memory key in DS) and reading the blocks from cache instead of disk. The inode cache must be overwritten.

References

This document is not based on the original VFS-FS protocol documentation by Balazs Gerofi. However, that document may still provide additional insights.

Design and implementation of the MINIX Virtual File system by Balazs Gerofi, August, 2006

For more information on Dynamic Updates and Failure Resilience, see the Master's Thesis by Thomas Veerman.

Dynamic Updates and Failure Resilience for the Minix File Server by Thomas Veerman, May, 2009

developersguide/vfsfsprotocolv2.txt · Last modified: 2014/11/17 13:22 (external edit)