The VFS-FS protocol V2
This page presents a new version of The VFS-FS protocol. This is a work in progress, so do not consider it final yet. In the future the changes to the protocol will be incorporated into the VFS-FS protocol page, and this page will be removed.
The known issues of the previous VFS-FS protocol have been solved. Moreover, support for Dynamic Updates and Failure Resilience for an FS has been added (although these features have not actually been implemented yet).
Failure Resilience
In order to provide failure recovery after an FS has crashed, all requests are part of a transaction. A transaction consists of:
1. VFS sends a request to an FS.
2. The FS replies that it has done what was requested, but has not committed it yet.
3. VFS sends a COMMIT request.
4. The FS replies COMMITTED.
An FS handles only one transaction at a time (VFS puts subsequent transactions on a queue).
A commit by the FS is an atomic operation. While a transaction is not yet committed, the FS stores the result of the request from step 1 in a temporary data structure; that is, the result is not yet part of the state of the FS. If, after a crash of the FS, it turns out there was a partially executed transaction, the recovery process can ignore the temporary data structure, as if the request had never happened.
However, if an FS crashes right after it has committed the changes but before it could deliver the COMMITTED message, restarting the transaction could produce wrong results (e.g., consider unlinking a file successfully and then retrying the unlink: the first time the FS returns OK and the second time it returns ENOENT). To solve this, each transaction has an ID that is encoded in the message's 'type' field. An FS must record the transaction ID when it commits a transaction, so that it can verify whether it has already committed that transaction when VFS asks. If the request turns out to have been committed already, the FS simply replies COMMITTED.
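A minimal sketch of the FS side of this bookkeeping follows. It is not taken from MFS: `struct fs_txn` and the `txn_*` functions are hypothetical, and an `int` stands in for both the real FS state and the temporary data structure. It shows the two rules above: an uncommitted result is kept aside and can be discarded after a crash, and a repeated COMMIT for an already recorded ID is answered without reapplying anything. Atomicity of the commit and crash-surviving storage of `committed_id` are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the single transaction an FS handles at a time.
 * `state` and `pending_state` stand in for the real FS state and for the
 * temporary data structure described above. */
struct fs_txn {
    int      state;          /* committed FS state */
    int      pending_state;  /* result of an executed, uncommitted request */
    bool     has_pending;    /* true while pending_state holds a result */
    uint16_t pending_id;     /* transaction ID of that request */
    bool     any_committed;  /* false until the first commit */
    uint16_t committed_id;   /* ID recorded by the most recent commit */
};

/* Steps 1 and 2: execute the request, but keep the result aside. */
void txn_execute(struct fs_txn *tx, uint16_t id, int new_state)
{
    tx->pending_state = new_state;
    tx->pending_id = id;
    tx->has_pending = true;
}

/* Steps 3 and 4: handle COMMIT. Returns true if the FS may reply COMMITTED. */
bool txn_commit(struct fs_txn *tx, uint16_t id)
{
    if (tx->any_committed && tx->committed_id == id)
        return true;                 /* committed before: just reply again */
    if (!tx->has_pending || tx->pending_id != id)
        return false;                /* nothing to commit under this ID */
    /* In a real FS the updates below must happen as one atomic step. */
    tx->state = tx->pending_state;
    tx->committed_id = id;
    tx->any_committed = true;
    tx->has_pending = false;
    return true;
}

/* After a crash, an uncommitted result is simply thrown away. */
void txn_discard_pending(struct fs_txn *tx)
{
    tx->has_pending = false;
}
```

A second COMMIT for the same ID returns true without touching `state`, which is the unlink-twice case described above.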
Steps 3 and 4 of the transaction protocol can be omitted if a request is idempotent (for example, stat only reads data, so it can be issued multiple times and returns the same result each time). To do this, the FS sets an 'auto-commit' flag in the reply. This flag is encoded in the 'type' field, just like the transaction ID. It is up to the FS to decide whether or not a request is idempotent. VFS automatically sends a COMMIT request only when the reply from the FS indicates that the request is not idempotent, i.e., when the auto-commit flag is clear.
When an FS crashes, there must be a way to restore a freshly started FS to the state from before the crash. This can be done with shared memory regions that remain resident even after the program that created them is no longer executing. After the crash, a new FS maps in the old memory regions and fixes errors if necessary. A newly started FS knows it has to recover state from a previous FS (as opposed to mounting a new file system) because VFS sends it a REQ_RECOVER message.
When a transaction fails repeatedly, the communication layer returns EAGAIN, enabling VFS to undo any changes to its internal state and report an error to the user (program).
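The retry policy can be sketched as a bounded loop. This is only an illustration: the retry limit `MAX_TXN_ATTEMPTS`, the `txn_attempt_fn` callback and the simulated `flaky_fs` are hypothetical, since the protocol does not say how many attempts the communication layer makes. The `EAGAIN` used here is the one from `<errno.h>`.

```c
#include <errno.h>

/* Hypothetical retry limit; the protocol text does not specify a number. */
#define MAX_TXN_ATTEMPTS 3

/* One attempt at a complete transaction: returns 0 on success, nonzero if
 * the FS crashed before the transaction was committed. */
typedef int (*txn_attempt_fn)(void *ctx);

/* Retry a transaction; give up with EAGAIN after MAX_TXN_ATTEMPTS failures.
 * On EAGAIN, VFS would roll back its own state and report an error. */
int run_transaction(txn_attempt_fn attempt, void *ctx)
{
    for (int i = 0; i < MAX_TXN_ATTEMPTS; i++) {
        if (attempt(ctx) == 0)
            return 0;
    }
    return EAGAIN;
}

/* Simulated FS for illustration: fails its first `failures` attempts. */
struct flaky_fs {
    int failures;
    int calls;
};

int flaky_attempt(void *ctx)
{
    struct flaky_fs *fs = ctx;
    fs->calls++;
    return fs->calls <= fs->failures ? -1 : 0;
}
```

With `struct flaky_fs fs = {2, 0}`, the third attempt succeeds; with five failures configured, the loop gives up after three attempts and returns `EAGAIN`.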
The following macros encode the request result (r, signed short), transaction ID (i, unsigned short), and auto-commit flag (f, unsigned short) in the 'type' field t. Note that the transaction ID is actually 15 bits wide (not 16) and can therefore carry values from 0 up to 32767. The TSET_* macros OR bits into t, so t must be zero before they are applied.
#define TGET_RESULT(t)      ((short) (((t) >> 16) & 0xFFFF))
#define TGET_TRNS_ID(t)     (((t) >> 1) & 0x7FFF)
#define TGET_AC(t)          ((t) & 0x0001)
#define TSET_RESULT(t, r)   ((t) |= ((unsigned) (r) & 0xFFFF) << 16)
#define TSET_TRNS_ID(t, i)  ((t) |= ((unsigned) (i) & 0x7FFF) << 1)
#define TSET_AC(t, f)       ((t) |= ((unsigned) (f) & 0x0001))
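A self-contained sketch of how the two sides could use this encoding, assuming t is held in a `uint32_t` that starts at zero. It repeats the macros with parentheses around every argument and unsigned casts, so that a negative result code is packed and unpacked correctly. The helpers `make_reply_type` and `needs_commit` are hypothetical names, not part of VFS or MFS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Parenthesized copies of the macros above; t is a uint32_t starting at 0,
 * because the setters only OR bits in. TGET_RESULT relies on the usual
 * two's complement conversion to recover a negative result. */
#define TGET_RESULT(t)     ((int16_t) (((t) >> 16) & 0xFFFF))
#define TGET_TRNS_ID(t)    (((t) >> 1) & 0x7FFF)
#define TGET_AC(t)         ((t) & 0x0001)
#define TSET_RESULT(t, r)  ((t) |= ((uint32_t) (r) & 0xFFFF) << 16)
#define TSET_TRNS_ID(t, i) ((t) |= ((uint32_t) (i) & 0x7FFF) << 1)
#define TSET_AC(t, f)      ((t) |= ((uint32_t) (f) & 0x0001))

/* FS side: build the 'type' field of a reply. */
uint32_t make_reply_type(int16_t result, uint16_t trns_id, bool auto_commit)
{
    uint32_t t = 0;
    TSET_RESULT(t, result);
    TSET_TRNS_ID(t, trns_id);
    TSET_AC(t, auto_commit ? 1 : 0);
    return t;
}

/* VFS side: steps 3 and 4 are only needed when the auto-commit flag is clear. */
bool needs_commit(uint32_t t)
{
    return TGET_AC(t) == 0;
}
```

For example, `make_reply_type(-22, 32767, false)` packs to `0xFFEAFFFE`; decoding it yields the result -22, transaction ID 32767, and a required COMMIT.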
Dynamic Updates
A dynamic update of an FS allows the administrator to install and run a new version of an FS without rebooting the computer or unmounting and remounting file systems; the running copy of the FS is replaced by the new version. This is achieved by sending the FS a REQ_RESTART message, which tells it to write its buffers to disk and exit. Subsequently, the new FS is started and told to reload its state from disk by sending it REQ_RELOAD; that is, it reads from disk the inodes that were in the cache before the update. This way it restores its state.
Reading "Dynamic Updates and Failure Resilience for the Minix File Server" by Thomas Veerman (see the References at the bottom) is recommended for a better understanding of the mechanisms behind Dynamic Updates and Failure Resilience.
Protocol messages
This specification reflects the protocol as it should be implemented, not how it is implemented by MFS. In particular, old and deprecated requests are not and should not be included.
The VFS-FS protocol is entirely POSIX-oriented. Any deviation from the requirements imposed by POSIX in this specification is unintentional except when mentioned explicitly. For convenience, links to the relevant Open Group function specifications and file access (ATIME), modification (MTIME) and change (CTIME) time update requirements are provided.
The reply codes in this document are advisory and mostly aimed at indicating additional restrictions needed for POSIX compliance. Not all of them are necessarily applicable to every file server, and a file server may send other error codes where appropriate. Errors resulting from protocol validation checks (e.g., EROFS and sys_safecopy errors) are not included.
The requests are ordered according to the following rough categorization:
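- Mounting and unmounting: REQ_READSUPER, REQ_UNMOUNT
- Inode open and close functions: REQ_LOOKUP, REQ_CREATE, REQ_NEWNODE, REQ_PUTNODE
- Inode use functions: REQ_READ, REQ_WRITE, REQ_GETDENTS, REQ_FTRUNC, REQ_INHIBREAD
- Inode metadata retrieval and manipulation: REQ_STAT, REQ_CHOWN, REQ_CHMOD, REQ_UTIME
- Directory entry manipulation: REQ_CREATE (see above), REQ_MKDIR, REQ_MKNOD, REQ_LINK, REQ_UNLINK, REQ_RMDIR, REQ_RENAME, REQ_SLINK, REQ_RDLINK
- Miscellaneous file system operations: REQ_MOUNTPOINT, REQ_FSTATFS, REQ_SYNC
- Block I/O functions: REQ_FLUSH, REQ_NEW_DRIVER, REQ_BREAD, REQ_BWRITE
- Transaction functions: REQ_COMMIT, REQ_RECOVER
- Dynamic Update functions: REQ_RESTART, REQ_RELOAD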
In the tables below, we use the following color coding:
- The field has been renamed.
- Value has changed (e.g., new variable type, new spot in a message, different description). When the whole row has this color, the row was added to the request.
- Nothing has changed.
- This field has been dropped (or replaced by a new field).
Mounting and unmounting
REQ_READSUPER
Mount the file system.
Request fields
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) for the label of the block device driver to use |
REQ_PATH_LEN |
m9_s2 |
unsigned short |
length of the label |
REQ_DEV |
m9_l5 |
dev_t |
device number of block device to mount |
REQ_READONLY |
m6_c1 |
int |
flag indicating whether the file system is mounted read-only (1 = read-only, 0 = read-write) |
REQ_ISROOT |
m6_c2 |
int |
flag indicating whether the file system is the system root file system (1 = yes, 0 = no) |
REQ_FLAGS |
m9_s3 |
int |
REQ_RDONLY flag: set if the file system is mounted read-only, clear if it is mounted read-write. REQ_ISROOT flag: set if the file system is the root file system. |
Reply fields
RES_INODE_NR |
m9_l1 |
ino_t |
upon success: inode number of the root inode |
RES_MODE |
m9_s2 |
mode_t |
upon success: mode of the root inode |
RES_FILE_SIZE |
m9_l2 |
off_t |
upon success: file size of the root inode |
RES_FILE_SIZE_HI |
m9_l2 |
off_t |
upon success: file size of the root inode (upper 32 bits) |
RES_FILE_SIZE_LO |
m9_l3 |
off_t |
upon success: file size of the root inode (lower 32 bits) |
RES_DEV |
m9_l4 |
dev_t |
upon success: resulting file device number |
RES_UID |
m9_s4 |
uid_t |
upon success: user ID of the root inode |
RES_GID |
m9_s1 |
gid_t |
upon success: group ID of the root inode |
Reply codes
EINVAL |
label too long |
EINVAL |
unable to retrieve endpoint from DS using label |
EINVAL |
opening device driver failed |
EINVAL |
reading superblock failed |
OK |
file system initialized and mounted |
Notes
VFS assumes the root inode on the mounted FS is in use and will have a reference count of 1 |
REQ_UNMOUNT
Unmount the file system.
Request fields
none
Reply fields
none
Reply codes
OK |
file system unmounted |
Notes
Just as REQ_READSUPER opens the root inode, REQ_UNMOUNT puts (releases) the root inode. Previously, all inodes had to have a reference count of 0 before this request was issued. |
Inode open and close functions
REQ_LOOKUP
Resolve a path string to an inode.
Request fields
REQ_GRANT2 |
m9_l1 |
cp_grant_id_t |
memory grant (READ) of the buffer containing supplemental group data |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ|WRITE) of the buffer containing the pathname |
REQ_PATH_LEN |
m9_s2 |
int |
length of the remaining part of the string to resolve |
REQ_PATH_SIZE |
m9_l5 |
size_t |
total size of the buffer |
REQ_L_PATH_OFF |
m9_l2 |
size_t |
starting offset of the string to resolve within the buffer |
REQ_DIR_INO |
m9_l3 |
ino_t |
inode number of the starting directory |
REQ_ROOT_INO |
m9_l4 |
ino_t |
inode number of the root directory of the caller, or 0 if not on this file system |
REQ_FLAGS |
m9_s3 |
int |
PATH_RET_SYMLINK (do not resolve a symlink as the last path component), PATH_GET_UCRED (retrieve the UID and GIDs from VFS instead of using REQ_UID and REQ_GID, because the user is a member of multiple, supplemental, groups), or 0 |
REQ_UID |
m9_s4 |
uid_t |
user ID of the caller |
REQ_GID |
m9_s1 |
gid_t |
group ID of the caller |
REQ_UCRED_SIZE |
m9_s4 |
size_t |
total size of ucred structure |
Reply fields
RES_INODE_NR |
m9_l1 |
ino_t |
upon success: resulting file inode number |
RES_MODE |
m9_s2 |
mode_t |
upon success: resulting file mode |
RES_FILE_SIZE |
m6_l2 |
off_t |
upon success: resulting file size |
RES_FILE_SIZE_HI |
m9_l2 |
off_t |
upon success: resulting file size (upper 32 bits) |
RES_FILE_SIZE_LO |
m9_l3 |
off_t |
upon success: resulting file size (lower 32 bits) |
RES_DEV |
m9_l4 |
dev_t |
upon success: resulting file device number |
RES_UID |
m9_s4 |
uid_t |
upon success: resulting file user ID |
RES_GID |
m9_s1 |
gid_t |
upon success: resulting file group ID |
RES_INODE_NR |
m9_l1 |
ino_t |
upon EENTERMOUNT: inode number of the mountpoint inode |
RES_OFFSET |
m9_s2 |
int |
upon EENTERMOUNT and ELEAVEMOUNT and ESYMLINK: new starting offset of string within buffer |
RES_SYMLOOP |
m9_s3 |
unsigned short |
upon EENTERMOUNT and ELEAVEMOUNT and ESYMLINK: number of symbolic links followed |
Reply codes
ENAMETOOLONG |
provided path length exceeds what file server can handle |
ENAMETOOLONG |
any of the path components is longer than the file system supports |
ENOTDIR |
any of the intermediate path components is not a directory |
EACCES |
the caller has no search access permission on any of the intermediate directories |
ENFILE |
no inodes are available in memory |
ELOOP |
more than SYMLOOP_MAX symlinks were encountered during the lookup |
ENAMETOOLONG |
resulting path to copy back (including terminating '\0') does not fit in provided buffer |
ENOENT |
one of the components does not exist |
EENTERMOUNT |
a mountpoint was encountered |
ELEAVEMOUNT |
".." is followed from the file system root and the file system root is not the caller's root inode |
ESYMLINK |
an absolute symlink was encountered |
EINVAL |
starting inode was a mountpoint and first path component is not ".." |
OK |
inode successfully looked up and opened |
Description
REQ_GRANT2 provides a grant to a ucred structure holding the user ID and (supplemental) group data that are to be used to check permissions during the lookup. |
Notes
VFS assumes the opened inode on the FS is in use and that its reference count has been incremented by one (i.e., it is 1 if the inode was just opened for the first time, or x+1 if it was already open with count x). |
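The special reply codes of REQ_LOOKUP tell VFS where to continue resolving the path. The sketch below shows one plausible way VFS could dispatch on them; it is an illustration, not the VFS source. The `LK_*` values are hypothetical placeholders for the real EENTERMOUNT, ELEAVEMOUNT and ESYMLINK constants, and `lookup_next` and its types are invented for the example. In all three cases VFS keeps RES_OFFSET and RES_SYMLOOP for the next request.

```c
/* Hypothetical placeholder values; a real VFS uses the MINIX header constants. */
enum {
    LK_OK          = 0,
    LK_EENTERMOUNT = -301,
    LK_ELEAVEMOUNT = -302,
    LK_ESYMLINK    = -303
};

enum lookup_action {
    LOOKUP_DONE,          /* OK: inode found and opened */
    LOOKUP_ENTER_MOUNT,   /* continue in the FS mounted on RES_INODE_NR */
    LOOKUP_LEAVE_MOUNT,   /* continue in the parent FS, from the mountpoint */
    LOOKUP_RESTART_ROOT,  /* absolute symlink: continue at the caller's root */
    LOOKUP_FAIL           /* any other code is an error for the caller */
};

/* Where the next REQ_LOOKUP should pick up the path. */
struct lookup_state {
    int      offset;    /* from RES_OFFSET */
    unsigned symloop;   /* from RES_SYMLOOP */
};

enum lookup_action lookup_next(int reply, int res_offset, unsigned res_symloop,
                               struct lookup_state *st)
{
    switch (reply) {
    case LK_OK:
        return LOOKUP_DONE;
    case LK_EENTERMOUNT:
    case LK_ELEAVEMOUNT:
    case LK_ESYMLINK:
        st->offset = res_offset;     /* remaining path starts here */
        st->symloop = res_symloop;   /* symlinks followed so far */
        if (reply == LK_EENTERMOUNT)
            return LOOKUP_ENTER_MOUNT;
        if (reply == LK_ELEAVEMOUNT)
            return LOOKUP_LEAVE_MOUNT;
        return LOOKUP_RESTART_ROOT;
    default:
        return LOOKUP_FAIL;
    }
}
```

Carrying RES_SYMLOOP forward lets the next file server continue counting toward SYMLOOP_MAX, so ELOOP is still detected across mountpoints.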
REQ_CREATE
Create a regular file.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number of the containing directory for the new file |
REQ_MODE |
m9_s3 |
mode_t |
mode for the file |
REQ_UID |
m9_s4 |
uid_t |
user ID for the file |
REQ_GID |
m9_s1 |
gid_t |
group ID for the file |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) for the last path component |
REQ_PATH_LEN |
m9_s2 |
unsigned short |
length of the last path component |
Reply fields
RES_INODE_NR |
m9_l1 |
ino_t |
upon success: inode number of created file |
RES_MODE |
m9_s2 |
mode_t |
upon success: mode of created file |
RES_FILE_SIZE |
m6_l2 |
off_t |
upon success: file size of created file |
RES_FILE_SIZE_HI |
m9_l2 |
off_t |
upon success: file size of created file (upper 32 bits) |
RES_FILE_SIZE_LO |
m9_l3 |
off_t |
upon success: file size of created file (lower 32 bits) |
RES_UID |
m9_s4 |
uid_t |
upon success: user ID of created file |
RES_GID |
m9_s1 |
gid_t |
upon success: group ID of created file |
RES_DEV |
m9_l4 |
dev_t |
upon success: device node index |
RES_INODE_INDEX |
m6_s2 |
unsigned short |
upon success: inode index to associate with this inode |
Reply codes
ENAMETOOLONG |
the last path component is longer than the file system supports |
EEXIST |
a directory entry with that name already exists |
ENFILE |
no inodes are available |
ENOSPC |
no space is left on the device |
EFBIG |
the containing directory can not handle any more entries |
OK |
regular file created and opened |
Notes
VFS assumes the created inode on the FS is in use and will have a reference count of 1. |
REQ_NEWNODE
Create an open, unlinked file.
Request fields
REQ_MODE |
m9_s3 |
mode_t |
mode for the inode |
REQ_DEV |
m9_l5 |
dev_t |
device number for the inode |
REQ_UID |
m9_s4 |
uid_t |
user ID for the inode |
REQ_GID |
m9_s1 |
gid_t |
group ID for the inode |
Reply fields
RES_INODE_NR |
m9_l1 |
ino_t |
upon success: inode number of the resulting inode |
RES_MODE |
m9_s2 |
mode_t |
upon success: mode of the resulting inode |
RES_FILE_SIZE |
m6_l2 |
off_t |
upon success: size of the resulting inode |
RES_FILE_SIZE_HI |
m9_l2 |
off_t |
upon success: size of the resulting inode (upper 32 bits) |
RES_FILE_SIZE_LO |
m9_l3 |
off_t |
upon success: size of the resulting inode (lower 32 bits) |
RES_DEV |
m9_l4 |
dev_t |
upon success: device number of the resulting inode |
RES_UID |
m9_s4 |
uid_t |
upon success: user ID of the resulting inode |
RES_GID |
m9_s1 |
gid_t |
upon success: group ID of the resulting inode |
Reply codes
ENFILE |
no inodes are available |
OK |
temporary inode created and opened |
REQ_PUTNODE
Decrease an open file's reference count.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
REQ_COUNT |
m9_l2 |
ino_t |
number of references to drop |
Reply fields
none
Reply codes
OK |
reference count decreased |
Notes
VFS assumes the inode on the FS: |
Inode use functions
REQ_READ
Read from a file.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (WRITE) to store the resulting data in |
REQ_POS |
m2_i3 |
off_t |
seek position into the open file |
REQ_SEEK_POS_HI |
m9_l3 |
off_t |
seek position into the open file (upper 32 bits) |
REQ_SEEK_POS_LO |
m9_l4 |
off_t |
seek position into the open file (lower 32 bits) |
REQ_NBYTES |
m9_l5 |
size_t |
number of bytes to read |
REQ_FD_INODE_INDEX |
m2_s1 |
unsigned short |
inode index associated with this inode |
Reply fields
RES_FD_POS |
m2_i1 |
off_t |
upon success: resulting file position |
RES_SEEK_POS_HI |
m9_l3 |
off_t |
upon success: resulting file position (upper 32 bits) |
RES_SEEK_POS_LO |
m9_l4 |
off_t |
upon success: resulting file position (lower 32 bits) |
RES_NBYTES |
m9_l5 |
size_t |
upon success: number of bytes read |
Reply codes
OK |
results successfully (partially) read, or EOF reached |
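REQ_READ, REQ_WRITE, REQ_GETDENTS, REQ_BREAD and REQ_BWRITE carry 64-bit file positions as two 32-bit halves (the *_SEEK_POS_HI and *_SEEK_POS_LO fields). The helpers below are a sketch of the splitting and recombining, assuming each half is transported as an unsigned 32-bit value. The helper names are hypothetical.

```c
#include <stdint.h>

/* Combine the 32-bit HI/LO halves (m9_l3/m9_l4) into one position. */
uint64_t pos_from_hi_lo(uint32_t hi, uint32_t lo)
{
    return ((uint64_t) hi << 32) | lo;
}

/* Split a 64-bit position back into HI/LO halves for the reply. */
void pos_to_hi_lo(uint64_t pos, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t) (pos >> 32);
    *lo = (uint32_t) (pos & 0xFFFFFFFFu);
}

/* Resulting position after transferring `nbytes` bytes starting at HI/LO. */
uint64_t pos_after_transfer(uint32_t hi, uint32_t lo, uint64_t nbytes)
{
    return pos_from_hi_lo(hi, lo) + nbytes;
}
```

Doing the addition on the combined 64-bit value lets a carry from the low half reach the high half, which adding to the LO field alone would lose.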
REQ_WRITE
Write to a file.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) containing the data to write |
REQ_FD_POS |
m2_i3 |
off_t |
seek position into the open file |
REQ_SEEK_POS_HI |
m9_l3 |
off_t |
seek position into the open file (upper 32 bits) |
REQ_SEEK_POS_LO |
m9_l4 |
off_t |
seek position into the open file (lower 32 bits) |
REQ_NBYTES |
m9_l5 |
size_t |
number of bytes to write |
REQ_FD_INODE_INDEX |
m2_s1 |
unsigned short |
inode index associated with this inode |
Reply fields
RES_FD_POS |
m2_i1 |
off_t |
upon success: resulting file position |
RES_SEEK_POS_HI |
m9_l3 |
off_t |
upon success: resulting file position (upper 32 bits) |
RES_SEEK_POS_LO |
m9_l4 |
off_t |
upon success: resulting file position (lower 32 bits) |
RES_NBYTES |
m9_l5 |
size_t |
upon success: number of bytes written |
Reply codes
ENOSPC |
no space is left on the device |
EFBIG |
the write would make the resulting file size too big |
OK |
results successfully written |
REQ_GETDENTS
Retrieve directory entries.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number of the directory |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (WRITE) to store resulting struct dirent entries and names in |
REQ_MEM_SIZE |
m9_l5 |
size_t |
size of given memory grant |
REQ_GDE_POS |
m2_l1 |
off_t |
seek position into the open file |
REQ_SEEK_POS_HI |
m9_l3 |
off_t |
file position (upper 32 bits) |
REQ_SEEK_POS_LO |
m9_l4 |
off_t |
file position (lower 32 bits) |
Reply fields
RES_GDE_POS_CHANGE |
m2_l1 |
off_t |
upon success: the amount by which to adjust the seek position into the file |
RES_SEEK_POS_HI |
m9_l3 |
off_t |
upon success: new seek position into the file (upper 32 bits) |
RES_SEEK_POS_LO |
m9_l4 |
off_t |
upon success: new seek position into the file (lower 32 bits) |
RES_NBYTES |
m9_l5 |
size_t |
upon success: the number of bytes stored, or 0 at EOF |
Reply codes
ENOENT |
the given file position is not aligned to the internal data structures (file system specific) |
EINVAL |
the given buffer is too small to store even one entry (including padding) |
OK |
stored zero or more entries in the user's buffer |
REQ_FTRUNC
Set size, or free space, of an open file.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
REQ_TRC_START_HI |
m9_l2 |
off_t |
new file size or starting position (inclusive) of region to free (upper 32 bits) |
REQ_TRC_START_LO |
m9_l3 |
off_t |
new file size or starting position (inclusive) of region to free (lower 32 bits) |
REQ_TRC_END_HI |
m9_l4 |
off_t |
zero or ending position (exclusive) of region to free (upper 32 bits) |
REQ_TRC_END_LO |
m9_l5 |
off_t |
zero or ending position (exclusive) of region to free (lower 32 bits) |
REQ_FD_START |
m2_i2 |
off_t |
new file size or starting position (inclusive) of region to free |
REQ_FD_END |
m2_i3 |
off_t |
zero or ending position (exclusive) of region to free |
Reply fields
none
Reply codes
EINVAL |
an attempt is made to change the file size of a pipe to anything but zero |
EFBIG |
the resulting file would be too big |
OK |
file size changed and/or holes created |
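The start/end pair of REQ_FTRUNC encodes two operations. A zero end position means "set the file size to start". Otherwise the region [start, end) is freed. Below is a minimal decoding sketch. The struct and function names are hypothetical, and the handling of an empty or inverted range is an assumption, since the table does not specify it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of a REQ_FTRUNC request. */
struct ftrunc_op {
    bool     set_size;  /* true: change the file size to `start` */
    uint64_t start;     /* new size, or first byte to free (inclusive) */
    uint64_t end;       /* one past the last byte to free (exclusive) */
};

struct ftrunc_op ftrunc_decode(uint64_t start, uint64_t end)
{
    struct ftrunc_op op = { .start = start, .end = end };
    op.set_size = (end == 0);   /* zero end: plain size change */
    return op;
}

/* Bytes covered by a free-region request; 0 for a size change. An empty or
 * inverted range is treated as nothing to free (an assumption of this sketch). */
uint64_t ftrunc_region_len(const struct ftrunc_op *op)
{
    if (op->set_size || op->end <= op->start)
        return 0;
    return op->end - op->start;
}
```

For instance, start 4096 and end 8192 frees exactly 4096 bytes, and start 100 with end 0 sets the file size to 100.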
REQ_INHIBREAD
Mark file as target of seek operation.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
Reply fields
none
Reply codes
OK |
request processed successfully |
Inode metadata retrieval and manipulation
REQ_STAT
Retrieve file status.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (WRITE) to store resulting "struct stat" in |
Reply fields
none
Reply codes
OK |
result stored in buffer |
REQ_CHOWN
Change file ownership.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
REQ_UID |
m6_s1 |
uid_t |
user ID of the caller |
REQ_GID |
m6_c1 |
gid_t |
group ID of the caller |
REQ_UID |
m9_s4 |
uid_t |
new user ID for the file |
REQ_GID |
m9_s1 |
gid_t |
new group ID for the file |
Reply fields
RES_MODE |
m9_s2 |
mode_t |
upon success: resulting inode mode |
Reply codes
OK |
ownership changed |
REQ_CHMOD
Change file mode.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
REQ_MODE |
m9_s3 |
mode_t |
new mode for the file |
REQ_UID |
m6_s1 |
uid_t |
user ID of the caller |
REQ_GID |
m6_c1 |
gid_t |
group ID of the caller |
Reply fields
RES_MODE |
m9_s2 |
mode_t |
upon success: resulting inode mode |
Reply codes
OK |
mode changed |
Notes
The caller UID and GID are typically unused. |
REQ_UTIME
Set file times.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
REQ_ACTIME |
m9_l2 |
time_t |
new access time |
REQ_MODTIME |
m9_l3 |
time_t |
new modification time |
Reply fields
none
Reply codes
OK |
custom file times set |
Directory entry manipulation
REQ_MKDIR
Create a directory.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number of the containing directory for the new directory |
REQ_MODE |
m9_s3 |
mode_t |
mode for the directory |
REQ_UID |
m9_s4 |
uid_t |
user ID for the directory |
REQ_GID |
m9_s1 |
gid_t |
group ID for the directory |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) for the last path component |
REQ_PATH_LEN |
m9_s2 |
unsigned short |
length of the last path component |
Reply fields
none
Reply codes
ENAMETOOLONG |
the last path component is longer than the file system supports |
EEXIST |
a directory entry with that name already exists |
ENFILE |
no inodes are available |
ENOSPC |
no space is left on the device |
EFBIG |
the containing directory can not handle any more entries |
EMLINK |
the containing directory has the maximum number of links already |
OK |
directory created |
REQ_MKNOD
Create a special file.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number of the containing directory for the new file |
REQ_MODE |
m9_s3 |
mode_t |
mode for the file |
REQ_DEV |
m9_l5 |
dev_t |
device number |
REQ_UID |
m9_s4 |
uid_t |
user ID for the file |
REQ_GID |
m9_s1 |
gid_t |
group ID for the file |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) for the last path component |
REQ_PATH_LEN |
m9_s2 |
short |
length of the last path component |
Reply fields
none
Reply codes
ENAMETOOLONG |
the last path component is longer than the file system supports |
EEXIST |
a directory entry with that name already exists |
EINVAL |
the given file type is invalid or not supported |
ENFILE |
no inodes are available |
ENOSPC |
no space is left on the device |
EFBIG |
the containing directory can not handle any more entries |
OK |
special file created |
REQ_LINK
Create a hard link to a file.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number of the file to link to |
REQ_DIR_INO |
m9_l3 |
ino_t |
inode number of the containing directory for the new link |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) for the last path component |
REQ_PATH_LEN |
m9_s2 |
unsigned short |
length of the last path component |
Reply fields
none
Reply codes
ENAMETOOLONG |
the last path component is longer than the file system supports |
EEXIST |
a directory entry with that name already exists |
EPERM |
the linked file is a directory |
EMLINK |
the linked inode has the maximum number of links already |
ENOSPC |
no space is left on the device |
EFBIG |
the containing directory can not handle any more entries |
OK |
new link created |
REQ_UNLINK
Unlink a file.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number of the containing directory for the file |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) for the last path component |
REQ_PATH_LEN |
m9_s2 |
unsigned short |
length of the last path component |
Reply fields
none
Reply codes
ENAMETOOLONG |
the last path component is longer than the file system supports |
ENOENT |
no directory entry with that name exists |
EPERM |
the given name refers to a directory |
OK |
unlinked file |
REQ_RMDIR
Remove an empty directory.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number of the containing directory for the directory to remove |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) for the last path component |
REQ_PATH_LEN |
m9_s2 |
unsigned short |
length of the last path component |
Reply fields
none
Reply codes
ENAMETOOLONG |
the last path component is longer than the file system supports |
ENOENT |
no directory entry with that name exists |
ENOTDIR |
the given name does not refer to a directory. |
ENOTEMPTY |
the given directory is not empty |
EINVAL |
the given directory is "." or ".." |
EBUSY |
the given directory is the root directory of the file system |
OK |
removed directory |
REQ_RENAME
Rename a file or directory.
Request fields
REQ_REN_OLD_DIR |
m9_l3 |
ino_t |
inode number of containing directory for the old file |
REQ_REN_NEW_DIR |
m9_l4 |
ino_t |
inode number of containing directory for the new file |
REQ_REN_GRANT_OLD |
m9_l2 |
cp_grant_id_t |
memory grant (READ) for the old last path component |
REQ_REN_LEN_OLD |
m9_s1 |
unsigned short |
length of the old last path component |
REQ_REN_GRANT_NEW |
m9_l1 |
cp_grant_id_t |
memory grant (READ) for the new last path component |
REQ_REN_LEN_NEW |
m9_s2 |
unsigned short |
length of the new last path component |
Reply fields
none
Reply codes
ENAMETOOLONG |
the last path component of the old or new file is longer than the file system supports |
ENOENT |
the old file does not exist |
OK |
the old and new last path components and containing directories are the same (nothing to do) |
EBUSY |
the old file is a mountpoint directory |
EINVAL |
an attempt is made to move a directory to within its own subtree |
EINVAL |
the old or new last path component is "." or ".." |
EMLINK |
the old file is a directory and the new file doesn't exist but the new containing directory has the maximum number of links |
ENOTDIR |
the old file is a directory and the new file exists but is not a directory |
EISDIR |
the old file is not a directory and the new file exists but is a directory |
ENOTEMPTY |
the new file is a directory but is not empty |
EBUSY |
the new file is the root directory of the file system |
ENOSPC |
no space is left on the device |
EFBIG |
the new containing directory can not handle any more entries |
OK |
file renamed |
REQ_SLINK
Create a symbolic link.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number of the containing directory for the new file |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) for the link name's last path component |
REQ_PATH_LEN |
m9_s2 |
unsigned short |
length of the link name's last path component |
REQ_GRANT3 |
m9_l3 |
cp_grant_id_t |
memory grant (READ) for the link target (not including a trailing '\0') |
REQ_MEM_SIZE |
m9_l5 |
size_t |
length of the link target (not including a trailing '\0') |
REQ_UID |
m9_s4 |
uid_t |
user ID for the new symlink |
REQ_GID |
m9_s1 |
gid_t |
group ID for the new symlink |
Reply fields
none
Reply codes
ENAMETOOLONG |
the last path component is longer than the file system supports |
EEXIST |
a directory entry with that name already exists |
ENFILE |
no inodes are available |
ENOSPC |
no space is left on the device |
EFBIG |
the containing directory can not handle any more entries |
ENAMETOOLONG |
the link target contains '\0' bytes |
OK |
symbolic link created |
REQ_RDLINK
Retrieve symbolic link target.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (WRITE) for buffer to write result to |
REQ_MEM_SIZE |
m9_l5 |
size_t |
size of buffer to write to |
Reply fields
RES_NBYTES |
m9_l5 |
size_t |
upon success: number of bytes written |
Reply codes
OK |
result stored in buffer |
Miscellaneous file system operations
REQ_MOUNTPOINT
Mark an inode as mountpoint.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
inode number of file to use as mountpoint |
Reply fields
none
Reply codes
EBUSY |
inode already in use as mountpoint |
ENOTDIR |
given inode is not a directory |
OK |
inode marked as mountpoint |
REQ_FSTATFS
Retrieve file system status.
Request fields
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (WRITE) to store resulting "struct statfs" in |
Reply fields
none
Reply codes
OK |
result stored in buffer |
REQ_SYNC
Write any unwritten data to disk.
Request fields
none
Reply fields
none
Reply codes
OK |
request processed successfully |
Block I/O functions
REQ_FLUSH
Flush cached data for an unmounted device.
Request fields
REQ_DEV |
m9_l5 |
dev_t |
device number |
Reply fields
none
Reply codes
EBUSY |
the device is mounted |
OK |
cache flushed and invalidated for this device |
REQ_NEW_DRIVER
Set a new driver endpoint for a major device.
Request fields
REQ_DEV |
m9_l5 |
dev_t |
device number |
REQ_DRIVER_E |
m9_l2 |
endpoint_t |
driver endpoint |
Reply fields
none
Reply codes
OK |
request processed successfully |
REQ_BREAD
Read from a block device directly.
Request fields
REQ_DEV2 |
m9_l1 |
dev_t |
device number |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (WRITE) to store the resulting data in |
REQ_SEEK_POS_LO |
m9_l4 |
off_t |
low 32 bits of position |
REQ_SEEK_POS_HI |
m9_l3 |
off_t |
high 32 bits of position |
REQ_NBYTES |
m9_l5 |
size_t |
number of bytes to read |
Reply fields
RES_SEEK_POS_LO |
m9_l4 |
off_t |
upon success and failure: low 32 bits of resulting position |
RES_SEEK_POS_HI |
m9_l3 |
off_t |
upon success and failure: high 32 bits of resulting position |
RES_NBYTES |
m9_l5 |
size_t |
upon success and failure: total number of bytes read |
Reply codes
EIO |
I/O error reported by the device driver |
OK |
results successfully (partially) read, or EOF reached |
REQ_BWRITE
Write to a block device directly.
Request fields
REQ_DEV2 |
m9_l1 |
dev_t |
device number |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
memory grant (READ) containing the data to write |
REQ_SEEK_POS_LO |
m9_l4 |
off_t |
low 32 bits of position |
REQ_SEEK_POS_HI |
m9_l3 |
off_t |
high 32 bits of position |
REQ_NBYTES |
m9_l5 |
size_t |
number of bytes to write |
Reply fields
RES_SEEK_POS_LO |
m9_l4 |
off_t |
upon success and failure: low 32 bits of resulting position |
RES_SEEK_POS_HI |
m9_l3 |
off_t |
upon success and failure: high 32 bits of resulting position |
RES_NBYTES |
m9_l5 |
size_t |
upon success and failure: total number of bytes written |
Reply codes
EIO |
I/O error reported by the device driver |
OK |
results successfully (partially) written, or EOF reached |
Transaction functions
REQ_COMMIT
Commit a transaction (part of the transaction protocol).
Request fields
REQ_ID |
m9_s1 |
unsigned short |
Request ID |
Reply fields
none
Reply codes
EINVAL |
Request ID could not be committed (e.g., REQ_ID != (current id - 1)) |
COMMITTED |
Request is committed |
Description
This request tells the file server to commit a transaction. If VFS sends this request in response to a reply that was flagged 'auto-commit', or sends it more than once for a transaction that has already been committed, the FS replies COMMITTED. |
REQ_RECOVER
Recover state after a crash.
Request fields
REQ_OLD_E |
m9_l3 |
endpoint_t |
Endpoint of crashed (initial) FS |
Reply fields
none
Reply codes
EIO |
Recovery process failed (e.g., due to data corruption) |
OK |
Recovery process completed successfully |
Description
After receiving a mount request, the FS allocates shared memory regions for its inode cache and buffer cache and registers them with DS under keys based on its endpoint (e.g., <endpoint>_i and <endpoint>_b). Upon receiving a recovery request, the new FS maps in the inode cache and buffer cache of the crashed FS and runs a recovery procedure. Note that REQ_OLD_E carries the endpoint of the initial FS, because the keys in DS never change. No naming scheme is defined; it is up to the FS to pick suitable names. |
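The example key scheme can be sketched with `snprintf`. This only illustrates the "<endpoint>_i" / "<endpoint>_b" example: `make_cache_keys` is a hypothetical helper, `endpoint_t` is assumed to be an `int`, and the DS registration and shared-memory mapping themselves are not shown. Because the keys are derived from the initial endpoint, a recovering FS that receives REQ_OLD_E can rebuild exactly the same keys.

```c
#include <stdio.h>

/* Build the DS keys for the shared inode cache and buffer cache of the FS
 * whose initial endpoint is given. Returns 0, or -1 if a key would be cut off. */
int make_cache_keys(int initial_endpoint, char *ikey, char *bkey, size_t len)
{
    int a = snprintf(ikey, len, "%d_i", initial_endpoint);
    int b = snprintf(bkey, len, "%d_b", initial_endpoint);
    if (a < 0 || b < 0 || (size_t) a >= len || (size_t) b >= len)
        return -1;
    return 0;
}
```

With endpoint 73 this produces the keys "73_i" and "73_b".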
Dynamic Update functions
REQ_RESTART
Tell the FS it is about to perform a dynamic update, so it can flush dirty data to disk.
Request fields
none
Reply fields
none
Reply codes
OK |
Dirty data is written to disk |
Description
The FS performs a sync, writing the inode table to the buffer cache and the buffer cache to disk; it then sends the reply above and exits. |
REQ_RELOAD
Restore buffers by reading a number of inodes from disk, such that the state of the FS is the same as before the update.
Request fields
REQ_INODE_NR |
m9_l1 |
ino_t |
Inode number of file to use as mountpoint |
REQ_GRANT |
m9_l2 |
cp_grant_id_t |
Memory grant (READ) containing a list of inodes that the FS has to reopen |
REQ_MEM_SIZE |
m9_l5 |
size_t |
Size of the inode list |
REQ_OLD_E |
m9_l3 |
endpoint_t |
Endpoint of FS before dynamic update |
Reply fields
none
Reply codes
OK |
Reload completed successfully |
Description
VFS keeps a list of the inodes it believes the FS has open; from the point of view of VFS, that list is the state of the FS. The new FS restores this state by reading those inodes from disk. |
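A sketch of how a new FS could walk the inode list it receives with REQ_RELOAD follows. It assumes the grant holds a packed array of inode numbers, REQ_MEM_SIZE bytes long, that has already been copied into local memory (the copy is not shown). `ino_nr`, the `load_inode_fn` callback and the `demo_cache` loader are all hypothetical stand-ins; a real FS would read each inode from disk into its inode cache.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ino_nr;   /* hypothetical stand-in for ino_t */

/* Hypothetical per-inode loader: reads one inode from disk into the inode
 * cache and takes a reference on it; returns 0 on success. */
typedef int (*load_inode_fn)(ino_nr ino, void *ctx);

/* Reopen every inode in the list. Returns 0 on success, or -1 if the list
 * size is not a whole number of entries or an inode cannot be loaded. */
int reload_inodes(const ino_nr *list, size_t mem_size, load_inode_fn load, void *ctx)
{
    if (mem_size % sizeof(ino_nr) != 0)
        return -1;
    size_t n = mem_size / sizeof(ino_nr);
    for (size_t k = 0; k < n; k++) {
        if (load(list[k], ctx) != 0)
            return -1;
    }
    return 0;
}

/* Demo loader: records how many inodes were "read from disk". */
struct demo_cache {
    size_t loaded;
    ino_nr last;
};

int demo_load(ino_nr ino, void *ctx)
{
    struct demo_cache *c = ctx;
    c->loaded++;
    c->last = ino;
    return 0;
}
```

Checking that REQ_MEM_SIZE is a multiple of the entry size rejects a truncated list before any inode is touched.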
References
This document is not based on the original VFS-FS protocol documentation by Balazs Gerofi. However, that document may still provide additional insights.
Design and implementation of the MINIX Virtual File system by Balazs Gerofi, August, 2006
For more information on Dynamic Updates and Failure Resilience, see the Master's Thesis by Thomas Veerman.
Dynamic Updates and Failure Resilience for the Minix File Server by Thomas Veerman, May, 2009