The VFS-FS protocol V2

This page presents a new version of the VFS-FS protocol. It is a work in progress and should not be considered final. Eventually, the changes to the protocol will be incorporated into the VFS-FS protocol page, and this page will be removed.

The known issues of the previous VFS-FS protocol have been solved. Moreover, support for Dynamic Updates and Failure Resilience for an FS has been added (although these features are not actually implemented, yet).

Failure Resilience

In order to provide failure recovery after an FS has crashed, all requests are part of a transaction. A transaction consists of:

  1. VFS sends a request to an FS
  2. the FS replies that it has done what was requested, but has not committed it yet
  3. VFS sends a COMMIT request
  4. the FS replies COMMITTED

An FS handles only one transaction at a time (VFS puts subsequent transactions on a queue).

A commit by the FS is an atomic operation. While a transaction is not yet committed, the FS stores the result of the request in step 1 in a temporary data structure. That is, it is not really part of the state of the FS, yet. If, after a crash of the FS, it turns out there was a partially executed transaction, the temporary data structure can be ignored in the recovery process as if the request hadn't happened at all.
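
The staging scheme described above can be sketched in C as follows. This is a minimal illustration, assuming the FS state can be modelled as a single integer; the names `fs_state`, `fs_execute`, `fs_commit` and `fs_recover` are hypothetical and do not come from MFS, and the atomicity of a real commit is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the FS state is one integer. A real FS would
 * stage inode and block changes instead. */
struct fs_state {
    int  committed;   /* the real state of the FS */
    int  staged;      /* temporary result of an uncommitted request */
    bool pending;     /* true while a transaction awaits COMMIT */
};

/* Steps 1 and 2: execute a request, keeping the result out of the
 * real state until it is committed. */
void fs_execute(struct fs_state *fs, int new_value)
{
    assert(fs != NULL);
    fs->staged = new_value;
    fs->pending = true;
}

/* Steps 3 and 4: make the staged result part of the state.
 * Returns true if there was something to commit. */
bool fs_commit(struct fs_state *fs)
{
    assert(fs != NULL);
    if (!fs->pending)
        return false;
    fs->committed = fs->staged;
    fs->pending = false;
    return true;
}

/* Recovery after a crash: ignore whatever was staged, as if the
 * uncommitted request had never happened. */
void fs_recover(struct fs_state *fs)
{
    assert(fs != NULL);
    fs->pending = false;
}
```

Calling `fs_recover` between `fs_execute` and `fs_commit` leaves `committed` untouched, which mirrors the "as if the request hadn't happened" behaviour.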

However, if an FS crashes right after committing the changes but before successfully delivering the COMMITTED message, restarting the transaction could produce wrong results (e.g., consider unlinking a file successfully and then retrying the unlink: the first time the FS returns OK and the second time it returns ENOENT). To solve this, each transaction has an ID that is encoded in the message using the 'type' field. An FS must record the transaction ID when it commits a transaction, so that it can verify whether it has committed the transaction when VFS asks for it. If it turns out the request was already committed, the FS simply replies COMMITTED.

Steps 3 and 4 of the transaction protocol can be omitted if a request is idempotent (for example, stat is a read request and can be issued multiple times and get the same result each time). To do this, the FS sets an 'auto-commit' flag in the reply. This flag is encoded in the 'type' field just like the transaction IDs. It is up to the FS to decide whether or not a request is idempotent. VFS will automatically send a COMMIT request when the reply from an FS indicates that the request is non-idempotent.
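
As a rough illustration of this decision, the sketch below models an FS that flags only a stat-like request as idempotent, and a VFS that sends COMMIT only when the flag is absent. The types `fs_reply` and `req_kind` and both function names are assumptions made for this example; which requests an actual FS treats as idempotent is up to that FS.

```c
#include <stdbool.h>

/* Hypothetical view of a reply after VFS has decoded the 'type' field. */
struct fs_reply {
    int  result;        /* request result */
    bool auto_commit;   /* set by the FS for idempotent requests */
};

/* Hypothetical request kinds used only in this sketch. */
enum req_kind { REQ_KIND_STAT, REQ_KIND_UNLINK };

/* FS side: decide whether the request is idempotent and flag the reply. */
struct fs_reply fs_handle(enum req_kind kind)
{
    struct fs_reply reply = { 0, false };
    reply.auto_commit = (kind == REQ_KIND_STAT);
    return reply;
}

/* VFS side: steps 3 and 4 are needed only without the auto-commit flag. */
bool vfs_must_send_commit(struct fs_reply reply)
{
    return !reply.auto_commit;
}
```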

When an FS crashes, there must be a way to bring a freshly started FS to the state the old FS had before the crash. This can be done by using shared memory regions that remain resident even after the program that created them is no longer executing. After the crash, a new FS maps in the old memory region and fixes errors if necessary. A newly started FS knows it has to recover state from a previous FS (as opposed to mounting a new file system), because VFS will send a REQ_RECOVER message.

When a transaction fails repeatedly, the communication layer returns EAGAIN, enabling VFS to undo any changes to its internal state and report an error to the user (program).
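
A bounded retry loop of this kind could look like the sketch below. The limit `MAX_ATTEMPTS`, the callback type `txn_attempt_fn` and the function `comm_run_transaction` are hypothetical; the text does not say how many attempts are made, and the positive `EAGAIN` from `<errno.h>` is used here only for illustration.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ATTEMPTS 3   /* hypothetical limit; not specified by the protocol */

/* Hypothetical transport hook: returns 0 when the transaction completed,
 * nonzero when the FS failed (e.g., crashed) before completion. */
typedef int (*txn_attempt_fn)(void *ctx);

/* Try a transaction up to MAX_ATTEMPTS times, then give up with EAGAIN
 * so the caller can undo its own state changes. */
int comm_run_transaction(txn_attempt_fn attempt, void *ctx)
{
    for (int i = 0; i < MAX_ATTEMPTS; i++) {
        if (attempt(ctx) == 0)
            return 0;
    }
    return EAGAIN;
}

/* Example callback: always fails. */
int always_fail(void *ctx)
{
    (void)ctx;
    return -1;
}

/* Example callback: fails until the counter behind ctx reaches zero. */
int fail_n_times(void *ctx)
{
    int *left = ctx;
    if (*left > 0) {
        (*left)--;
        return -1;
    }
    return 0;
}
```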

The following macros encode the request result (r, signed short), transaction ID (i, unsigned short), and auto-commit flag (f, unsigned short). Note that the transaction ID is actually 15 bits wide (not 16) and can therefore carry values of 0 up to 32767.
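
The macro definitions themselves do not appear in this text. Purely as an illustration, one possible encoding is sketched below. The bit layout and the names `TRNS_ENCODE`, `TRNS_RESULT`, `TRNS_ID`, `TRNS_AUTO` and `trns_next_id` are assumptions for this example and are not taken from the MINIX sources; only the field widths (signed 16-bit result, 15-bit ID, one flag bit) follow the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a 32-bit 'type' field (illustration only):
 *   bits  0..15  request result (signed short, two's complement)
 *   bits 16..30  transaction ID (15 bits, 0..32767)
 *   bit  31      auto-commit flag
 */
#define TRNS_ID_MASK 0x7FFFu

#define TRNS_ENCODE(r, i, f) \
    ((uint32_t)(uint16_t)(int16_t)(r) | \
     (((uint32_t)(i) & TRNS_ID_MASK) << 16) | \
     ((uint32_t)((f) != 0) << 31))

#define TRNS_RESULT(t) ((int16_t)(uint16_t)((t) & 0xFFFFu))
#define TRNS_ID(t)     ((uint16_t)(((t) >> 16) & TRNS_ID_MASK))
#define TRNS_AUTO(t)   ((unsigned)(((t) >> 31) & 1u))

/* Next transaction ID, wrapping from 32767 back to 0. */
static inline uint16_t trns_next_id(uint16_t id)
{
    assert(id <= TRNS_ID_MASK);
    return (uint16_t)((id + 1u) & TRNS_ID_MASK);
}
```

Because the ID occupies only 15 bits, a counter based on it wraps after 32768 transactions.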

Dynamic Updates

A dynamic update of an FS allows the administrator to install and run a new version of an FS without needing to reboot the computer or unmount and mount file systems; a running copy of the FS is replaced by a new version. This is achieved by sending the FS a REQ_RESTART message, which tells it to write its buffers to disk and exit. Subsequently, the new FS is started and is told to reload state from disk by sending it REQ_RELOAD. That is, it reads from disk the inodes that were in the cache before the update. This way it restores state.

It is advised to read "Dynamic Updates and Failure Resilience for the Minix File Server" by Thomas Veerman (see link at the bottom) to gain a better understanding of the mechanisms behind Dynamic Updates and Failure Resilience.

Protocol messages

This specification reflects the protocol as it should be implemented, not how it is implemented by MFS. In particular, old and deprecated requests are not and should not be included.

The VFS-FS protocol is entirely POSIX-oriented. Any deviation in this specification from the requirements imposed by POSIX is unintentional unless mentioned explicitly. For convenience, links to the relevant Open Group function specifications and file access (ATIME), modification (MTIME) and change (CTIME) time update requirements are provided.

The reply codes in this document are advisory and mostly aimed at indicating additional restrictions needed for POSIX compliance. Not all of them may be applicable to every file server, and a file server may send other error codes where appropriate. Errors resulting from protocol validation checks (e.g., EROFS, sys_safecopy errors) are not included.

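The requests are ordered according to the following rough categorization:

- Mounting and unmounting
- Inode open and close functions
- Inode use functions
- Inode metadata retrieval and manipulation
- Directory entry manipulation
- Miscellaneous file system operations
- Block I/O functions
- Transaction functions
- Dynamic Update functions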

In the tables below, we use the following color coding:

- The field has its name changed.
- Value has changed (e.g., new variable type, new spot in a message, different description). When the whole row has this color, it means this row was added to the request.
- Nothing has changed.
- This field has been dropped (or replaced by a new field).

Mounting and unmounting

REQ_READSUPER

Mount the file system.

Request fields

REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) for the label of the block device driver to use
REQ_PATH_LEN | m9_s2 | unsigned short | length of the label
REQ_DEV | m9_l5 | dev_t | device number of block device to mount
REQ_READONLY | m6_c1 | int | flag indicating whether the file system is mounted read-only (1 = read-only, 0 = read-write)
REQ_ISROOT | m6_c2 | int | flag indicating whether the file system is the system root file system (1 = yes, 0 = no)
REQ_FLAGS | m9_s3 | int | REQ_RDONLY flag indicates whether the file system is mounted read-only or not (i.e., read and write). REQ_ISROOT flag indicates the file system is the root file system.

Reply fields

RES_INODE_NR | m9_l1 | ino_t | upon success: inode number of the root inode
RES_MODE | m9_s2 | mode_t | upon success: mode of the root inode
RES_FILE_SIZE | m9_l2 | off_t | upon success: file size of the root inode
RES_FILE_SIZE_HI | m9_l2 | off_t | upon success: file size of the root inode (upper 32 bits)
RES_FILE_SIZE_LO | m9_l3 | off_t | upon success: file size of the root inode (lower 32 bits)
RES_DEV | m9_l4 | dev_t | upon success: resulting file device number
RES_UID | m9_s4 | uid_t | upon success: user ID of the root inode
RES_GID | m9_s1 | gid_t | upon success: group ID of the root inode

Reply codes

EINVAL | label too long
EINVAL | unable to retrieve endpoint from DS using label
EINVAL | opening device driver failed
EINVAL | reading superblock failed
OK | file system initialized and mounted

Notes

VFS assumes the root inode on the mounted FS is in use and will have a reference count of 1
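
The REQ_FLAGS field replaces the two separate flag fields with a single bit mask. A minimal sketch of how an FS might decode it is shown below; the bit values chosen for `REQ_RDONLY` and `REQ_ISROOT` are assumptions for illustration (this page names the flags but does not give their values), and `mount_opts` and `decode_readsuper_flags` are hypothetical names.

```c
#include <stdbool.h>

/* Bit values are assumptions for this sketch only. */
#define REQ_RDONLY 0x1
#define REQ_ISROOT 0x2

/* Hypothetical decoded form of REQ_FLAGS. */
struct mount_opts {
    bool read_only;   /* file system is mounted read-only */
    bool is_root;     /* file system is the root file system */
};

/* Translate the REQ_FLAGS bit mask into separate booleans. */
struct mount_opts decode_readsuper_flags(int req_flags)
{
    struct mount_opts opts;
    opts.read_only = (req_flags & REQ_RDONLY) != 0;
    opts.is_root   = (req_flags & REQ_ISROOT) != 0;
    return opts;
}
```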


REQ_UNMOUNT

Unmount the file system.

Request fields

Reply fields

Reply codes

OK | file system unmounted

Notes

Analogous to how REQ_READSUPER opens the root inode, REQ_UNMOUNT puts the root inode. Previously, all inodes had to have a reference count of 0 before issuing this request.


Inode open and close functions

REQ_LOOKUP

Resolve a path string to an inode.

Request fields

REQ_GRANT2 | m9_l1 | cp_grant_id_t | memory grant (READ) of the buffer containing supplemental group data
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ|WRITE) of the buffer containing the pathname
REQ_PATH_LEN | m9_s2 | int | length of the remaining part of the string to resolve
REQ_PATH_SIZE | m9_l5 | size_t | total size of the buffer
REQ_L_PATH_OFF | m9_l2 | size_t | starting offset of the string to resolve within the buffer
REQ_DIR_INO | m9_l3 | ino_t | inode number of the starting directory
REQ_ROOT_INO | m9_l4 | ino_t | inode number of the root directory of the caller, or 0 if not on this file system
REQ_FLAGS | m9_s3 | int | PATH_RET_SYMLINK (do not resolve a symlink as the last path component), PATH_GET_UCRED (retrieve UID and GIDs from VFS instead of using REQ_UID and REQ_GID, because UID is member of multiple, supplemental, groups), or 0
REQ_UID | m9_s4 | uid_t | user ID of the caller
REQ_GID | m9_s1 | gid_t | group ID of the caller
REQ_UCRED_SIZE | m9_s4 | size_t | total size of ucred structure

Reply fields

RES_INODE_NR | m9_l1 | ino_t | upon success: resulting file inode number
RES_MODE | m9_s2 | mode_t | upon success: resulting file mode
RES_FILE_SIZE | m6_l2 | off_t | upon success: resulting file size
RES_FILE_SIZE_HI | m9_l2 | off_t | upon success: resulting file size (upper 32 bits)
RES_FILE_SIZE_LO | m9_l3 | off_t | upon success: resulting file size (lower 32 bits)
RES_DEV | m9_l4 | dev_t | upon success: resulting file device number
RES_UID | m9_s4 | uid_t | upon success: resulting file user ID
RES_GID | m9_s1 | gid_t | upon success: resulting file group ID
RES_INODE_NR | m9_l1 | ino_t | upon EENTERMOUNT: inode number of the mountpoint inode
RES_OFFSET | m9_s2 | int | upon EENTERMOUNT and ELEAVEMOUNT and ESYMLINK: new starting offset of string within buffer
RES_SYMLOOP | m9_s3 | unsigned short | upon EENTERMOUNT and ELEAVEMOUNT and ESYMLINK: number of symbolic links followed

Reply codes

ENAMETOOLONG | provided path length exceeds what file server can handle
ENAMETOOLONG | any of the path components is longer than the file system supports
ENOTDIR | any of the intermediate path components is not a directory
EACCES | the caller has no search access permission on any of the intermediate directories
ENFILE | no inodes are available in memory
ELOOP | more than SYMLOOP_MAX symlinks were encountered during the lookup
ENAMETOOLONG | resulting path to copy back (including terminating '\0') does not fit in provided buffer
ENOENT | one of the components does not exist
EENTERMOUNT | a mountpoint was encountered
ELEAVEMOUNT | ".." is followed from the file system root and the file system root is not the caller root inode
ESYMLINK | an absolute symlink was encountered
EINVAL | starting inode was a mountpoint and first path component is not ".."
OK | inode successfully looked up and opened

Description

REQ_GRANT2 provides a grant to a ucred structure holding the user ID and (supplemental) group data that are to be used to check permissions during the lookup.

Notes

VFS assumes the opened inode on the FS is in use and will have a reference count +1 (i.e., 1 if just opened for the first time, x+1 if it was already opened).


REQ_CREATE

Create a regular file.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number of the containing directory for the new file
REQ_MODE | m9_s3 | mode_t | mode for the file
REQ_UID | m9_s4 | uid_t | user ID for the file
REQ_GID | m9_s1 | gid_t | group ID for the file
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) for the last path component
REQ_PATH_LEN | m9_s2 | unsigned short | length of the last path component

Reply fields

RES_INODE_NR | m9_l1 | ino_t | upon success: inode number of created file
RES_MODE | m9_s2 | mode_t | upon success: mode of created file
RES_FILE_SIZE | m6_l2 | off_t | upon success: file size of created file
RES_FILE_SIZE_HI | m9_l2 | off_t | upon success: file size of created file (upper 32 bits)
RES_FILE_SIZE_LO | m9_l3 | off_t | upon success: file size of created file (lower 32 bits)
RES_UID | m9_s4 | uid_t | upon success: user ID of created file
RES_GID | m9_s1 | gid_t | upon success: group ID of created file
RES_DEV | m9_l4 | dev_t | upon success: device node index
RES_INODE_INDEX | m6_s2 | unsigned short | upon success: inode index to associate with this inode

Reply codes

ENAMETOOLONG | the last path component is longer than the file system supports
EEXIST | a directory entry with that name already exists
ENFILE | no inodes are available
ENOSPC | no space is left on the device
EFBIG | the containing directory can not handle any more entries
OK | regular file created and opened

Notes

VFS assumes the created inode on the FS is in use and will have a reference count of 1.


REQ_NEWNODE

Create an open, unlinked file.

Request fields

REQ_MODE | m9_s3 | mode_t | mode for the inode
REQ_DEV | m9_l5 | dev_t | device number for the inode
REQ_UID | m9_s4 | uid_t | user ID for the inode
REQ_GID | m9_s1 | gid_t | group ID for the inode

Reply fields

RES_INODE_NR | m9_l1 | ino_t | upon success: inode number of the resulting inode
RES_MODE | m9_s2 | mode_t | upon success: mode of the resulting inode
RES_FILE_SIZE | m6_l2 | off_t | upon success: size of the resulting inode
RES_FILE_SIZE_HI | m9_l2 | off_t | upon success: size of the resulting inode (upper 32 bits)
RES_FILE_SIZE_LO | m9_l3 | off_t | upon success: size of the resulting inode (lower 32 bits)
RES_DEV | m9_l4 | dev_t | upon success: device number of the resulting inode
RES_UID | m9_s4 | uid_t | upon success: user ID of the resulting inode
RES_GID | m9_s1 | gid_t | upon success: group ID of the resulting inode

Reply codes

ENFILE | no inodes are available
OK | temporary inode created and opened


REQ_PUTNODE

Decrease an open file's reference count.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number
REQ_COUNT | m9_l2 | ino_t | number of references to drop

Reply fields

Reply codes

OK | reference count decreased

Notes

VFS assumes the inode on the FS:
- is not in use when REQ_COUNT equals exactly the number of times the inode was opened according to the FS,
- is still in use when REQ_COUNT is less than the number of times the inode was opened according to the FS (e.g., VFS will sometimes (effectively) set the reference counter to 1 in order to prevent the counter from wrapping).
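
The two cases above can be sketched as follows. The structure `fs_inode` and the function `fs_putnode` are hypothetical, and rejecting a count larger than the current reference count with -1 is an assumption of this sketch; the protocol itself only lists OK as a reply code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory inode with the FS-side reference count. */
struct fs_inode {
    unsigned long ino;
    unsigned int  ref_count;
};

/* Drop 'count' references (REQ_COUNT). On success returns 0 and sets
 * *released to true when the inode is no longer in use. Returns -1 when
 * asked to drop more references than are held (sketch-only behaviour). */
int fs_putnode(struct fs_inode *ip, unsigned int count, bool *released)
{
    assert(ip != NULL && released != NULL);
    if (count > ip->ref_count)
        return -1;
    ip->ref_count -= count;
    *released = (ip->ref_count == 0);
    return 0;
}
```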


Inode use functions

REQ_READ

Read from a file.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (WRITE) to store the resulting data in
REQ_POS | m2_i3 | off_t | seek position into the open file
REQ_SEEK_POS_HI | m9_l3 | off_t | seek position into the open file (upper 32 bits)
REQ_SEEK_POS_LO | m9_l4 | off_t | seek position into the open file (lower 32 bits)
REQ_NBYTES | m9_l5 | size_t | number of bytes to read
REQ_FD_INODE_INDEX | m2_s1 | unsigned short | inode index associated with this inode

Reply fields

RES_FD_POS | m2_i1 | off_t | upon success: resulting file position
RES_SEEK_POS_HI | m9_l3 | off_t | upon success: resulting file position (upper 32 bits)
RES_SEEK_POS_LO | m9_l4 | off_t | upon success: resulting file position (lower 32 bits)
RES_NBYTES | m9_l5 | size_t | upon success: number of bytes read

Reply codes

OK | results successfully (partially) read, or EOF reached
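
The seek position travels as two 32-bit halves (the `_HI` and `_LO` fields). A minimal sketch of splitting and rejoining such a position is given below, assuming a 64-bit unsigned position; the helper names `pos_split`, `pos_join` and `pos_advance` are hypothetical and are not the helpers used by MINIX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 64-bit file position into the two 32-bit message fields. */
void pos_split(uint64_t pos, uint32_t *hi, uint32_t *lo)
{
    assert(hi != NULL && lo != NULL);
    *hi = (uint32_t)(pos >> 32);
    *lo = (uint32_t)(pos & 0xFFFFFFFFu);
}

/* Rebuild the 64-bit position from its upper and lower halves. */
uint64_t pos_join(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Resulting position after transferring nbytes from the given position. */
uint64_t pos_advance(uint32_t hi, uint32_t lo, uint32_t nbytes)
{
    return pos_join(hi, lo) + nbytes;
}
```

The carry from the lower into the upper half is handled by doing the addition on the joined 64-bit value rather than on the halves.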


REQ_WRITE

Write to a file.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) containing the data to write
REQ_FD_POS | m2_i3 | off_t | seek position into the open file
REQ_SEEK_POS_HI | m9_l3 | off_t | seek position into the open file (upper 32 bits)
REQ_SEEK_POS_LO | m9_l4 | off_t | seek position into the open file (lower 32 bits)
REQ_NBYTES | m9_l5 | size_t | number of bytes to write
REQ_FD_INODE_INDEX | m2_s1 | unsigned short | inode index associated with this inode

Reply fields

RES_FD_POS | m2_i1 | off_t | upon success: resulting file position
RES_SEEK_POS_HI | m9_l3 | off_t | upon success: resulting file position (upper 32 bits)
RES_SEEK_POS_LO | m9_l4 | off_t | upon success: resulting file position (lower 32 bits)
RES_NBYTES | m9_l5 | size_t | upon success: number of bytes written

Reply codes

ENOSPC | no space is left on the device
EFBIG | the write would make the resulting file size too big
OK | results successfully written


REQ_GETDENTS

Retrieve directory entries.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number of the directory
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (WRITE) to store resulting struct dirent entries and names in
REQ_MEM_SIZE | m9_l5 | size_t | size of given memory grant
REQ_GDE_POS | m2_l1 | off_t | seek position into the open file
REQ_SEEK_POS_HI | m9_l3 | off_t | file position (upper 32 bits)
REQ_SEEK_POS_LO | m9_l4 | off_t | file position (lower 32 bits)

Reply fields

RES_GDE_POS_CHANGE | m2_l1 | off_t | upon success: the amount by which to adjust the seek position into the file
RES_SEEK_POS_HI | m9_l3 | off_t | upon success: new seek position into the file (upper 32 bits)
RES_SEEK_POS_LO | m9_l4 | off_t | upon success: new seek position into the file (lower 32 bits)
RES_NBYTES | m9_l5 | size_t | upon success: the amount of resulting bytes stored, with 0 for EOF

Reply codes

ENOENT | the given file position is not aligned to the internal data structures (file system specific)
EINVAL | the given buffer is too small to store even one entry (including padding)
OK | stored zero or more entries in the user's buffer


REQ_FTRUNC

Set size, or free space, of an open file.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number
REQ_TRC_START_HI | m9_l2 | off_t | new file size or starting position (inclusive) of region to free (upper 32 bits)
REQ_TRC_START_LO | m9_l3 | off_t | new file size or starting position (inclusive) of region to free (lower 32 bits)
REQ_TRC_END_HI | m9_l4 | off_t | zero or ending position (exclusive) of region to free (upper 32 bits)
REQ_TRC_END_LO | m9_l5 | off_t | zero or ending position (exclusive) of region to free (lower 32 bits)
REQ_FD_START | m2_i2 | off_t | new file size or starting position (inclusive) of region to free
REQ_FD_END | m2_i3 | off_t | zero or ending position (exclusive) of region to free

Reply fields

Reply codes

EINVAL | an attempt is made to change the file size of a pipe to anything but zero
EFBIG | the resulting file would be too big
OK | file size changed and/or holes created
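
The start and end fields select between two operations: with an end of zero the start value is the new file size, otherwise the pair describes a region to free. A small sketch of that interpretation follows, assuming the halves have already been joined into 64-bit values. The names `trunc_op` and `ftrunc_classify` are hypothetical, and treating an empty or inverted region as invalid is an assumption of this sketch rather than something the protocol states.

```c
#include <stdint.h>

/* Hypothetical classification of a REQ_FTRUNC request. */
enum trunc_op {
    TRUNC_SET_SIZE,      /* end == 0: start is the new file size */
    TRUNC_FREE_REGION,   /* free the bytes in [start, end) */
    TRUNC_INVALID        /* empty or inverted region (sketch-only rule) */
};

/* Decide what a request with the given start and end positions asks for. */
enum trunc_op ftrunc_classify(uint64_t start, uint64_t end)
{
    if (end == 0)
        return TRUNC_SET_SIZE;
    if (end <= start)
        return TRUNC_INVALID;
    return TRUNC_FREE_REGION;
}
```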


REQ_INHIBREAD

Mark file as target of seek operation.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number

Reply fields

Reply codes

OK | request processed successfully


Inode metadata retrieval and manipulation

REQ_STAT

Retrieve file status.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (WRITE) to store resulting "struct stat" in

Reply fields

Reply codes

OK | result stored in buffer


REQ_CHOWN

Change file ownership.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number
REQ_UID | m6_s1 | uid_t | user ID of the caller
REQ_GID | m6_c1 | gid_t | group ID of the caller
REQ_UID | m9_s4 | uid_t | new user ID for the file
REQ_GID | m9_s1 | gid_t | new group ID for the file

Reply fields

RES_MODE | m9_s2 | mode_t | upon success: resulting inode mode

Reply codes

OK | ownership changed


REQ_CHMOD

Change file mode.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number
REQ_MODE | m9_s3 | mode_t | new mode for the file
REQ_UID | m6_s1 | uid_t | user ID of the caller
REQ_GID | m6_c1 | gid_t | group ID of the caller

Reply fields

RES_MODE | m9_s2 | mode_t | upon success: resulting inode mode

Reply codes

OK | mode changed

Notes

- The caller UID and GID are typically unused.
- While MFS changes the 06777 (octal) part of the mode, other file systems may choose to change S_ISVTX as well (07777).
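
The effect of the two masks can be illustrated with a short sketch. The macro names `MODE_MASK_MFS` and `MODE_MASK_FULL` and the function `apply_chmod` are hypothetical; only the octal values 06777 and 07777 come from the note above.

```c
#include <sys/types.h>

#define MODE_MASK_MFS  06777   /* bits replaced by MFS, per the note above */
#define MODE_MASK_FULL 07777   /* additionally covers the sticky bit (S_ISVTX) */

/* Replace only the bits selected by 'mask'; all other bits of the old
 * mode (such as the file type) are preserved. */
mode_t apply_chmod(mode_t old_mode, mode_t requested, mode_t mask)
{
    return (mode_t)((old_mode & ~mask) | (requested & mask));
}
```

For example, applying a requested mode of 01755 to a regular file with mode 0100644 yields 0100755 under the 06777 mask (the sticky bit is dropped) and 0101755 under the 07777 mask.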


REQ_UTIME

Set file times.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number
REQ_ACTIME | m9_l2 | time_t | new access time
REQ_MODTIME | m9_l3 | time_t | new modification time

Reply fields

Reply codes

OK | custom file times set


Directory entry manipulation

REQ_MKDIR

Create a directory.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number of the containing directory for the new file
REQ_MODE | m9_s3 | mode_t | mode for the directory
REQ_UID | m9_s4 | uid_t | user ID for the directory
REQ_GID | m9_s1 | gid_t | group ID for the directory
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) for the last path component
REQ_PATH_LEN | m9_s2 | unsigned short | length of the last path component

Reply fields

Reply codes

ENAMETOOLONG | the last path component is longer than the file system supports
EEXIST | a directory entry with that name already exists
ENFILE | no inodes are available
ENOSPC | no space is left on the device
EFBIG | the containing directory can not handle any more entries
EMLINK | the containing directory has the maximum number of links already
OK | directory created


REQ_MKNOD

Create a special file.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number of the containing directory for the new file
REQ_MODE | m9_s3 | mode_t | mode for the file
REQ_DEV | m9_l5 | dev_t | device number
REQ_UID | m9_s4 | uid_t | user ID for the file
REQ_GID | m9_s1 | gid_t | group ID for the file
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) for the last path component
REQ_PATH_LEN | m9_s2 | short | length of the last path component

Reply fields

Reply codes

ENAMETOOLONG | the last path component is longer than the file system supports
EEXIST | a directory entry with that name already exists
EINVAL | the given file type is invalid or not supported
ENFILE | no inodes are available
ENOSPC | no space is left on the device
EFBIG | the containing directory can not handle any more entries
OK | special file created


Create a hard link to a file.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | link file inode number
REQ_DIR_INO | m9_l3 | ino_t | inode number of the containing directory for the new link
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) for the last path component
REQ_PATH_LEN | m9_s2 | unsigned short | length of the last path component

Reply fields

Reply codes

ENAMETOOLONG | the last path component is longer than the file system supports
EEXIST | a directory entry with that name already exists
EPERM | the linked file is a directory
EMLINK | the linked inode has the maximum number of links already
ENOSPC | no space is left on the device
EFBIG | the containing directory can not handle any more entries
OK | new link created


Unlink a file.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number of the containing directory for the file
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) for last path component
REQ_PATH_LEN | m9_s2 | unsigned short | length of the last path component

Reply fields

Reply codes

ENAMETOOLONG | the last path component is longer than the file system supports
ENOENT | no directory entry with that name exists
EPERM | the given name refers to a directory
OK | unlinked file


REQ_RMDIR

Remove an empty directory.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number of the containing directory for the file
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) for last path component
REQ_PATH_LEN | m9_s2 | unsigned short | length of the last path component

Reply fields

Reply codes

ENAMETOOLONG | the last path component is longer than the file system supports
ENOENT | no directory entry with that name exists
ENOTDIR | the given name does not refer to a directory
ENOTEMPTY | the given directory is not empty
EINVAL | the given directory is "." or ".."
EBUSY | the given directory is the root directory of the file system
OK | removed directory


REQ_RENAME

Rename a file or directory.

Request fields

REQ_REN_OLD_DIR | m9_l3 | ino_t | inode number of containing directory for the old file
REQ_REN_NEW_DIR | m9_l4 | ino_t | inode number of containing directory for the new file
REQ_REN_GRANT_OLD | m9_l2 | cp_grant_id_t | memory grant (READ) for the old last path component
REQ_REN_LEN_OLD | m9_s1 | unsigned short | length of the old last path component
REQ_REN_GRANT_NEW | m9_l1 | cp_grant_id_t | memory grant (READ) for the new last path component
REQ_REN_LEN_NEW | m9_s2 | unsigned short | length of the new last path component

Reply fields

Reply codes

ENAMETOOLONG | the last path component of the old or new file is longer than the file system supports
ENOENT | the old file does not exist
OK | the old and new last path component and containing directory are the same
EBUSY | the old file is a mountpoint directory
EINVAL | an attempt is made to move a directory to within its own subtree
EINVAL | the old or new last path component is "." or ".."
EMLINK | the old file is a directory and the new file doesn't exist but the new containing directory has the maximum number of links
ENOTDIR | the old file is a directory and the new file exists but is not a directory
EISDIR | the old file is not a directory and the new file exists but is a directory
ENOTEMPTY | the new file is a directory but is not empty
EBUSY | the new file is the root directory of the file system
ENOSPC | no space is left on the device
EFBIG | the new containing directory can not handle any more entries
OK | file renamed


Create a symbolic link.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number of the containing directory for the new file
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) for the link name's last path component
REQ_PATH_LEN | m9_s2 | unsigned short | length of the link name's last path component
REQ_GRANT3 | m9_l3 | cp_grant_id_t | memory grant (READ) for the link target (not including a trailing '\0')
REQ_MEM_SIZE | m9_l5 | size_t | length of the link target (not including a trailing '\0')
REQ_UID | m9_s4 | uid_t | user ID for the new symlink
REQ_GID | m9_s1 | gid_t | group ID for the new symlink

Reply fields

Reply codes

ENAMETOOLONG | the last path component is longer than the file system supports
EEXIST | a directory entry with that name already exists
ENFILE | no inodes are available
ENOSPC | no space is left on the device
EFBIG | the containing directory can not handle any more entries
ENAMETOOLONG | the link target contains '\0' bytes
OK | symbolic link created


Retrieve symbolic link target.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (WRITE) for buffer to write result to
REQ_MEM_SIZE | m9_l5 | size_t | size of buffer to write to

Reply fields

RES_NBYTES | m9_l5 | size_t | upon success: number of bytes written

Reply codes

OK | result stored in buffer


Miscellaneous file system operations

REQ_MOUNTPOINT

Mark an inode as mountpoint.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | inode number of file to use as mountpoint

Reply fields

Reply codes

EBUSY | inode already in use as mountpoint
ENOTDIR | given inode is not a directory
OK | inode marked as mountpoint


REQ_FSTATFS

Retrieve file system status.

Request fields

REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (WRITE) to store resulting "struct statfs" in

Reply fields

Reply codes

OK | result stored in buffer


REQ_SYNC

Write any unwritten data to disk.

Request fields

Reply fields

Reply codes

OK | request processed successfully


Block I/O functions

REQ_FLUSH

Flush cached data for an unmounted device.

Request fields

REQ_DEV | m9_l5 | dev_t | device number

Reply fields

Reply codes

EBUSY | the device is mounted
OK | cache flushed and invalidated for this device


REQ_NEW_DRIVER

Set a new driver endpoint for a major device.

Request fields

REQ_DEV | m9_l5 | dev_t | device number
REQ_DRIVER_E | m9_l2 | endpoint_t | driver endpoint

Reply fields

Reply codes

OK | request processed successfully


REQ_BREAD

Read from a block device directly.

Request fields

REQ_DEV2 | m9_l1 | dev_t | device number
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (WRITE) to store the resulting data in
REQ_SEEK_POS_LO | m9_l4 | off_t | low 32 bits of position
REQ_SEEK_POS_HI | m9_l3 | off_t | high 32 bits of position
REQ_NBYTES | m9_l5 | size_t | number of bytes to read

Reply fields

RES_SEEK_POS_LO | m9_l4 | off_t | upon success and failure: low 32 bits of resulting position
RES_SEEK_POS_HI | m9_l3 | off_t | upon success and failure: high 32 bits of resulting position
RES_NBYTES | m9_l5 | size_t | upon success and failure: total number of bytes read

Reply codes

EIO | I/O error reported by the device driver
OK | results successfully (partially) read, or EOF reached


REQ_BWRITE

Write to a block device directly.

Request fields

REQ_DEV2 | m9_l1 | dev_t | device number
REQ_GRANT | m9_l2 | cp_grant_id_t | memory grant (READ) containing the data to write
REQ_SEEK_POS_LO | m9_l4 | off_t | low 32 bits of position
REQ_SEEK_POS_HI | m9_l3 | off_t | high 32 bits of position
REQ_NBYTES | m9_l5 | size_t | number of bytes to write

Reply fields

RES_SEEK_POS_LO | m9_l4 | off_t | upon success and failure: low 32 bits of resulting position
RES_SEEK_POS_HI | m9_l3 | off_t | upon success and failure: high 32 bits of resulting position
RES_NBYTES | m9_l5 | size_t | upon success and failure: total number of bytes written

Reply codes

EIO | I/O error reported by the device driver
OK | results successfully (partially) written, or EOF reached


Transaction functions

REQ_COMMIT

Commit a transaction (part of the transaction protocol).

Request fields

REQ_ID | m9_s1 | unsigned short | Request ID

Reply fields

Reply codes

EINVAL | Request ID could not be committed (e.g., REQ_ID != (current id - 1))
COMMITTED | Request is committed

Description

This request tells the file server to commit a transaction. If VFS sends this request in response to a reply that the FS flagged 'auto-commit', or sends it again for a transaction that is already committed, the FS simply replies COMMITTED.
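
A minimal sketch of such a commit handler is shown below. It assumes the FS tracks at most one pending transaction and remembers the ID of the last committed one; the structure `txn_log`, the function `fs_handle_commit` and the numeric reply values are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numeric values for the two reply codes. */
enum { REPLY_COMMITTED = 1, REPLY_EINVAL = -22 };

/* Hypothetical bookkeeping kept by the FS across requests. */
struct txn_log {
    bool     has_pending;     /* a request was executed but not committed */
    uint16_t pending_id;      /* ID of that request */
    bool     has_committed;   /* at least one transaction was committed */
    uint16_t committed_id;    /* ID recorded at the last commit */
};

/* Handle REQ_COMMIT for the transaction identified by req_id. */
int fs_handle_commit(struct txn_log *log, uint16_t req_id)
{
    assert(log != NULL);

    /* Repeated COMMIT, e.g., because the COMMITTED reply was lost. */
    if (log->has_committed && log->committed_id == req_id)
        return REPLY_COMMITTED;

    /* Normal case: commit the pending transaction and record its ID. */
    if (log->has_pending && log->pending_id == req_id) {
        log->committed_id = req_id;
        log->has_committed = true;
        log->has_pending = false;
        return REPLY_COMMITTED;
    }

    /* Unknown ID: nothing that could be committed. */
    return REPLY_EINVAL;
}
```

In this model an auto-committed request would be recorded as committed right away, so a later COMMIT for the same ID falls into the first branch.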


REQ_RECOVER

Recover state after a crash.

Request fields

REQ_OLD_E | m9_l3 | endpoint_t | Endpoint of crashed (initial) FS

Reply fields

Reply codes

EIO | Recovery process failed (e.g., due to data corruption)
OK | Recovery process completed successfully

Description

The FS allocates and registers shared memory regions for the inode cache and buffer cache based on the endpoint (e.g., using keys such as <endpoint>_i and <endpoint>_b) with DS, after receiving a mount request. Upon receiving a recovery request, it maps in the inode cache and buffer cache of the crashed FS and runs a recovery procedure. Note that the endpoint is the endpoint of the initial FS, because the key in DS will never change. There are no naming schemes defined; it is up to the FS to pick suitable names.
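
Since no naming scheme is defined, the following is only one way to derive the example keys `<endpoint>_i` and `<endpoint>_b` mentioned above. The function `make_ds_key` is hypothetical, and the endpoint is assumed to fit in an `int`; registering the key with DS is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build a key of the form "<endpoint>_<suffix>", e.g. "42_i" for the
 * inode cache or "42_b" for the buffer cache.
 * Returns 0 on success, -1 if the key does not fit in the buffer. */
int make_ds_key(char *buf, size_t buflen, int endpoint, char suffix)
{
    assert(buf != NULL);
    int n = snprintf(buf, buflen, "%d_%c", endpoint, suffix);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

Because the key is built from the endpoint of the initial FS, a recovering FS can recompute it from REQ_OLD_E alone.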


Dynamic Update functions

REQ_RESTART

Tell the FS it is about to perform a dynamic update, so it can flush dirty data to disk.

Request fields

Reply fields

Reply codes

OK | Dirty data is written to disk

Description

The FS performs a sync, writing the inode table to the block buffer and the block buffer to disk. It then sends the above reply and exits.


REQ_RELOAD

Restore buffers by reading a number of inodes from disk, such that the state of the FS is the same as before the update.

Request fields

REQ_INODE_NR | m9_l1 | ino_t | Inode number of file to use as mountpoint
REQ_GRANT | m9_l2 | cp_grant_id_t | Memory grant (READ) containing a list of inodes that the FS has to reopen
REQ_MEM_SIZE | m9_l5 | size_t | Size of the inode list
REQ_OLD_E | m9_l3 | endpoint_t | Endpoint of FS before dynamic update

Reply fields

Reply codes

OK | Reload completed successfully

Description

VFS holds a list of the inodes that it believes the FS has open. For VFS, that list is what the state of the FS looks like. By reading those inodes from disk, the state is restored.
Because the block buffer is stored in a shared memory region, the reloading process is sped up by mapping in that shared memory (using the endpoint to retrieve the shared memory key from DS) and reading the blocks from cache instead of from disk. The inode cache must be overwritten.


References

This document is not based on the original VFS-FS protocol documentation by Balazs Gerofi. However, that document may still provide additional insights.

Design and implementation of the MINIX Virtual File system by Balazs Gerofi, August, 2006

For more information on Dynamic Updates and Failure Resilience, see the Master's Thesis by Thomas Veerman.

Dynamic Updates and Failure Resilience for the Minix File Server by Thomas Veerman, May, 2009

MinixWiki: DevelopersGuide/VfsFsProtocolV2 (last edited 2010-01-24 15:43:45 by David van Moolenbroek)