Table of Contents

VTreeFS

The VTreeFS library provides a common base that allows the rapid development of read-only, in-memory, hierarchical file systems. Originally designed to form the basis of both DevFS and ProcFS (still ongoing efforts), it is both flexible and easy to use. The library provides the following functionality:

  1. the main message loop of the file system;
  2. generic handlers for the requests from VFS;
  3. an interface to manipulate the virtual file system tree;
  4. callback hooks for refreshing of directories, and reading from regular files and symbolic links.

In essence, constructing a file system using VTreeFS consists of little more than implementing the appropriate callback hooks, and starting VTreeFS from the file system's main() function.

Applications wishing to use VTreeFS, must include the library's public header file, <minix/vtreefs.h>. They must link against the vtreefs library and the sys library, and in that order (“-lvtreefs -lsys”), because VTreeFS uses functionality from syslib.

Inodes

An application that uses the library, can add and remove directories and other files. All files (including directories) are represented using the primary object of the library: the inode. The library essentially manages a fully connected tree of inodes. API calls are provided to navigate the tree, and retrieve and manipulate inode properties. The inode object itself is opaque to the application.

Hard links are not supported, so every inode except the root inode is also an entry into its parent directory. The entry is identified in that directory by name. The names are bounded in length to save memory.

VTreeFS is set up to be fairly “memory conscious” in general: the number of inodes to allocate has to be specified at start-up time by the application, and this set of inodes will be preallocated. It is not possible to create more inodes than this number. As a result, no dynamic memory allocation takes place once VTreeFS has finished initializing.

Indexes

To satisfy the requirements of ProcFS, an inode may also have an index associated with the entry into the directory. This optional index determines the inode's position when getting returned by a getdents() call. This allows the application to guarantee that certain directory entries will show up in getdents() output exactly once, even if these entries are deleted and readded between getdents() calls.

Indexed inodes have another property: if VTreeFS runs out of inodes, it will first try to delete unused indexed entries. Applications that used indexed entries are expected to recreate any needed indexed entries from its callback functions. This allows ProcFS to expose a dynamically generated tree that when fully expanded would by far exceed the number of preallocated inodes, while still being able to provide accurate views of any parts of this tree to the callers. Since indexed entries are very specific to ProcFS, further explanation on this subject is left out of this document.

A typical file system will have no use for indexed entries, and simply specify NO_INDEX and zero indexed entries in inode creation calls.

The API

For reference, the VTreeFS header file is reproduced in its entirety here.

#!C
#ifndef _MINIX_VTREEFS_H
#define _MINIX_VTREEFS_H

struct inode;
typedef int index_t;
typedef void *cbdata_t;

#define NO_INDEX	((index_t) -1)

/* Maximum file name length, excluding terminating null character. It is set
  * to a low value to limit memory usage, but can be changed to any value.
  */
#define PNAME_MAX	24

struct inode_stat {
	mode_t mode;		/* file mode (type and permissions) */
	uid_t uid;		/* user ID */
	gid_t gid;		/* group ID */
	off_t size;		/* file size */
	dev_t dev;		/* device number (for char/block type files) */
};

struct fs_hooks {
	void (*init_hook)(void);
	void (*cleanup_hook)(void);
	int (*lookup_hook)(struct inode *inode, char *name, cbdata_t cbdata);
	int (*getdents_hook)(struct inode *inode, cbdata_t cbdata);
	int (*read_hook)(struct inode *inode, off_t offset, char **ptr,
		size_t *len, cbdata_t cbdata);
	int (*rdlink_hook)(struct inode *inode, char *ptr, size_t max,
		cbdata_t cbdata);
	int (*message_hook)(message *m);
};

extern struct inode *add_inode(struct inode *parent, char *name, index_t index,
	struct inode_stat *stat, index_t nr_indexed_entries, cbdata_t cbdata);
extern void delete_inode(struct inode *inode);

extern struct inode *get_inode_by_name(struct inode *parent, char *name);
extern struct inode *get_inode_by_index(struct inode *parent, index_t index);

extern char const *get_inode_name(struct inode *inode);
extern index_t get_inode_index(struct inode *inode);
extern cbdata_t get_inode_cbdata(struct inode *inode);

extern struct inode *get_root_inode(void);
extern struct inode *get_parent_inode(struct inode *inode);
extern struct inode *get_first_inode(struct inode *parent);
extern struct inode *get_next_inode(struct inode *previous);

extern void get_inode_stat(struct inode *inode, struct inode_stat *stat);
extern void set_inode_stat(struct inode *inode, struct inode_stat *stat);

extern void start_vtreefs(struct fs_hooks *hooks, unsigned int nr_inodes,
	struct inode_stat *stat, index_t nr_indexed_entries);

#endif /* _MINIX_VTREEFS_H */

API functions

add_inode adds an inode into a parent inode (which must be a directory), with the given name - a string that must consist of no more than PNAME_MAX characters. If the index parameter is not equal to NO_INDEX, it indicates the index position for the inode in the parent directory. The stat parameter points to a filled structure of inode metadata. This structure's mode field determines the file type and the access permissions (see /usr/include/sys/stat.h); directories (S_IFDIR), regular files (S_IFREG), character-special files (S_IFCHR), block-special files (S_IFBLK), and symbolic links (S_IFLNK) are supported. The uid, gid, size and dev fields specify the owning user and group ID, the size of the inode, and the device number (for block and character special files), respectively. The nr_indexed_entries parameter is only used for new directories (S_IFDIR), and indicates the range (0 to nr_indexed_entries-1) reserved for inodes with index numbers; this value may be 0 for directories that do not care about index numbers. The cbdata parameter specifies a caller-defined value passed to hook calls affecting this inode.

delete_inode removes the given inode. If the inode is a directory, all of its children will be removed recursively as well. The actual deletion may be deferred if the inode is still open. It will then automatically be removed once it is closed, and no callback functions will be called on it in the meantime.

get_inode_by_name and get_inode_by_index return an inode given a directory and either a name or an index number. They may fail, and in that case return NIL_INODE and NO_INDEX, respectively.

get_inode_name and get_inode_index return the name and index (or NO_INDEX) of a given inode, as assigned with the add_inode() call. The name pointer may simply point into the inode. get_inode_cbdata returns the cbdata value for an inode.

get_root_inode, get_parent_inode, get_first_inode and get_next_inode allow walking through the virtual tree of inodes, respectively retrieving the virtual tree's root inode, the parent inode of a given inode, the first child inode of a given parent, and the next inode in a series of children (given the previous result of get_first_inode() or get_next_inode()). The last three may return NIL_INODE if the directory does not have a parent (only in the case of the root directory), has no children, or has no more children, respectively.

get_inode_stat and set_inode_stat retrieve and manipulate inode metadata. Note that the file type of the inode must not be changed after creation.

start_vtreefs starts the main loop of the vtree file system library, accepting requests from VFS and possibly other sources (passing those on to the application), and making the appropriate callbacks to the application based on the hooks given by the application. Due to limitations of the SEF framework, this API call never returns; when VTreeFS is instructed to shut down, it will exit by itself. The hooks parameter specifies a structure of function pointers; see the next section for details. The nr_inodes parameter specifies the maximum number of inodes, which will also be preallocated at startup. Upon being started, the vtreefs library has to create a root inode; the stat and nr_indexed_entries parameters of start_vtreefs() determine the initial parameters of this root inode.

Callback hooks

The fs_hooks structure that must be provided to the start_vtreefs call, contains the following hook function pointers.

init_hook is called when the file system is mounted. At this point, VTreeFS has initialized itself, and it is possible to add inodes to the tree.

cleanup_hook is called when the file system is unmounted. The application should use this function to perform any cleanup it wants to do itself, because this is always the last hook call before the entire process exits.

lookup_hook is called every time a lookup for an entry other than “.” and “..” is made on an inode that is a directory and is search-accessible by the caller of the REQ_LOOKUP call. The hook call is made right before the library does the actual name lookup. The provided inode is the directory inode, and cbdata is the callback data of that inode. name is the path component being looked up. This hook allows the application to do for example the following things safely:

In the latter case, the hook implementation should return an error (typically ENOENT) to indicate that the lookup function should not continue; this is the error that will be returned to VFS. If OK is returned from the lookup function, the library continues the lookup.

getdents_hook is called every time a REQ_GETDENTS call is made on a directory inode. The hook call is made right before the library does the actual directory entry enumeration. The inode parameter is the inode of this directory, and cbdata is the callback data of this inode. The same semantics apply as for lookup_hook above.

read_hook is called when a user process reads from a regular (S_IFREG) file inode. inode and cbdata are the inode and callback data of this regular file, respectively. offset is the zero-based offset into the file from which reading should start, and len points to the requested read length. The hook implementation may return an error indicating why the file cannot be read. If the hook returns OK, then the library assumes that:

However, if EOF (is reached for the file, then the hook must return OK, and a length of 0 in len. The ptr value is then unused. As a sidenote, while returning a pointer to the data may seem strange, this construction avoids the extra overhead of copying the data from the application to the library on every read.

rdlink_hook is called when a user process reads from a symbolic link (S_IFLNK) inode. inode and cbdata are the inode and callback data of this symlink. ptr is a pointer to a memory area within the library, of size max. The hook implementation can write a path name string of up to max bytes into ptr, including the terminating '\0' character.

message_hook is called whenever the library's main loop receives a message that is an unsupported request from VFS, or a request not from VFS. The message parameter points to the message received. If the message was from VFS, the return value from the hook function will be used instead of ENOSYS when replying to VFS. If the message was not from VFS, it is fully up to the hook implementer to decide what to do with the message; the library will not send a reply by itself in this case.

All hook pointers given in the fs_hooks structure may be NULL, in which case sensible defaults will be used.

If a file system is mounted and unmounted more than once during its process lifetime (as is the case for ProcFS, for example), the init_hook and cleanup_hook hooks may be called more than once as well. The constructed tree is not destroyed at unmount time, so the init hook should be careful not to recreate nodes that already exist. This behavior is not ideal and may be changed later.

An example

Below is a very simple TestFS file system that makes use of VTreeFS to expose a single file called “test” which contains a textual representation of the current time. TestFS consists of two files, Makefile and testfs.c, which must both be placed in the same directory.

We start with the Makefile:

# Makefile for TestFS server
PROG=	testfs
SRCS=	testfs.c

DPADD+=	${LIBVTREEFS} ${LIBSYS} ${LIBMINLIB}
LDADD+=	-lvtreefs -lsys -lminlib

CFLAGS+= -D_NETBSD_SOURCE=1

MAN=

BINDIR?= /sbin

.include <bsd.prog.mk>

Then testfs.c:

#!C
#include <minix/drivers.h>
#include <minix/vtreefs.h>
#include <sys/stat.h>
#include <time.h>
#include <assert.h>

static void my_init_hook(void)
{	
	/* This hook will be called once, after VTreeFS has initialized.
	 */
	struct inode_stat file_stat;
	struct inode *inode;

	/* We create one regular file in the root directory. The file is
	 * readable by everyone, and owned by root. Its size as returned by for
	 * example stat() will be zero, but that does not mean it is empty.
	 * For files with dynamically generated content, the file size is
	 * typically set to zero.
	 */
	file_stat.mode = S_IFREG | 0444;
	file_stat.uid = 0;
	file_stat.gid = 0;
	file_stat.size = 0;
	file_stat.dev = NO_DEV;

	/* Now create the actual file. It is called "test" and does not have an
	 * index number. Its callback data value is set to 1, allowing it to be
	 * identified with this number later.
	 */
	inode = add_inode(get_root_inode(), "test", NO_INDEX, &file_stat, 0,
		(cbdata_t) 1);

	assert(inode != NULL);
}

static int my_read_hook(struct inode *inode, off_t offset, char **ptr,
	size_t *len, cbdata_t cbdata)
{
	/* This hook will be called every time a regular file is read. We use
	 * it to dyanmically generate the contents of our file.
	 */
	static char data[26];
	const char *str;
	time_t now;

	/* We have only a single file. With more files, cbdata may help
	 * distinguishing between them.
	 */
	assert((int) cbdata == 1);

	/* Generate the contents of the file into the 'data' buffer. We could
	 * use the return value of ctime() directly, but that would make for a
	 * lousy example.
	 */
	time(&now);

	str = ctime(&now);

	strcpy(data, str);

	/* If the offset is beyond the end of the string, return EOF. */
	if (offset > strlen(data)) {
		*len = 0;

		return OK;
	}

	/* Otherwise, return a pointer into 'data'. If necessary, bound the
	 * returned length to the length of the rest of the string. Note that
	 * 'data' has to be static, because it will be used after this function
	 * returns.
	 */
	*ptr = data + offset;

	if (*len > strlen(data) - offset)
		*len = strlen(data) - offset;

	return OK;
}

/* The table with callback hooks. */
struct fs_hooks my_hooks = {
	my_init_hook,
	NULL, /* cleanup_hook */
	NULL, /* lookup_hook */
	NULL, /* getdents_hook */
	my_read_hook,
	NULL, /* rdlink_hook */
	NULL  /* message_hook */
};

int main(void)
{
	struct inode_stat root_stat;

	/* Fill in the details to be used for the root inode. It will be a
	 * directory, readable and searchable by anyone, and owned by root.
	 */
	root_stat.mode = S_IFDIR | 0555;
	root_stat.uid = 0;
	root_stat.gid = 0;
	root_stat.size = 0;
	root_stat.dev = NO_DEV;

	/* Now start VTreeFS. Preallocate 10 inodes, which is more than we'll
	 * need for this example. No indexed entries are used.
	 */
	start_vtreefs(&my_hooks, 10, &root_stat, 0);

	/* The call above never returns. This just keeps the compiler happy. */
	return 0;
}

From the directory that contains both these files, TestFS can be built with make and installed with make install.

After installation, one more step is needed before TestFS can be mounted. TestFS is a system server, so it needs its own entry in /etc/system.conf. This entry can be very simple, because TestFS needs no privileges beyond those given to it by default. The entry should therefore look like this:

service testfs {
};

Now you should be able to mount TestFS:

mount -t testfs none /mnt

If everything worked as expected, a file called “test” should show up in /mnt now. If you issue cat /mnt/test, you will be presented with the current time. The time is renewed on every read.

Finally, TestFS can be unmounted again with the following command:

umount /mnt

References

This document is based heavily on the original VTreeFS design document, although that document is no longer updated with changes to VTreeFS.