**Work in progress**
This page is not complete or even accurate yet. In fact, it describes a version of libblockdriver that is not yet in minix-current.
====== The block driver library ======
The block driver library is a support library for block device drivers. Its primary function is to abstract away many details that are common across all block drivers, mainly with respect to processing incoming requests. On MINIX3, all block drivers are expected to make use of the block driver library.
This page primarily documents the interface of the block driver library. It provides some hints on how to write a block driver, but much more information on this subject can be found in [[.:blockprotocol|the block device protocol documentation]]. The reader is strongly advised to read that page first.
The libblockdriver library comes in two flavors: single-threaded and multi-threaded. The single-threaded version processes one command at a time, and does not allow for any form of parallelism. The multi-threaded version allows for parallelism in two dimensions: parallel processing of requests for different (physical) devices, and parallel (out-of-order) processing of requests to a single (physical) device. A driver that needs either or both of these forms of parallelism must use the multi-threaded API; drivers that do not require any parallelism should use the single-threaded API.
===== The single-threaded interface =====
The following calls should be used if the driver wants to make use of the single-threaded version of the libblockdriver API.
void blockdriver_task(struct blockdriver *bdp);
void blockdriver_terminate(void);
int blockdriver_mq_queue(message *m_ptr, int ipc_status);
int blockdriver_receive_mq(message *m_ptr, int *status_ptr);
void blockdriver_process(struct blockdriver *bdp, message *m_ptr, int ipc_status);
In the typical case, the driver lets libblockdriver run the main message loop. Essentially, the main loop iterates over three steps: 1) receiving a request or other message, 2) processing that message, and 3) sending a reply. For the processing step, libblockdriver makes callbacks into the driver to perform the driver-specific work. The **blockdriver_task** call implements the message loop, and expects a callback table with functions that implement the actual driver code. This callback table is described in more detail further below.
The main loop continues until the driver decides to terminate. The driver can call **blockdriver_terminate** to tell libblockdriver to exit the main loop after the current request has been processed and the reply has been sent. After that, the call to ''blockdriver_task'' will return.
In the course of handling a request (for example, a transfer request), the driver may have to wait for an interrupt. In many cases, the driver will also want to put a time bound on the receipt of that interrupt, so as to detect timeouts and take appropriate recovery actions. Therefore, a typical action of a driver is to set an alarm, and then wait for either an interrupt or a timer notification message. However, since MINIX3 does not allow receiving from only a specific set of endpoints, the driver has to call ''driver_receive(ANY, ..)'' in order to receive notifications from both HARDWARE (for interrupts) and CLOCK (for alarms). As a result, it will also receive any new request messages sent to the driver, for example by file systems. Such new requests cannot be processed while the current request is still ongoing, so they have to be queued. The block driver library comes with its own queue for this purpose, and the driver can use the **blockdriver_mq_queue** call to add non-interrupt, non-timer messages to the queue for later processing. This call returns TRUE if it was able to enqueue the message, and FALSE if the queue is full.
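For illustration only, the TRUE/FALSE contract of ''blockdriver_mq_queue'' can be modeled with a fixed-size FIFO. The sketch below is not libblockdriver's actual queue: the ''sketch_msg_t'' type, the ''sketch_mq_*'' functions, the FIFO ordering, and the capacity of 4 are all hypothetical, and the real message type and queue size may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MINIX message type. */
typedef struct {
    int m_source;
    int m_type;
} sketch_msg_t;

#define SKETCH_MQ_SIZE 4   /* assumed capacity, for illustration only */

struct sketch_mq {
    sketch_msg_t msg[SKETCH_MQ_SIZE];
    int status[SKETCH_MQ_SIZE];   /* IPC status saved with each message */
    size_t head;                  /* index of the oldest queued message */
    size_t count;                 /* number of queued messages */
};

/* Enqueue a copy of *m; returns false (and drops nothing) when full. */
bool sketch_mq_queue(struct sketch_mq *q, const sketch_msg_t *m, int ipc_status)
{
    size_t tail;

    if (q->count == SKETCH_MQ_SIZE)
        return false;
    tail = (q->head + q->count) % SKETCH_MQ_SIZE;
    q->msg[tail] = *m;
    q->status[tail] = ipc_status;
    q->count++;
    return true;
}

/* Dequeue the oldest message; returns false when the queue is empty. */
bool sketch_mq_dequeue(struct sketch_mq *q, sketch_msg_t *m, int *ipc_status)
{
    if (q->count == 0)
        return false;
    *m = q->msg[q->head];
    *ipc_status = q->status[q->head];
    q->head = (q->head + 1) % SKETCH_MQ_SIZE;
    q->count--;
    return true;
}
```

The IPC status is stored alongside each message because it is needed again when the message is eventually processed (''blockdriver_process'' takes it as a parameter).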
A simple example of such a nested receive loop, which waits for an interrupt message for at most a certain amount of time, is shown in the code sample below. It makes use of the timer API present in libsys.
```c
message m;
int r, ipc_status;

/* Set a timer in case we never actually get an interrupt. */
set_timer(...);

while (...request is ongoing...) {
    /* Receive a message. */
    if ((r = driver_receive(ANY, &m, &ipc_status)) != OK)
        panic("driver_receive failed: %d", r);

    switch (m.m_source) {
    case HARDWARE:
        /* Process the interrupt notification. */
        /* ... */
        break;
    case CLOCK:
        /* Have the timer callback function be called. */
        expire_timers(m.NOTIFY_TIMESTAMP);
        break;
    default:
        /* Another message; enqueue it for later processing. */
        if (!blockdriver_mq_queue(&m, ipc_status))
            panic("blockdriver_mq_queue: queue full");
        break;
    }

    /* Never send a reply to anyone from here. */
}

/*
 * Cancel the timer, regardless of whether it triggered. The driver must
 * take into account that even if an interrupt triggered, an alarm
 * notification may already be on its way at this point.
 */
cancel_timer(...);
```
In some (rare) cases, the driver may want more control over the message loop, for example if it wants to implement a threading model that is not compatible with the multi-threaded interface described below. For this purpose, libblockdriver exposes two functions that allow the driver to reimplement ''blockdriver_task'' itself. **blockdriver_receive_mq** obtains a message for processing: it takes the next message from the message queue if there are any queued messages, and calls ''driver_receive'' otherwise. It returns OK if a message is now available, or the error returned by ''driver_receive'' otherwise. **blockdriver_process** processes a message by calling the appropriate functions from the callback table. Note that if ''blockdriver_task'' is not used, the driver must implement its own termination facility, as a call to ''blockdriver_terminate'' will have no effect.
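A driver-owned loop built from these two calls might be structured as in the following minimal sketch. To keep it compilable outside MINIX, ''sketch_receive_mq'' and ''sketch_process'' are hypothetical mock stand-ins for ''blockdriver_receive_mq'' and ''blockdriver_process'', and the ''terminate_requested'' flag plays the role of the driver's own termination facility; none of these names come from libblockdriver.

```c
#include <stdbool.h>

#define SKETCH_OK 0

/* Hypothetical stand-in for the MINIX message type. */
typedef struct {
    int m_type;
} sketch_msg_t;

static int pending = 3;                   /* messages the mock receive will deliver */
static int processed = 0;                 /* messages handled so far */
static bool terminate_requested = false;  /* driver-owned termination flag */

/* Mock for blockdriver_receive_mq(): fails once nothing is left. */
static int sketch_receive_mq(sketch_msg_t *m, int *status)
{
    if (pending == 0)
        return -1;                        /* simulated receive error */
    m->m_type = pending--;
    *status = 0;
    return SKETCH_OK;
}

/* Mock for blockdriver_process(): dispatches to driver callbacks. */
static void sketch_process(sketch_msg_t *m, int status)
{
    (void)status;
    processed++;
    if (m->m_type == 2)                   /* hypothetical "shut down" request */
        terminate_requested = true;
}

/* The driver's own main loop: it checks its own flag after each message,
 * because blockdriver_terminate() only affects blockdriver_task(). */
int sketch_driver_loop(void)
{
    sketch_msg_t m;
    int status, r;

    while (!terminate_requested) {
        if ((r = sketch_receive_mq(&m, &status)) != SKETCH_OK)
            return r;
        sketch_process(&m, status);
    }
    return SKETCH_OK;
}
```

In a real driver, the loop body would call ''blockdriver_receive_mq(&m, &ipc_status)'' and ''blockdriver_process(bdp, &m, ipc_status)'' instead, and the driver would set its own flag wherever it would otherwise have called ''blockdriver_terminate''.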
===== The multi-threaded interface =====
As stated before, the multi-threaded version of the library supports parallelism both across physical devices, and between requests to a single physical device. Either or both may be used by the driver.
In order to create parallelism across physical devices, the library creates separate threads for separate physical devices. Each physical device has a **device ID** of type ''device_id_t'', and the ''bdr_device'' callback function from the ''struct blockdriver'' structure (described below) is used to map minor device numbers to their physical devices (and thus device IDs). This allows libblockdriver to assign device-specific requests to the thread(s) of their respective physical devices. By default, each physical device has one //worker// thread associated with it. The library will automatically hand off requests to worker threads. Each physical device has a request queue associated with it, so that if a request comes in while no worker is available, the request will be queued in the request queue for the physical device.
In order to create parallelism between requests, each physical device may have more than one worker thread. Individual worker threads can be identified by their **thread ID** of type ''thread_id_t'', which is internally a combination of the device ID and the per-device worker ID. The number of workers can be set individually for each physical device, and defaults to 1, which means that there is no parallelism between requests for that physical device.
Worker threads can put themselves to sleep, and be woken up by other threads later. This is the mechanism that allows threads to wait for (for example) interrupts and alarms. The main loop will run whenever all worker threads are either idle or asleep, so unlike with the single-threaded library version, the driver must never call ''driver_receive'' itself and there is no direct access to the request queues.
The following calls should be used if the driver wants to make use of the multi-threaded version of the libblockdriver API.
void blockdriver_mt_task(struct blockdriver *bdp);
void blockdriver_mt_terminate(void);
void blockdriver_mt_set_workers(device_id_t did, int workers);
thread_id_t blockdriver_mt_get_tid(void);
void blockdriver_mt_sleep(void);
void blockdriver_mt_wakeup(thread_id_t tid);
The main loop of the multi-threaded version is implemented in **blockdriver_mt_task**. It takes the same ''struct blockdriver'' structure as the single-threaded version, but requires that the driver implement the ''bdr_device'' callback function. Some of the callback functions are called from worker threads, and others from the main thread. Specifically, the callbacks that cannot (yet) be associated with a device are called from the main thread: ''bdr_intr'', ''bdr_alarm'', ''bdr_other'', and ''bdr_device''.
The **blockdriver_mt_terminate** function may be called to break out of the main loop. This call may be invoked from the main thread or from a worker thread. The driver itself is responsible for making sure that there are no sleeping worker threads at the time of this call.
The **blockdriver_mt_set_workers** call is used to set the number of workers for each physical device. Since it is the driver that picks the device ID for each physical device (from a limited range), the driver can use the same set of IDs with calls to this function. The number of workers must be at least 1, and may be limited by the library to the supported maximum.
A worker thread can put itself to sleep by calling **blockdriver_mt_sleep**. It can later be woken up from another thread with a call to **blockdriver_mt_wakeup**. This function takes a thread ID, so before going to sleep, the worker thread must obtain its own thread ID using **blockdriver_mt_get_tid** and store it somewhere the waking thread can find it. The wakeup typically happens from the main thread, in response to an interrupt or an alarm going off. The main thread must never call ''blockdriver_mt_sleep'', as this would deadlock the driver.
===== The common interface =====
The following calls are for use from both single-threaded and multi-threaded drivers.
void blockdriver_announce(int type);
void partition(struct blockdriver *bdp, dev_t device);
The **blockdriver_announce** function must be called at driver startup time. In particular, it must be called from within the [[.:sef|SEF]] initialization callback routine, passing on the initialization type from that function as the //type// parameter. This function will not only announce the presence of the new driver to the rest of the system, but also initialize certain data structures.
The **partition** call allows drivers to let libblockdriver read and parse partition and subpartition tables on a device, and initialize the partition information in the driver accordingly. The exact functionality of this call depends on the value of ''bdr_type'' in the given block driver callback table //bdp//. The //device// parameter must be a minor device number for a full device (that is, never for a partition or a subpartition). The ''partition'' call expects that the size of the full device will be provided to it upon a call to ''bdr_part''. It will also use ''bdr_part'' to fill in any partition and subpartition ''device'' structures for the given device. These partition and subpartition ''device'' structures need not be initialized before or during the call to ''partition''.
===== The callback table =====
The block driver library takes care of receiving request messages and passing them to appropriate callback functions that are implemented in the actual driver code. The driver code provides pointers to these callback functions, as well as other information, by handing a pointer to a ''struct blockdriver'' structure to the library. This section describes the fields of this structure, and the expectations of the library regarding their implementation.
int bdr_type;
Some block drivers implement devices that support partitions and possibly subpartitions. In that case, libblockdriver can handle certain partition operations on the driver's behalf. However, in order to be able to do this, the library must know exactly which partitioning scheme the driver is using. This includes the mapping between devices/partitions/subpartitions and minor device numbers. The ''bdr_type'' field determines the type of driver, and must be set to one of the following values:
* ''BLOCKDRIVER_TYPE_DISK'': the driver is for disk-like devices, and implements disk-style partitioning with the corresponding minor device numbering scheme. Libblockdriver supports partitions, subpartitions, and extended partitions, although the driver need not support any of these itself (see ''bdr_part'' below).
* ''BLOCKDRIVER_TYPE_FLOPPY'': the driver is for floppy-like devices, and implements floppy-style partitioning with the corresponding minor device numbering scheme. The floppy minor device numbering scheme, and thus also libblockdriver, supports partitions but not subpartitions.
* ''BLOCKDRIVER_TYPE_FLAT'': the driver is for devices that do not support any form of partitioning. Libblockdriver does not make any assumptions about the minor device numbering scheme.
* ''BLOCKDRIVER_TYPE_OTHER'': the driver does not want libblockdriver to handle anything related to partitions.
For the ''_DISK'', ''_FLOPPY'', and ''_FLAT'' types, libblockdriver will handle the ''DIOCGETP'' and ''DIOCSETP'' partition ioctl requests on the driver's behalf, calling the ''bdr_part'' and ''bdr_geometry'' callback routines as appropriate. For the ''_FLAT'' type, ''bdr_part'' will only be used to obtain the size of each entire device, and not for actual partitions. For the ''_OTHER'' type, the ''DIOCGETP'' and ''DIOCSETP'' ioctl requests will be passed to the ''bdr_ioctl'' callback function, and neither ''bdr_part'' nor ''bdr_geometry'' will ever be called (and thus those fields should be set to ''NULL''). An ''_OTHER'' type driver must not call ''partition''.
int (*bdr_open)(dev_t minor, int access);
This callback function is called when a client wants to open the minor device specified in //minor//. The //access// parameter may be a bitwise combination of the ''R_BIT'' and ''W_BIT'' flags.
The function must return either ''OK'' or a negative error code.
Implementation hints and notes:
* If the multi-threaded version of the library is used, the given minor device is guaranteed to exist, because it has already been validated by means of a call to the ''bdr_device'' callback function. If the single-threaded version of the library is used, this callback function must perform that check itself, and return ''ENXIO'' if the device cannot be opened.
* The device may already have been opened previously. The driver itself is responsible for keeping track of open counts. Disk drivers must do this, because they must implement the ''DIOCOPENCT'' ioctl.
* Drivers are not expected to enforce access restrictions on individual requests, since the access mode is not retained on a per-session basis. Instead, a driver should refuse to open a read-only device for writing, by returning the ''EACCES'' error code.
* Disk drivers are expected to perform partitioning when a hardware device is first opened.
int (*bdr_close)(dev_t minor);
This callback function is called when a client wants to close the minor device specified in //minor//.
The function must return either ''OK'' or a negative error code.
Implementation hints and notes:
* The device may not currently be open. The function should return an ''EINVAL'' error code in that case.
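The open and close hints above can be combined into a sketch like the following. It assumes a hypothetical driver with four minor devices, of which minor 1 is read-only. ''SKETCH_R_BIT'' and ''SKETCH_W_BIT'' are local stand-ins assumed to match MINIX's ''R_BIT'' and ''W_BIT'', and the standard errno values are negated to obtain negative error codes in a hosted build; all names are illustrative.

```c
#include <errno.h>

#define SKETCH_OK        0
#define SKETCH_R_BIT     04   /* assumed to match MINIX's R_BIT */
#define SKETCH_W_BIT     02   /* assumed to match MINIX's W_BIT */
#define SKETCH_NR_MINORS 4    /* hypothetical number of valid minors */

static int open_count[SKETCH_NR_MINORS];                     /* per-minor open counts */
static const int read_only[SKETCH_NR_MINORS] = { 0, 1, 0, 0 }; /* minor 1 is read-only */

/* bdr_open-style callback for the single-threaded library. */
int sketch_open(unsigned int minor, int access)
{
    if (minor >= SKETCH_NR_MINORS)
        return -ENXIO;    /* no bdr_device check has been done for us */
    if ((access & SKETCH_W_BIT) && read_only[minor])
        return -EACCES;   /* refuse to open a read-only device for writing */
    /* A disk driver would typically call partition() here on the first
     * open of the underlying hardware device. */
    open_count[minor]++;  /* needed to answer DIOCOPENCT */
    return SKETCH_OK;
}

/* bdr_close-style callback. */
int sketch_close(unsigned int minor)
{
    if (minor >= SKETCH_NR_MINORS || open_count[minor] == 0)
        return -EINVAL;   /* device is not currently open */
    open_count[minor]--;
    return SKETCH_OK;
}
```

A disk driver would usually also keep a count per physical device, so that it knows when a hardware device is opened for the first time and can (re)read its partition tables.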
ssize_t (*bdr_transfer)(dev_t minor, int do_write, u64_t pos, endpoint_t endpt, iovec_t *iov, unsigned count, int flags);
This callback function is called to perform a data transfer from or to the device. The transfer is requested for the given minor device //minor//. The //do_write// parameter is set if it is a write request, and cleared if it is a read request. The //pos// parameter contains the byte position of the start of the request, relative to the minor device's partition base.
The //endpt// parameter indicates the requesting endpoint. If //endpt// is set to SELF, the request comes from within the driver (typically as a result of a call to ''partition''), and //iov// contains a vector of type ''iovec_t'', i.e. with elements that each contain a local address and a size. If //endpt// is not set to SELF, an external process is making the transfer request, and //iov// must be interpreted as a vector of type ''iovec_s_t'', i.e. with elements that each contain a grant and a size. The //count// parameter indicates the number of elements in the vector, and is guaranteed not to exceed ''NR_IOREQS''.
The //flags// parameter may contain any transfer flags supported by the block device protocol; please refer to [[.:blockprotocol|the block device protocol documentation]] for a list. As of this writing, the only supported flag is ''BDEV_FORCEWRITE''.
The function must return either the number of bytes transferred, or a negative error code. The number of bytes may be less than the sum of the element sizes in the I/O vector, and may even be zero, if the end of the device or partition is reached before the full transfer has been completed.
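The short-count rule can be illustrated with the following sketch, which computes how many bytes a request may transfer before it hits the end of a partition. ''struct sketch_iov'' and ''sketch_transfer_len'' are hypothetical; a real ''bdr_transfer'' implementation would also perform the actual I/O, and would interpret the vector as ''iovec_t'' or ''iovec_s_t'' depending on //endpt//.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical I/O vector element; only the size matters here. */
struct sketch_iov {
    size_t size;
};

/* Return the number of bytes a transfer starting at pos (relative to the
 * partition base) can move in a partition of part_size bytes. */
int64_t sketch_transfer_len(uint64_t pos, uint64_t part_size,
                            const struct sketch_iov *iov, unsigned int count)
{
    uint64_t total = 0;
    unsigned int i;

    if (pos >= part_size)
        return 0;                    /* at or past the end: zero bytes */
    for (i = 0; i < count; i++)
        total += iov[i].size;        /* bytes requested across the vector */
    if (total > part_size - pos)
        total = part_size - pos;     /* clamp at the end of the partition */
    return (int64_t)total;
}
```

For example, with a 4096-byte partition and a vector of 512 and 1024 bytes, a request at position 3584 can only transfer 512 bytes, and a request at position 4096 transfers zero bytes.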
int (*bdr_ioctl)(dev_t minor, unsigned int request, endpoint_t endpt, cp_grant_id_t grant);
This callback function is called for incoming ioctl requests, except for the ones that libblockdriver takes care of itself. The ioctl request code is given in //request//, and the request is targeted at the minor device //minor//. The //endpt// parameter contains the endpoint of the requesting party, and the //grant// parameter specifies the grant for the ioctl arguments. Not all ioctl requests involve argument data; for such requests, the last two parameters must be ignored.
If the ''bdr_type'' field is set to ''BLOCKDRIVER_TYPE_OTHER'', then this function will also be called for ''DIOCGETP'' and ''DIOCSETP'' ioctl requests.
The function must return an appropriate response (typically ''OK'') or a negative error code.
Implementation hints and notes:
* Disk drivers are required to implement a number of ioctl requests. Please refer to [[.:blockprotocol|the block device protocol documentation]] for more information.
void (*bdr_cleanup)(void);
This callback function is called after an incoming request has been fully processed. This field may be set to ''NULL''.
struct device *(*bdr_part)(dev_t minor);
This callback function is expected to return a pointer to a ''device'' structure. The given //minor// device may be an entire device, a partition, or a subpartition. This function may be used for obtaining the entire device's size, and for both obtaining and assigning partition/subpartition base and size information. The function is only ever called if ''bdr_type'' is set to ''BLOCKDRIVER_TYPE_DISK'', ''BLOCKDRIVER_TYPE_FLOPPY'', or ''BLOCKDRIVER_TYPE_FLAT'', and this field must not be set to ''NULL'' in that case. For flat devices, this function will only be called for the full device.
The function //must// return an appropriate device structure for all the minor device numbers that map to full devices, and in that case, the ''dv_base'' and ''dv_size'' fields must be initialized with zero and the size of the device, respectively. For minor device numbers that map to partitions and subpartitions, the function //may// return ''NULL'' if the driver does not support partitions and/or subpartitions at all, but //must// return a device structure even for partitions and subpartitions that currently do not exist on the device. This allows the ''partition'' call to first obtain a pointer to, and then fill in, those device structures.
Implementation hints and notes:
* This function will only be called for devices that are actually open, so if the driver calls ''partition'' on the first open call of a device, and keeps around this information at least until the device is fully closed again, then this function can just return the information.
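As a sketch, a disk-style driver might keep its ''device'' structures in static arrays and have ''bdr_part'' simply index into them. The numbering constants below are assumed to mirror MINIX's disk minor scheme (five minors per drive for the full device and its four partitions, and sixteen subpartition minors per drive starting at minor 128); ''struct sketch_device'', the two-drive limit, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror MINIX's disk minor numbering scheme. */
#define SKETCH_NR_PARTITIONS 4
#define SKETCH_DEV_PER_DRIVE (1 + SKETCH_NR_PARTITIONS)
#define SKETCH_SUB_PER_DRIVE (SKETCH_NR_PARTITIONS * SKETCH_NR_PARTITIONS)
#define SKETCH_MINOR_d0p0s0  128
#define SKETCH_NR_DRIVES     2    /* hypothetical */

/* Hypothetical stand-in for struct device: base and size in bytes. */
struct sketch_device {
    uint64_t dv_base;
    uint64_t dv_size;
};

/* Full devices and partitions, indexed by minor number. */
static struct sketch_device part_dev[SKETCH_NR_DRIVES * SKETCH_DEV_PER_DRIVE];
/* Subpartitions, indexed by (minor - SKETCH_MINOR_d0p0s0). */
static struct sketch_device sub_dev[SKETCH_NR_DRIVES * SKETCH_SUB_PER_DRIVE];

/* Record the size of a whole drive; its base is always zero. */
void sketch_init_drive(unsigned int drive, uint64_t size)
{
    struct sketch_device *dv = &part_dev[drive * SKETCH_DEV_PER_DRIVE];

    dv->dv_base = 0;
    dv->dv_size = size;
}

/* bdr_part-style lookup: return a structure for every full device,
 * partition and subpartition minor, even for partitions that do not
 * currently exist, so that partition() can fill them in. */
struct sketch_device *sketch_part(unsigned int minor)
{
    if (minor < SKETCH_NR_DRIVES * SKETCH_DEV_PER_DRIVE)
        return &part_dev[minor];
    if (minor >= SKETCH_MINOR_d0p0s0 &&
        minor < SKETCH_MINOR_d0p0s0 + SKETCH_NR_DRIVES * SKETCH_SUB_PER_DRIVE)
        return &sub_dev[minor - SKETCH_MINOR_d0p0s0];
    return NULL;   /* not a minor this hypothetical driver knows about */
}
```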
void (*bdr_geometry)(dev_t minor, struct partition *part);
This callback function is expected to fill in the geometry details for the given //minor// device in the ''cylinders'', ''heads'', and ''sectors'' fields of the given //part// structure. The callback function must not change other fields in the structure.
This function is only ever called if ''bdr_type'' is not set to ''BLOCKDRIVER_TYPE_OTHER''. Even then, this field may be set to ''NULL'', in which case libblockdriver will generate a fake geometry.
void (*bdr_intr)(unsigned int irqs);
This callback function is called when an interrupt notification arrives. This field may be set to ''NULL''.
The //irqs// parameter is a bitwise combination of asserted IRQ values.
Implementation hints and notes:
* Only drivers that register for multiple IRQs may want to make use of the //irqs// parameter.
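For a driver that registers more than one IRQ, ''bdr_intr'' might dispatch on the bits of //irqs// as sketched below. The sketch assumes that each IRQ was registered under a distinct hook ID whose bit is set in //irqs// when that IRQ fires; the hook IDs, ''sketch_intr'' and the ''ctrl_serviced'' bookkeeping are hypothetical. Each bit is tested separately because several IRQs may be asserted in a single notification.

```c
#include <stdbool.h>

/* Hypothetical hook IDs chosen by the driver when registering its IRQs;
 * each is assumed to correspond to one bit in the irqs mask. */
#define SKETCH_HOOK_CTRL0 0
#define SKETCH_HOOK_CTRL1 1

static bool ctrl_serviced[2];   /* records which controller was handled */

/* bdr_intr-style callback: service every controller whose bit is set. */
void sketch_intr(unsigned int irqs)
{
    if (irqs & (1u << SKETCH_HOOK_CTRL0))
        ctrl_serviced[0] = true;   /* service controller 0 here */
    if (irqs & (1u << SKETCH_HOOK_CTRL1))
        ctrl_serviced[1] = true;   /* service controller 1 here */
}
```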
void (*bdr_alarm)(clock_t stamp);
This callback function is called when a timer expires. This field may be set to ''NULL''.
The //stamp// parameter contains the current uptime timestamp.
int (*bdr_other)(message *m_ptr);
This callback function is called for any incoming messages that are not block device protocol requests. It is also called for incoming notifications other than interrupt and timer notifications. This field may be set to ''NULL''.
The function must return an appropriate response code, or ''EDONTREPLY'' in order not to send any response.
int (*bdr_device)(dev_t minor, device_id_t *did);
This callback function is used to map minor device numbers to devices. It is required when the multi-threaded version of the library is used, and should be set to ''NULL'' if the single-threaded version is used.
For every incoming request for a given //minor//, the library first calls this function to determine which physical device it belongs to, so that it can be handed off to (one of) the appropriate worker thread(s). Since the minor number is taken from the actual request, it may denote an entire device, a partition, or a subpartition; in all cases, the containing physical device must be returned.
The function must return ''OK'' if the given minor device is valid, in which case it must store an ID for the physical device in the variable pointed to by the //did// parameter. A maximum of ''BLOCKDRIVER_MAX_DEVICES'' physical devices is supported by the library, and the returned ID must be between 0 and ''BLOCKDRIVER_MAX_DEVICES-1'' inclusive.
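Assuming the same disk-style minor numbering as MINIX (five minors per drive for the full device and its four partitions, and sixteen subpartition minors per drive starting at minor 128), a ''bdr_device'' implementation could use the drive number as the device ID, as in this sketch. The constants, ''sketch_device_id_t'', ''sketch_device'', the two-drive limit, and the choice of ''ENXIO'' for invalid minors are assumptions, not part of the documented interface.

```c
#include <errno.h>

/* Assumed to mirror MINIX's disk minor numbering scheme. */
#define SKETCH_OK            0
#define SKETCH_NR_PARTITIONS 4
#define SKETCH_DEV_PER_DRIVE (1 + SKETCH_NR_PARTITIONS)   /* dN, dNp0..dNp3 */
#define SKETCH_SUB_PER_DRIVE (SKETCH_NR_PARTITIONS * SKETCH_NR_PARTITIONS)
#define SKETCH_MINOR_d0p0s0  128
#define SKETCH_NR_DRIVES     2   /* hypothetical; must not exceed BLOCKDRIVER_MAX_DEVICES */

typedef int sketch_device_id_t;   /* hypothetical stand-in for device_id_t */

/* bdr_device-style callback: map any full device, partition or
 * subpartition minor to the ID of the containing drive. */
int sketch_device(unsigned int minor, sketch_device_id_t *did)
{
    if (minor < SKETCH_NR_DRIVES * SKETCH_DEV_PER_DRIVE) {
        *did = minor / SKETCH_DEV_PER_DRIVE;
    } else if (minor >= SKETCH_MINOR_d0p0s0 &&
               minor < SKETCH_MINOR_d0p0s0 +
                       SKETCH_NR_DRIVES * SKETCH_SUB_PER_DRIVE) {
        *did = (minor - SKETCH_MINOR_d0p0s0) / SKETCH_SUB_PER_DRIVE;
    } else {
        return -ENXIO;   /* not a valid minor for this driver */
    }
    return SKETCH_OK;
}
```

Because the drive number is used directly as the device ID, the driver can pass the same values to ''blockdriver_mt_set_workers''.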