Work in progress

This page is not complete or even accurate yet. In fact, it describes a version of libblockdriver that is not yet in minix-current.

The block driver library

The block driver library is a support library for block device drivers. Its primary function is to abstract away many details that are common across all block drivers, mainly with respect to processing incoming requests. On MINIX3, all block drivers are expected to make use of the block driver library.

This page primarily documents the interface of the block driver library. It provides some hints on how to write a block driver, but much more information on this subject can be found in the block device protocol documentation. The reader is strongly advised to read that page first.

The libblockdriver library comes in two flavors: single-threaded and multi-threaded. The single-threaded version processes one command at a time, and does not allow for any form of parallelism. The multi-threaded version allows for parallelism in two dimensions: parallel processing of requests for different (physical) devices, and parallel (out-of-order) processing of requests to a single (physical) device. A driver that needs either or both of these forms of parallelism must use the multi-threaded API; drivers that do not require any parallelism should use the single-threaded API.

The single-threaded interface

The following calls should be used if the driver wants to make use of the single-threaded version of the libblockdriver API.

void blockdriver_task(struct blockdriver *bdp);
void blockdriver_terminate(void);
int blockdriver_mq_queue(message *m_ptr, int ipc_status);
int blockdriver_receive_mq(message *m_ptr, int *status_ptr);
void blockdriver_process(struct blockdriver *bdp, message *m_ptr, int ipc_status);

In all typical cases, the driver will want to let libblockdriver perform the main message loop. Essentially, the main loop performs three iterative steps: 1) receiving a request or other message, 2) processing that message, and 3) sending a reply. Libblockdriver then makes callbacks into the driver to perform the specific processing tasks. The blockdriver_task call implements the message loop, and expects a callback table with functions that implement the actual driver code. This callback table is described in more detail further below.

The main loop continues until the driver decides to terminate. The driver can call blockdriver_terminate to tell libblockdriver to exit the main loop after the current request has been processed and the reply has been sent. After that, the call to blockdriver_task will return.

In the course of handling a request (for example, a transfer request), the driver may have to wait for an interrupt. In many cases, the driver will also want to put a time bound on the receipt of that interrupt, so as to detect timeouts and take appropriate recovery actions. A typical driver therefore sets an alarm, and then waits for either an interrupt or a timer notification message. However, MINIX3 does not allow a process to receive from a specific set of endpoints only, so the driver has to call driver_receive(ANY, ..) in order to receive notifications from both HARDWARE (for interrupts) and CLOCK (for alarms). This means it will also end up receiving new request messages sent to the driver by, for example, file systems. Such new requests cannot be processed while the current request is still ongoing, so they have to be queued. The block driver library comes with its own queue for this purpose, and the driver can use the blockdriver_mq_queue call to add non-interrupt, non-timer messages to the queue for later processing. This call returns TRUE if it was able to enqueue the message, and FALSE if the queue is full.

A simple example of such a nested receive loop, which waits for an interrupt message within a certain maximum amount of time, is shown in the code sample below. It makes use of the timer API present in libsys.

  message m;
  int r, ipc_status;

  /* Set a timer in case we never actually get an interrupt. */
  set_timer(...);

  while (...request is ongoing...) {
          /* Receive a message. */
          if ((r = driver_receive(ANY, &m, &ipc_status)) != OK)
                  panic("driver_receive failed: %d", r);

          switch (m.m_source) {
          case HARDWARE:
                  /* Process the interrupt notification. */
                  ...;
                  break;
          case CLOCK:
                  /* Have the timer callback function be called. */
                  expire_timers(m.NOTIFY_TIMESTAMP);
                  break;
          default:
                  /* Another message; enqueue it for later. */
                  blockdriver_mq_queue(&m, ipc_status);
          }

          /* Never send a reply to anyone from here. */
  }

  /*
   * Cancel the timer, regardless of whether it triggered. The driver must
   * take into account that even if an interrupt triggered, an alarm
   * notification may already be on its way at this point.
   */
  cancel_timer(...);
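The TRUE/FALSE contract of blockdriver_mq_queue can be illustrated with a small bounded queue. The following is a toy model only: struct toy_msg, TOY_MQ_SIZE, toy_mq_queue and toy_mq_dequeue are hypothetical names, and both the capacity and the first-in-first-out order are assumptions rather than documented properties of the library's queue.

```c
#include <stddef.h>

#define TRUE  1
#define FALSE 0
#define TOY_MQ_SIZE 4   /* hypothetical capacity; the real limit is not stated here */

/* Hypothetical stand-in for a MINIX message plus its IPC status. */
struct toy_msg {
    int source;
    int type;
    int ipc_status;
};

static struct toy_msg toy_mq[TOY_MQ_SIZE];
static size_t toy_mq_head;    /* index of the oldest queued message */
static size_t toy_mq_count;   /* number of messages currently queued */

/* Append a copy of the message. Returns TRUE, or FALSE if the queue is full. */
static int toy_mq_queue(const struct toy_msg *m)
{
    if (toy_mq_count == TOY_MQ_SIZE)
        return FALSE;
    toy_mq[(toy_mq_head + toy_mq_count) % TOY_MQ_SIZE] = *m;
    toy_mq_count++;
    return TRUE;
}

/* Remove the oldest message into *m. Returns TRUE, or FALSE if the queue is empty. */
static int toy_mq_dequeue(struct toy_msg *m)
{
    if (toy_mq_count == 0)
        return FALSE;
    *m = toy_mq[toy_mq_head];
    toy_mq_head = (toy_mq_head + 1) % TOY_MQ_SIZE;
    toy_mq_count--;
    return TRUE;
}
```

A driver that gets FALSE back holds a message it can neither process nor store, so a nested receive loop may want to check the return value rather than discard it.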

In some (rare) cases, the driver may want more control over the message loop, for example if it wants to implement a threading model that is not compatible with the multi-threaded interface described below. For this purpose, libblockdriver exposes two functions that allow the reimplementation of the blockdriver_task function. blockdriver_receive_mq obtains a message for processing: from the message queue if there are any queued messages, and by calling driver_receive otherwise. It returns OK if a message is now available, or the error from driver_receive. blockdriver_process processes a message by calling the appropriate functions from the callback table. Note that if blockdriver_task is not used, the driver must implement its own termination facility, as a call to blockdriver_terminate will have no effect.
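A replacement loop built from these two calls could look roughly like the sketch below. So that it compiles outside MINIX, message, struct blockdriver, blockdriver_receive_mq and blockdriver_process are replaced by simplified stand-ins (the stub versions only mimic the signatures listed above). The my_running flag, my_other, my_task and TOY_MSG_STOP are hypothetical names for the driver's own termination facility.

```c
#include <stddef.h>

#define OK 0
#define TOY_MSG_STOP 99   /* hypothetical message type that asks the driver to stop */

/* Simplified stand-ins; not the real MINIX definitions. */
typedef struct { int m_type; } message;
struct blockdriver { int (*bdr_other)(message *m_ptr); };

static int my_running = 1;   /* the driver's own termination flag */
static int my_handled = 0;   /* number of messages seen by the callback */

/* Driver callback: count the message, and stop the loop on TOY_MSG_STOP. */
static int my_other(message *m_ptr)
{
    my_handled++;
    if (m_ptr->m_type == TOY_MSG_STOP)
        my_running = 0;
    return OK;
}

/* Stub: hands out a fixed script of messages instead of receiving real ones. */
static int blockdriver_receive_mq(message *m_ptr, int *status_ptr)
{
    static const int script[] = { 1, 2, TOY_MSG_STOP, 3 };
    static size_t next = 0;

    if (next >= sizeof(script) / sizeof(script[0]))
        return -1;   /* pretend that driver_receive failed */
    m_ptr->m_type = script[next++];
    *status_ptr = 0;
    return OK;
}

/* Stub: the real call picks a callback per request; this one always uses bdr_other. */
static void blockdriver_process(struct blockdriver *bdp, message *m_ptr,
    int ipc_status)
{
    (void)ipc_status;
    if (bdp->bdr_other != NULL)
        (void)bdp->bdr_other(m_ptr);
}

/* Replacement for blockdriver_task. Returns OK, or the receive error. */
static int my_task(struct blockdriver *bdp)
{
    message m;
    int r, ipc_status;

    while (my_running) {
        if ((r = blockdriver_receive_mq(&m, &ipc_status)) != OK)
            return r;
        blockdriver_process(bdp, &m, ipc_status);
    }
    return OK;
}
```

The flag is tested only after blockdriver_process returns, so the message that requests termination is still handled in full before the loop exits.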

The multi-threaded interface

As stated before, the multi-threaded version of the library supports parallelism both across physical devices, and between requests to a single physical device. Either or both may be used by the driver.

In order to create parallelism across physical devices, the library creates separate threads for separate physical devices. Each physical device has a device ID of type device_id_t, and the bdr_device callback function from the struct blockdriver structure (described below) is used to map minor device numbers to their physical devices (and thus device IDs). This allows libblockdriver to assign device-specific requests to the thread(s) of their respective physical devices. By default, each physical device has one worker thread associated with it. The library will automatically hand off requests to worker threads. Each physical device has a request queue associated with it, so that if a request comes in while no worker is available, the request will be queued in the request queue for the physical device.

In order to create parallelism between requests, each physical device may have more than one worker thread. Individual worker threads can be identified by their thread ID of type thread_id_t, which is internally a combination of the device ID and the per-device worker ID. The number of workers can be set individually for each physical device, and defaults to 1, which means that there is no parallelism between requests for that physical device.

Worker threads can put themselves to sleep, and be woken up by other threads later. This is the mechanism that allows threads to wait for (for example) interrupts and alarms. The main loop will run whenever all worker threads are either idle or asleep, so unlike with the single-threaded library version, the driver must never call driver_receive itself and there is no direct access to the request queues.

The following calls should be used if the driver wants to make use of the multi-threaded version of the libblockdriver API.

void blockdriver_mt_task(struct blockdriver *bdp);
void blockdriver_mt_terminate(void);
void blockdriver_mt_set_workers(device_id_t did, int workers);
thread_id_t blockdriver_mt_get_tid(void);
void blockdriver_mt_sleep(void);
void blockdriver_mt_wakeup(thread_id_t tid);

The main loop of the multi-threaded version is implemented in blockdriver_mt_task. It takes the same struct blockdriver structure as the single-threaded version, but requires that the driver implement the bdr_device callback function. Some of the callback functions will be called from worker threads; some will be called from the main thread. Specifically, those that cannot (yet) be associated with a device are called from the main thread: bdr_intr, bdr_alarm, bdr_other, and bdr_device.

The blockdriver_mt_terminate function may be called to break out of the main loop. This call may be invoked from the main thread or from a worker thread. The driver itself is responsible for making sure that there are no sleeping worker threads at the time of this call.

The blockdriver_mt_set_workers call is used to set the number of workers for each physical device. Since it is the driver that picks the device ID for each physical device (from a limited range), the driver can use the same set of IDs with calls to this function. The number of workers must be at least 1, and may be limited by the library to the supported maximum.

A worker thread can put itself to sleep by calling blockdriver_mt_sleep. It can then later be woken up from another thread with a call to blockdriver_mt_wakeup. This function takes a thread ID, so the sleeping worker thread must obtain and (somehow) store its own thread ID using blockdriver_mt_get_tid before going to sleep; that way, the thread waking it up knows which thread ID to use. The wakeup typically happens from the main thread, in response to an interrupt or an alarm going off. The main thread must never call blockdriver_mt_sleep, as this would deadlock the driver.
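The store-then-sleep pattern might be sketched as follows. This is an illustration under assumptions, not driver code: thread_id_t is assumed to be an integer type, the three blockdriver_mt_* functions are single-threaded stubs that only mimic the signatures listed above, and struct my_device, TOY_NO_THREAD, my_intr and my_wait_intr are hypothetical names. The stub for blockdriver_mt_sleep simply pretends that the main thread handles an interrupt while the worker is asleep.

```c
#include <stddef.h>

typedef int thread_id_t;      /* assumed to be an integer type */

#define TOY_NO_THREAD (-1)    /* hypothetical "nobody is waiting" marker */

/* Per-device state kept by the driver (hypothetical layout). */
struct my_device {
    thread_id_t waiter;       /* worker waiting for an interrupt, if any */
    int intr_seen;            /* set by the interrupt handler */
};

static struct my_device my_dev = { TOY_NO_THREAD, 0 };

/* --- Stubs standing in for the library --- */
static thread_id_t stub_self = 5;                    /* ID of the "current" worker */
static thread_id_t stub_last_woken = TOY_NO_THREAD;  /* last ID passed to wakeup */
static void my_intr(unsigned int irqs);

static thread_id_t blockdriver_mt_get_tid(void) { return stub_self; }
static void blockdriver_mt_wakeup(thread_id_t tid) { stub_last_woken = tid; }
/* While the worker "sleeps", pretend the main thread receives an interrupt. */
static void blockdriver_mt_sleep(void) { my_intr(1); }

/* bdr_intr callback, called from the main thread: wake the waiting worker. */
static void my_intr(unsigned int irqs)
{
    (void)irqs;
    my_dev.intr_seen = 1;
    if (my_dev.waiter != TOY_NO_THREAD)
        blockdriver_mt_wakeup(my_dev.waiter);
}

/* Called from a worker thread: block until the interrupt has arrived. */
static void my_wait_intr(void)
{
    /* Store our own ID first, so that my_intr knows whom to wake up. */
    my_dev.waiter = blockdriver_mt_get_tid();
    while (!my_dev.intr_seen)
        blockdriver_mt_sleep();
    my_dev.waiter = TOY_NO_THREAD;
    my_dev.intr_seen = 0;
}
```

Re-testing intr_seen in a loop is a design choice of this sketch: it keeps the worker correct even if it is woken up for some other reason, such as a timeout handler that shares the same wakeup path.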

The common interface

The following calls are for use from both single-threaded and multi-threaded drivers.

void blockdriver_announce(int type);
void partition(struct blockdriver *bdp, dev_t device);

The blockdriver_announce function must be called at driver startup time. In particular, it must be called from within the SEF initialization callback routine, passing on the initialization type from that function as the type parameter. This function will not only announce the presence of the new driver to the rest of the system, but also initialize certain data structures.

The partition call allows drivers to let libblockdriver read and parse partition and subpartition tables on a device, and initialize the partition information in the driver accordingly. The exact functionality of this call depends on the value of bdr_type in the given block driver callback table bdp. The device parameter must be a minor device number for a full device (that is, never for a partition or a subpartition). The partition call expects that the size of the full device will be provided to it upon a call to bdr_part. It will also use bdr_part to fill in any partitions and subpartition device structures for the given device. These partition and subpartition device structures need not be initialized before or during the call to partition.

The callback table

The block driver library takes care of receiving request messages and passing them to appropriate callback functions that are implemented in the actual driver code. The driver code provides pointers to these callback functions, as well as other information, by handing a pointer to a struct blockdriver structure to the library. This section describes the fields of this structure, and the expectations of the library regarding their implementation.

int bdr_type;

Some block drivers implement devices that support partitions and possibly subpartitions. In that case, libblockdriver can handle certain partition operations on the driver's behalf. However, in order to be able to do this, the library must know exactly which partitioning scheme the driver is using. This includes the mapping between devices/partitions/subpartitions and minor device numbers. The bdr_type field determines the type of driver, and must be set to one of the following values:

  • BLOCKDRIVER_TYPE_DISK: the driver is for disk-like devices, and implements disk-style partitioning with the corresponding minor device numbering scheme. Libblockdriver supports partitions, subpartitions, and extended partitions, although the driver need not support any of those itself (see bdr_part below).
  • BLOCKDRIVER_TYPE_FLOPPY: the driver is for floppy-like devices, and implements floppy-style partitioning with the corresponding minor device numbering scheme. The floppy minor device numbering scheme, and thus also libblockdriver, supports partitions but not subpartitions.
  • BLOCKDRIVER_TYPE_FLAT: the driver is for devices that do not support any form of partitioning. Libblockdriver does not make any assumptions about the minor device numbering scheme.
  • BLOCKDRIVER_TYPE_OTHER: the driver does not want libblockdriver to handle anything related to partitions.

For the _DISK, _FLOPPY, and _FLAT types, libblockdriver will handle the DIOCGETP and DIOCSETP partition ioctl requests on the driver's behalf, calling the bdr_part and bdr_geometry callback routines as appropriate. For the _FLAT type, bdr_part will only be used to obtain the size of each entire device, and not for actual partitions. For the _OTHER type, the DIOCGETP and DIOCSETP ioctl requests will be passed to the bdr_ioctl callback function, and neither bdr_part nor bdr_geometry will ever be called (and thus those fields should be set to NULL). An _OTHER type driver must not call partition.

int (*bdr_open)(dev_t minor, int access);

This callback function is called when a client wants to open the minor device specified in minor. The access parameter may be a bitwise combination of the R_BIT and W_BIT flags.

The function must return either OK or a negative error code.

Implementation hints and notes:

  • If the multi-threaded version of the library is used, the given minor device is guaranteed to exist, because it has already been validated by means of a call to the bdr_device callback function. If the single-threaded version of the library is used, this callback function must perform that check itself, and return ENXIO if the device cannot be opened.
  • The device may already have been opened previously. The driver itself is responsible for keeping track of open counts. Disk drivers must do this, because they must implement the DIOCOPENCT ioctl.
  • Drivers are not expected to enforce access restrictions on later requests, since this information is not retained on a per-session basis. A driver should, however, refuse to open a read-only device for writing, by returning the EACCES error code.
  • Disk drivers are expected to perform partitioning when a hardware device is first opened.

int (*bdr_close)(dev_t minor);

This callback function is called when a client wants to close the minor device specified in minor.

The function must return either OK or a negative error code.

Implementation hints and notes:

  • The device may not have been opened first. The function should return an EINVAL error code in that case.
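The open-count bookkeeping suggested by the notes for bdr_open and bdr_close could be sketched as below. The device table, my_lookup, my_open and my_close are hypothetical, and the R_BIT and W_BIT values are stand-ins so that the sketch compiles outside MINIX. The sketch negates the host's errno values to obtain negative error codes as required above; how MINIX itself defines these constants is not covered here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define OK     0
#define R_BIT  4          /* stand-in values; use the system definitions on MINIX */
#define W_BIT  2
#define MY_NR_DEVICES 2   /* hypothetical number of minor devices */

/* Hypothetical per-minor state. */
struct my_dev_state {
    int read_only;        /* nonzero if the device cannot be written */
    int open_count;       /* number of opens not yet matched by a close */
};

static struct my_dev_state my_devs[MY_NR_DEVICES] = {
    { 0, 0 },             /* minor 0: read-write */
    { 1, 0 },             /* minor 1: read-only */
};

/* Map a minor device number to its state, or NULL if there is no such device. */
static struct my_dev_state *my_lookup(dev_t minor)
{
    if ((unsigned long)minor >= MY_NR_DEVICES)
        return NULL;
    return &my_devs[minor];
}

/* bdr_open: validate the minor, refuse writes to read-only devices, count the open. */
static int my_open(dev_t minor, int access)
{
    struct my_dev_state *dp;

    if ((dp = my_lookup(minor)) == NULL)
        return -ENXIO;
    if ((access & W_BIT) && dp->read_only)
        return -EACCES;
    dp->open_count++;
    return OK;
}

/* bdr_close: reject a close without a matching open, otherwise drop one reference. */
static int my_close(dev_t minor)
{
    struct my_dev_state *dp;

    if ((dp = my_lookup(minor)) == NULL)
        return -ENXIO;
    if (dp->open_count == 0)
        return -EINVAL;
    dp->open_count--;
    return OK;
}
```

The lookup in my_open reflects the single-threaded case; with the multi-threaded library the minor has already been validated through bdr_device by the time bdr_open is called.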

ssize_t (*bdr_transfer)(dev_t minor, int do_write, u64_t pos, endpoint_t endpt, iovec_t *iov, unsigned count, int flags);

This callback function is called to perform a data transfer from or to the device. The transfer is requested for the given minor device minor. The do_write parameter is set if it is a write request, and cleared if it is a read request. The pos parameter contains the byte position of the start of the request, relative to the minor device's partition base.

The endpt parameter indicates the requesting endpoint. If endpt is set to SELF, the request comes from within the driver (typically as a result of a call to partition), and iov contains a vector of type iovec_t, i.e. with elements that each contain a local address and a size. If endpt is not set to SELF, an external process is making the transfer request, and iov must be interpreted as a vector of type iovec_s_t, i.e. with elements that each contain a grant and a size. The count parameter indicates the number of elements in the vector, and is guaranteed not to exceed NR_IOREQS.

The flags parameter may contain any transfer flags supported by the block device protocol; please refer to the block device protocol documentation for a list. At the time of writing, the only supported flag is BDEV_FORCEWRITE.

The function must return either the number of bytes transferred, or a negative error code. The number of bytes may be less than the sum of the elements' sizes in the I/O vector, and even zero, if the end of the device or partition is reached before the full transfer has been completed.
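The short-transfer rule can be made concrete with a toy in-memory device. The sketch below covers only the local-address case (as for endpt set to SELF); grant-based copies for external callers are left out. struct toy_iovec, toy_disk, TOY_DISK_SIZE and toy_transfer are hypothetical, and uint64_t stands in for u64_t.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define TOY_DISK_SIZE 16   /* device size in bytes; tiny on purpose */

/* Hypothetical stand-in for a local I/O vector element (address and size). */
struct toy_iovec {
    void *addr;
    size_t size;
};

static unsigned char toy_disk[TOY_DISK_SIZE];

/*
 * Copy between the vector and the in-memory "disk", stopping at the end of
 * the device. Returns the number of bytes transferred, which may be less
 * than the sum of the element sizes, or zero.
 */
static ssize_t toy_transfer(int do_write, uint64_t pos, struct toy_iovec *iov,
    unsigned count)
{
    ssize_t total = 0;
    unsigned i;

    for (i = 0; i < count && pos < TOY_DISK_SIZE; i++) {
        size_t chunk = iov[i].size;

        /* Clip this element at the end of the device. */
        if (chunk > TOY_DISK_SIZE - pos)
            chunk = (size_t)(TOY_DISK_SIZE - pos);

        if (do_write)
            memcpy(&toy_disk[pos], iov[i].addr, chunk);
        else
            memcpy(iov[i].addr, &toy_disk[pos], chunk);

        pos += chunk;
        total += (ssize_t)chunk;
    }
    return total;
}
```

With two 10-byte elements, a write at position 0 transfers 16 bytes, a read at position 12 transfers 4 bytes, and a read at position 16 transfers nothing.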

int (*bdr_ioctl)(dev_t minor, unsigned int request, endpoint_t endpt, cp_grant_id_t grant);

This callback function is called for incoming ioctl requests, except for the ones that libblockdriver itself takes care of. The ioctl request code is given in request, and the request applies to the given minor device minor. The endpt parameter contains the endpoint of the requesting party, and the given grant parameter specifies the grant for the ioctl arguments. Not all ioctl requests involve argument data; for such requests, the last two parameters must be ignored.

If the bdr_type field is set to BLOCKDRIVER_TYPE_OTHER, then this function will also be called for DIOCGETP and DIOCSETP ioctl requests.

The function must return an appropriate response (typically OK) or a negative error code.

Implementation hints and notes:

void (*bdr_cleanup)(void);

This callback function is called after an incoming request has been fully processed. This field may be set to NULL.

struct device *(*bdr_part)(dev_t minor);

This callback function is expected to return a pointer to a device structure. The given minor device may be an entire device, a partition, or a subpartition. This function may be used for obtaining the entire device's size, and for both obtaining and assigning partition/subpartition base and size information. The function is only ever called if bdr_type is set to BLOCKDRIVER_TYPE_DISK, BLOCKDRIVER_TYPE_FLOPPY, or BLOCKDRIVER_TYPE_FLAT. This field must not be set to NULL in that case. For flat devices, this function will only be called for the full device.

The function must return an appropriate device structure for all the minor device numbers that map to full devices, and in that case, the dv_base and dv_size fields must be initialized with zero and the size of the device, respectively. For minor device numbers that map to partitions and subpartitions, the function may return NULL if the driver does not support partitions and/or subpartitions at all, but must return a device structure even for partitions and subpartitions that currently do not exist on the device. This allows the partition call to first obtain a pointer to, and then fill in, those device structures.

Implementation hints and notes:

  • This function will only be called for devices that are actually open, so if the driver calls partition on the first open call of a device, and keeps around this information at least until the device is fully closed again, then this function can just return the information.
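A table-backed bdr_part along these lines might look like the sketch below. It assumes a single drive with a made-up numbering in which minor 0 is the full device and minors 1 to 4 are its partitions. struct device is reduced to the two fields named above, uint64_t stands in for their actual type, and my_part_tab, my_prepare, my_part and the MY_* constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define MY_NR_PARTS  4                          /* hypothetical partition count */
#define MY_NR_MINORS (1 + MY_NR_PARTS)          /* full device plus partitions */
#define MY_DISK_SIZE ((uint64_t)64 * 1024 * 1024)

/* Stand-in for struct device, with only the fields named in the text. */
struct device {
    uint64_t dv_base;
    uint64_t dv_size;
};

static struct device my_part_tab[MY_NR_MINORS];

/* To be run on the first open of the device, before calling partition. */
static void my_prepare(void)
{
    size_t i;

    for (i = 0; i < MY_NR_MINORS; i++) {
        my_part_tab[i].dv_base = 0;
        my_part_tab[i].dv_size = 0;
    }
    /* The full device starts at zero and covers the whole disk. */
    my_part_tab[0].dv_size = MY_DISK_SIZE;
}

/* bdr_part: return the table slot for the minor, also for empty partitions. */
static struct device *my_part(dev_t minor)
{
    if ((unsigned long)minor >= MY_NR_MINORS)
        return NULL;
    return &my_part_tab[minor];
}
```

Returning a slot for a partition that does not exist yet (size zero) is what lets the partition call fill it in later. Returning NULL for minors outside the table is a choice made for this sketch.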

void (*bdr_geometry)(dev_t minor, struct partition *part);

This callback function is expected to fill in the geometry details for the given minor device in the cylinders, heads, and sectors fields of the given part structure. The callback function must not change other fields in the structure.

This function is only ever called if bdr_type is not set to BLOCKDRIVER_TYPE_OTHER. Even then, this field may be set to NULL; in that case, libblockdriver will generate a fake geometry.

void (*bdr_intr)(unsigned int irqs);

This callback function is called when an interrupt notification arrives. This field may be set to NULL.

The irqs parameter is a bitwise combination of asserted IRQ values.

Implementation hints and notes:

  • Only drivers that register for multiple IRQs may want to make use of the irqs parameter.

void (*bdr_alarm)(clock_t stamp);

This callback function is called when a timer expires. This field may be set to NULL.

The stamp parameter contains the current uptime timestamp.

int (*bdr_other)(message *m_ptr);

This callback function is called for any incoming messages that are not block device protocol requests. It is also called for incoming notifications other than interrupt and timer notifications. This field may be set to NULL.

The function must return an appropriate response code, or EDONTREPLY in order not to send any response.

int (*bdr_device)(dev_t minor, device_id_t *did);

This callback function is used to map minor device numbers to devices. It is required when the multi-threaded version of the library is used, and should be set to NULL if the single-threaded version is used.

For every incoming request for a given minor, the library first calls this function to determine which physical device it belongs to, so that it can be handed off to (one of) the appropriate worker thread(s). Since the minor number is taken from the actual request, it may denote an entire device, a partition, or a subpartition; in all cases, the containing physical device must be returned.

The function must return OK if the given minor device is valid, in which case it must store an ID for the physical device in the variable pointed to by the did parameter. A maximum of BLOCKDRIVER_MAX_DEVICES physical devices is supported by the library, and the returned ID must be between 0 and BLOCKDRIVER_MAX_DEVICES-1 inclusive.
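As an illustration, a bdr_device implementation for a driver with a fixed number of minors per drive could be sketched as follows. The numbering (five minors per drive, two drives), the value used for BLOCKDRIVER_MAX_DEVICES, the integer device_id_t and the choice of ENXIO for invalid minors are all assumptions made for this sketch. The host's errno value is negated to obtain a negative error code.

```c
#include <errno.h>
#include <sys/types.h>

#define OK 0
#define BLOCKDRIVER_MAX_DEVICES 32   /* stand-in value; use the library's on MINIX */
typedef int device_id_t;             /* assumed to be an integer type */

#define MY_MINORS_PER_DRIVE 5        /* hypothetical: full device plus 4 partitions */
#define MY_NR_DRIVES        2        /* hypothetical number of physical devices */

/*
 * bdr_device: map a minor (full device or partition) to the ID of the
 * physical device that contains it. *did is left untouched on failure.
 */
static int my_device(dev_t minor, device_id_t *did)
{
    unsigned long drive = (unsigned long)minor / MY_MINORS_PER_DRIVE;

    if (drive >= MY_NR_DRIVES || drive >= BLOCKDRIVER_MAX_DEVICES)
        return -ENXIO;
    *did = (device_id_t)drive;
    return OK;
}
```

Under this numbering, minors 0 to 4 all map to device ID 0 and minors 5 to 9 to device ID 1, so requests for a partition end up with the worker thread(s) of the drive that holds it.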

developersguide/libblockdriver.txt · Last modified: 2014/11/12 15:50 by lionelsambuc