The Data Link protocol

This page provides the official documentation of the Data Link protocol of MINIX 3. It describes the protocol used between the inet server and the ethernet drivers that control network interface card (NIC) hardware. The current version documents the protocol used in SVN revisions 9568 and later. If you update this document because of changes to MINIX 3, please mention the revision number of the change in the wiki comment.

General information

The following information is written mainly for people implementing an ethernet driver.

Instances

Each ethernet driver instance is responsible for exactly one ethernet card. A machine may have multiple ethernet cards of the same type; in this case, multiple copies of the same ethernet driver will be started. Each copy will be given a unique instance number (0 for the first instance of that particular driver, 1 for the second, and so on).

It is up to the driver to decide how to map instances to ethernet cards. This should be reasonably stable across machine boots, to make sure that the user's per-instance configuration will consistently apply to the same instance. The typical approach is to iterate linearly over PCI devices at startup time, skipping N matching PCI devices before reserving the next one.
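The "skip N matching devices" approach can be sketched in plain C as follows. This is only an illustration: struct pci_dev_id and pick_device() are hypothetical names, and the array stands in for whatever the driver sees while iterating over PCI devices in a fixed order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of a PCI bus scan. */
struct pci_dev_id {
    unsigned short vid;   /* vendor ID */
    unsigned short did;   /* device ID */
};

/*
 * Return the index of the device that the given instance should claim,
 * or -1 if there are fewer than (instance + 1) matching devices.
 * The scan order must be the same on every boot for the mapping to
 * stay stable.
 */
int pick_device(const struct pci_dev_id *devs, size_t ndevs,
                unsigned short vid, unsigned short did, int instance)
{
    int skip = instance;  /* matching devices still to be skipped */
    size_t i;

    assert(devs != NULL || ndevs == 0);
    if (instance < 0)
        return -1;

    for (i = 0; i < ndevs; i++) {
        if (devs[i].vid != vid || devs[i].did != did)
            continue;     /* not a card this driver supports */
        if (skip-- == 0)
            return (int)i;
    }
    return -1;
}
```

With two cards of the same type in the scan list, instance 0 claims the first match and instance 1 the second, regardless of what other devices sit between them.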

Initialization

Upon initialization, the driver must retrieve the instance number. It is passed as one of the arguments to the driver, in the form instance=N where N is a decimal representation of the instance number. Obtaining the instance number is typically done by means of env_setargs() and env_parse(), as found in <minix/sysutil.h>.
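To make the instance=N argument format concrete, here is a minimal sketch that extracts the number using only the standard C library. It is not how a MINIX driver would normally do it (that is the job of env_setargs() and env_parse()); parse_instance() is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Scan an argument vector for "instance=N" and store N in *instance.
 * Returns 0 on success, -1 if the argument is missing or malformed.
 * Plain-C illustration only; a real driver would rely on
 * env_setargs() and env_parse() instead.
 */
int parse_instance(int argc, char **argv, long *instance)
{
    static const char key[] = "instance=";
    const size_t keylen = sizeof(key) - 1;
    int i;

    assert(instance != NULL);

    for (i = 1; i < argc; i++) {
        char *end;
        long v;

        if (strncmp(argv[i], key, keylen) != 0)
            continue;                     /* some other argument */
        v = strtol(argv[i] + keylen, &end, 10);
        if (end == argv[i] + keylen || *end != '\0' || v < 0)
            return -1;                    /* empty, trailing junk, or negative */
        *instance = v;
        return 0;
    }
    return -1;                            /* no instance= argument given */
}
```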

The driver should try to initialize the device immediately when starting up, and abort (for example, by means of panic()) if the device cannot be initialized. Please note that for legacy reasons, this is not common practice in existing ethernet drivers.

Upon (successful) startup, the ethernet driver must announce its presence in DS. This should be done by calling the netdriver_announce() function found in libnetdriver.

Requests and replies

The basic protocol consists of inet sending requests to the ethernet driver, and the driver sending replies. The m_type message field contains the request or reply type. All message names and field aliases are defined in <minix/com.h>. The first request issued by inet will always be a DL_CONF request, although more DL_CONF requests may be issued later.

Inet will not send a new request until it has received a reply for the previous request. As such, the driver must always respond to each request with a reply immediately. In the case of the DL_READV_S and DL_WRITEV_S data transfer requests, this may lead to one immediate reply message to acknowledge a request, and one reply message later to complete the request. See the section on data transfer below. The two other requests, DL_CONF and DL_GETSTAT_S, follow a more strict request-reply form.

Ethernet drivers should use the netdriver_receive() function from libnetdriver to receive messages. The send() primitive should be used to send replies back to inet.

Data transfer

Like all other requests, data transfer is always initiated by inet. Inet will send a DL_READV_S request when it is ready to receive a packet. It will send a DL_WRITEV_S request to indicate that a packet should be sent.

The driver may not always be able to satisfy the send or receive request immediately, for example because there are no packets in the receive queue, or because the send queue is full. If the driver is able to satisfy the request immediately, it should send a DL_TASK_REPLY with the appropriate DL_PACK_ flag set in the DL_FLAGS field (and, in the case of receives, with DL_COUNT set appropriately).

If the driver is not able to satisfy the request immediately, it should immediately send a DL_TASK_REPLY with a value of DL_NOFLAGS (0) in DL_FLAGS, to indicate that the request has been received but is still pending. Once the driver has been able to satisfy such a pending request (because a new packet arrived, or enough previous data were sent to be able to queue the new packet), it should send another DL_TASK_REPLY message, with the appropriate DL_PACK_ flag set in DL_FLAGS (and, for receives, with the DL_COUNT field set, see above). This usually happens in response to an interrupt. If the same interrupt happens to satisfy both a pending receive and a pending send request, a single DL_TASK_REPLY message may be used to complete both; in that case, DL_FLAGS would contain the bitwise OR of DL_PACK_SEND and DL_PACK_RECV, and DL_COUNT would contain the size of the received packet.

In other words, while a send or receive request must be acknowledged immediately with a reply, the request itself will stay pending until a matching DL_PACK_ flag is set in a reply. The driver will never receive another request of the same (DL_READV_S, DL_WRITEV_S) type while the previous request has not yet fully completed.
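The acknowledge-then-complete behaviour described above can be modelled as a small state machine. The sketch below is a simulation in plain C, not driver code: struct drv_state and the on_*() functions are hypothetical, the queues are reduced to counters, and each function returns the value that would go into DL_FLAGS of the DL_TASK_REPLY.

```c
#include <assert.h>

/* Flag values as documented for DL_TASK_REPLY. */
#define DL_NOFLAGS   0x0
#define DL_PACK_SEND 0x1
#define DL_PACK_RECV 0x2

/* Hypothetical driver state; the names are illustrative only. */
struct drv_state {
    int rx_queued;      /* packets waiting in the receive queue */
    int tx_free;        /* free slots in the send queue */
    int read_pending;   /* DL_READV_S acknowledged but not yet completed */
    int write_pending;  /* DL_WRITEV_S acknowledged but not yet completed */
};

/* Handle DL_READV_S; returns DL_FLAGS for the immediate reply. */
unsigned long on_readv(struct drv_state *s)
{
    assert(!s->read_pending);   /* inet never sends a second one */
    if (s->rx_queued > 0) {
        s->rx_queued--;
        return DL_PACK_RECV;    /* satisfied right away */
    }
    s->read_pending = 1;
    return DL_NOFLAGS;          /* acknowledged, still pending */
}

/* Handle DL_WRITEV_S; returns DL_FLAGS for the immediate reply. */
unsigned long on_writev(struct drv_state *s)
{
    assert(!s->write_pending);
    if (s->tx_free > 0) {
        s->tx_free--;
        return DL_PACK_SEND;
    }
    s->write_pending = 1;
    return DL_NOFLAGS;
}

/*
 * Handle an interrupt that reported rx_new received packets and
 * tx_done finished transmissions. Returns the flags for one combined
 * completion reply, or DL_NOFLAGS if no reply needs to be sent.
 */
unsigned long on_interrupt(struct drv_state *s, int rx_new, int tx_done)
{
    unsigned long flags = DL_NOFLAGS;

    s->rx_queued += rx_new;
    s->tx_free += tx_done;
    if (s->read_pending && s->rx_queued > 0) {
        s->rx_queued--;
        s->read_pending = 0;
        flags |= DL_PACK_RECV;
    }
    if (s->write_pending && s->tx_free > 0) {
        s->tx_free--;
        s->write_pending = 0;
        flags |= DL_PACK_SEND;
    }
    return flags;
}
```

Note that in this model the interrupt handler only sends a message when the returned flags are nonzero; a DL_NOFLAGS reply is only ever sent as the immediate acknowledgement of a new request.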

System interaction

The driver should use the System Event Framework (SEF). This framework automatically takes care of interaction with the Reincarnation Server (RS).

The driver should register a signal handler callback function through SEF, and immediately perform an exit from the callback function when it gets a SIGTERM signal. Before exiting, the driver should stop the device.

The driver will typically interact with the PCI server to find and reserve hardware devices. Description of this interaction is beyond the scope of this document.

Protocol messages

This section documents the messages used in the ethernet driver protocol.

Configuration

The DL_CONF message has two purposes. First, it specifies what promiscuity/multicast/broadcast mode the ethernet card should be changed to. Second, it requests the ethernet hardware address of the card. The driver may receive multiple DL_CONF messages over the course of its lifetime. The DL_CONF request looks like this:

Request: DL_CONF (request configuration and set mode)

Field | Message field | Type | Description
DL_MODE | m2_l1 | unsigned int | flag field stating what mode(s) the NIC should have

The DL_MODE field is a bitwise combination of the following possible flags:

Alias | Value | Meaning
DL_PROMISC_REQ | 0x1 | promiscuous mode
DL_MULTI_REQ | 0x2 | multicast mode
DL_BROAD_REQ | 0x4 | broadcast mode

These flags indicate what types of packets are to be received, in addition to packets addressed specifically to the ethernet device's hardware address. The alias DL_NOMODE equals 0 and is used when none of the above flags are set.
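As an illustration of how the DL_MODE bits combine, the following sketch decodes the field into three booleans. The flag values are the ones listed above; struct rx_filter and decode_mode() are hypothetical names, and programming the actual hardware filter is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as documented for the DL_MODE field. */
#define DL_NOMODE      0x0
#define DL_PROMISC_REQ 0x1
#define DL_MULTI_REQ   0x2
#define DL_BROAD_REQ   0x4

/* Hypothetical receive-filter settings derived from DL_MODE. */
struct rx_filter {
    int promisc;    /* accept all packets */
    int multicast;  /* accept multicast packets */
    int broadcast;  /* accept broadcast packets */
};

/* Translate the DL_MODE bit field into individual filter settings. */
void decode_mode(unsigned int mode, struct rx_filter *f)
{
    assert(f != NULL);
    f->promisc   = (mode & DL_PROMISC_REQ) != 0;
    f->multicast = (mode & DL_MULTI_REQ) != 0;
    f->broadcast = (mode & DL_BROAD_REQ) != 0;
}
```

Packets addressed to the card's own hardware address are received in every mode, including DL_NOMODE, so they need no flag of their own.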

The ethernet driver should change the mode of the hardware device to the mode indicated by these flags, and respond with the following reply:

Reply: DL_CONF_REPLY (provide ethernet configuration)

Field | Message field | Type | Description
DL_STAT | m3_i1 | int | result code
DL_HWADDR | m3_ca1 | ether_addr_t | upon success: ethernet hardware address

The result code must be either OK to indicate success, or a negative error code. If the driver cannot successfully reserve and interact with the device, this error code is expected to be ENXIO. Upon success, the ethernet hardware address (also known as the MAC address) is to be stored in the reply message as well. The ether_addr_t structure is defined in <net/gen/ether.h>.

The driver may use the first DL_CONF message to initialize the hardware. However, it is recommended that the driver do this immediately at startup. The driver can then be hardcoded to return an OK response to DL_CONF requests. In the long term, the DL_STAT field may be removed.
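The content of a DL_CONF_REPLY can be sketched as below. This is a stand-alone model, not the real message layout: struct sketch_conf_reply, SKETCH_OK and make_conf_reply() are hypothetical, and writing the error as -ENXIO is an assumption made so that the result is negative with a standard C library's positive errno values.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SKETCH_OK 0   /* assumed stand-in for the OK result code */

/* Hypothetical model of the two DL_CONF_REPLY fields. */
struct sketch_conf_reply {
    int stat;                 /* stands in for DL_STAT */
    unsigned char hwaddr[6];  /* stands in for DL_HWADDR */
};

/*
 * Build the reply to a DL_CONF request. The hardware address is only
 * filled in on success; on failure the result code is negative.
 */
struct sketch_conf_reply make_conf_reply(int device_ready,
                                         const unsigned char *mac)
{
    struct sketch_conf_reply r;

    assert(mac != NULL);
    memset(&r, 0, sizeof(r));
    if (!device_ready) {
        r.stat = -ENXIO;      /* sign convention assumed for this sketch */
        return r;
    }
    r.stat = SKETCH_OK;
    memcpy(r.hwaddr, mac, sizeof(r.hwaddr));
    return r;
}
```

A driver that initializes the device at startup and panics on failure would always take the success path here, which is what makes a hardcoded OK response possible.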

Statistics

At any time after a device has been first configured with a DL_CONF message, inet may request packet transmission and error statistics from the driver:

Request: DL_GETSTAT_S (request ethernet statistics)

Field | Message field | Type | Description
DL_GRANT | m2_l2 | cp_grant_id_t | grant (WRITE) for eth_stat_t structure

Upon receiving this request, the ethernet driver must fill the fields of an eth_stat_t structure as best it can, use sys_safecopyto to copy it out to the grant provided by the caller, and send the following reply message:

Reply: DL_STAT_REPLY (provide ethernet statistics)

Fields: none

The driver should not reset the statistics after processing this message. The eth_stat_t structure is defined in <net/gen/eth_io.h>, along with rough descriptions of what each field means.

There is currently no userland tool that prints these statistics.

Data transfer

The request from inet to receive a packet looks like this:

Request: DL_READV_S (receive ethernet packet)

Field | Message field | Type | Description
DL_GRANT | m2_l2 | cp_grant_id_t | grant (READ) for iovec_s_t vector
DL_COUNT | m2_i3 | int | number of vector elements

The request comes with a grant for a vector that specifies grants and sizes for the destination buffers. For the driver, the process of copying out a packet consists of copying in the vector (using sys_safecopyfrom), and repeatedly copying out (using sys_safecopyto) the next iov_size bytes of the received packet to the next iov_grant grant as specified by that element of the vector, until the entire packet is copied out (sys_vsafecopy may be used as well). The vector itself is DL_COUNT * sizeof(iovec_s_t) bytes in size, and DL_COUNT will not exceed NR_IOREQS. The total size of the buffers specified by the vector is guaranteed to be at least ETH_MAX_PACK_SIZE_TAGGED bytes, which should be large enough to receive the packet; if not, the driver may truncate the packet or simply panic. The driver itself should remove any trailing CRC bytes from the ethernet packet before copying it out. The driver must ensure that the resulting packet is at least ETH_MIN_PACK_SIZE bytes. Smaller packets (runt frames) must be thrown away.
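The copy-out loop described above can be sketched in plain C. This is a simulation under stated assumptions: struct sketch_iovec replaces iovec_s_t with an ordinary pointer where the real structure holds a grant, memcpy() stands in for sys_safecopyto, and the value given for ETH_MIN_PACK_SIZE is assumed rather than taken from the MINIX headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_MIN_PACK_SIZE 60   /* assumed value; see the MINIX headers */

/* Hypothetical stand-in for iovec_s_t, with a pointer instead of a grant. */
struct sketch_iovec {
    unsigned char *buf;   /* stands in for iov_grant */
    size_t size;          /* stands in for iov_size */
};

/*
 * Copy a received packet (CRC already stripped) into the buffers
 * described by the vector. Returns the number of bytes copied, which
 * is the value to report in DL_COUNT, or 0 if the packet is a runt
 * frame and was dropped. A packet larger than the vector's total
 * size is truncated.
 */
size_t copy_out_packet(const unsigned char *pkt, size_t len,
                       const struct sketch_iovec *iov, int count)
{
    size_t off = 0;
    int i;

    assert(iov != NULL || count <= 0);
    if (len < ETH_MIN_PACK_SIZE)
        return 0;                       /* runt frame: throw away */

    for (i = 0; i < count && off < len; i++) {
        size_t chunk = len - off;       /* bytes still to copy */

        if (chunk > iov[i].size)
            chunk = iov[i].size;
        /* In a driver this would be a sys_safecopyto() call. */
        memcpy(iov[i].buf, pkt + off, chunk);
        off += chunk;
    }
    return off;
}
```

This sketch chooses truncation over panic() for an undersized vector; the text above allows either.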

The request from inet to send a packet looks like this:

Request: DL_WRITEV_S (send ethernet packet)

Field | Message field | Type | Description
DL_GRANT | m2_l2 | cp_grant_id_t | grant (READ) for iovec_s_t vector
DL_COUNT | m2_i3 | int | number of vector elements

Similar to the receive request, the send request includes a vector that specifies grants and sizes for the buffers that contain the packet data. The process of copying in the packet consists of copying in the vector, and repeatedly copying in the next iov_size bytes of the packet to send from the next iov_grant grant as specified by that element of the vector, until the entire packet is copied in. The total size of the packet is guaranteed to be at least ETH_MIN_PACK_SIZE bytes and at most ETH_MAX_PACK_SIZE_TAGGED bytes. Depending on the underlying hardware, the driver may have to copy in the vector before being able to determine whether there is room in the send queue to send the entire packet.
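The send side can be sketched the same way: sum the element sizes first, then gather the pieces into one frame buffer. As before, this is a plain-C model with hypothetical names (struct sketch_iovec_in, gather_packet()), memcpy() stands in for sys_safecopyfrom, and the two size constants carry assumed values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values; the real definitions live in the MINIX headers. */
#define ETH_MIN_PACK_SIZE        60
#define ETH_MAX_PACK_SIZE_TAGGED 1518

/* Hypothetical stand-in for iovec_s_t, with a pointer instead of a grant. */
struct sketch_iovec_in {
    const unsigned char *buf;  /* stands in for iov_grant */
    size_t size;               /* stands in for iov_size */
};

/*
 * Gather the packet described by the vector into frame, which must
 * hold at least ETH_MAX_PACK_SIZE_TAGGED bytes. Returns the packet
 * length, or -1 if the total size is outside the guaranteed range.
 */
long gather_packet(unsigned char *frame,
                   const struct sketch_iovec_in *iov, int count)
{
    size_t total = 0;
    int i;

    assert(frame != NULL);
    assert(iov != NULL || count <= 0);

    /* First pass: compute the total size without copying anything. */
    for (i = 0; i < count; i++) {
        if (iov[i].size > ETH_MAX_PACK_SIZE_TAGGED - total)
            return -1;                  /* packet would be too large */
        total += iov[i].size;
    }
    if (total < ETH_MIN_PACK_SIZE)
        return -1;                      /* packet too small */

    /* Second pass: copy each piece to its place in the frame. */
    total = 0;
    for (i = 0; i < count; i++) {
        /* In a driver this would be a sys_safecopyfrom() call. */
        memcpy(frame + total, iov[i].buf, iov[i].size);
        total += iov[i].size;
    }
    return (long)total;
}
```

Computing the size in a separate pass mirrors the point made above: the total is known once the vector has been copied in, so the driver can check for room in the send queue before touching any packet data.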

The reply message for both DL_READV_S and DL_WRITEV_S requests is the same. It may acknowledge a request and/or signify completion of a pending request. When signifying completion of a receive request, it must specify the size of the received packet. The message looks like this:

Reply: DL_TASK_REPLY (acknowledge pending or successful data transfer)

Field | Message field | Type | Description
DL_FLAGS | m2_l1 | unsigned long | completion flags
DL_COUNT | m2_i3 | int | if DL_PACK_RECV is set: received packet size, in bytes

The completion flags in DL_FLAGS may be a bitwise combination of the following flags:

Alias | Value | Meaning
DL_PACK_SEND | 0x1 | the send request has been completed
DL_PACK_RECV | 0x2 | the receive request has been completed

The alias DL_NOFLAGS equals 0 and is typically used to indicate that the just-received send or receive request could not be satisfied immediately, and is now pending.

MinixWiki: DevelopersGuide/DataLinkProtocol (last edited 2011-04-12 08:56:03 by David van Moolenbroek)