Network driver development is hard

Writing network drivers is a tricky business, as anyone who has ever tried it will tell you. Unless, of course, you are talking to Bill Joy.

My experience has been with various RTOSes, but mostly with VxWorks. Recently, I wrote a network driver for a new gigabit Ethernet controller for QNX 7. The experience provided some interesting comparisons and contrasts.

I’ve been writing BSPs and drivers for VxWorks for nearly 25 years. In that time the operating system has changed considerably, but the fundamental architecture remains the same: the kernel and its device drivers operate in the same address space and, by definition, have access to each other’s code and data.

This makes inspecting memory and data fairly straightforward. Moreover, one can invoke driver methods directly and alter variables using the VxWorks kernel shell. The flip side is that attaching a new, unstable Ethernet driver to the network stack can, and often does, result in unexpected crashes, freezes and general bad behaviour. These problems can be hard to pin down because the OS debugging and instrumentation tools are themselves often connected via a network interface. A network-connected debugger is not going to help if the driver under development has stalled the stack because it crashed while holding a mutex semaphore.
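For example, the kernel shell accepts C-like expressions, so you can call a driver routine or poke a debug variable by name. The driver symbols below are invented for illustration:

-> myGigEthRegShow (0)
value = 0 = 0x0
-> myGigEthDebugLevel = 3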

QNX process model

The QNX architecture is quite different. It is a microkernel design, where the kernel and its drivers exist as independent processes that communicate using message passing. This promotes stability and reliability: errors are isolated in the address space where they occur, and the OS can detect and restart processes that have crashed. Not everyone agrees.

What does this mean in practice? The business of QNX network driver development is much the same as it is on VxWorks: reading and understanding the datasheet, buffer management, handling DMA descriptor rings, virtual-to-physical address translation, interrupt handling, and so on.
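To make one of those items concrete, the sketch below shows the sort of code involved in setting up a receive descriptor ring under QNX: allocate a physically contiguous, uncached buffer and translate its virtual address into the physical address the controller's DMA engine is programmed with. The descriptor layout is invented, every controller defines its own, and the error handling is minimal.

/* Sketch only: allocate a DMA-safe receive descriptor ring and find
 * its physical address. The rx_desc_t layout is hypothetical. */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define NUM_RX_DESC 256

typedef struct {
    uint64_t buf_phys;   /* physical address of the packet buffer */
    uint16_t length;     /* buffer length in bytes */
    uint16_t flags;      /* OWN/EOP bits, etc. -- controller specific */
    uint32_t reserved;
} rx_desc_t;

int main(void)
{
    size_t ring_size = NUM_RX_DESC * sizeof(rx_desc_t);

    /* MAP_PHYS | MAP_ANON requests physically contiguous memory;
     * PROT_NOCACHE avoids cache-coherency surprises with the DMA engine. */
    rx_desc_t *ring = mmap(NULL, ring_size,
                           PROT_READ | PROT_WRITE | PROT_NOCACHE,
                           MAP_ANON | MAP_PHYS | MAP_PRIVATE, NOFD, 0);
    if (ring == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Translate the virtual address into the physical address that the
     * controller's descriptor base register will be programmed with. */
    off64_t phys;
    if (mem_offset64(ring, NOFD, ring_size, &phys, NULL) == -1) {
        perror("mem_offset64");
        return 1;
    }

    printf("rx ring: virt %p  phys 0x%llx  (%zu bytes)\n",
           (void *)ring, (unsigned long long)phys, ring_size);
    return 0;
}

A real driver does the same for the packet buffers the descriptors point at, and keeps both addresses around: the CPU uses the virtual one, the controller the physical one.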

Create another stack

The QNX network stack is just another process. A network driver is built as a shared library and loaded into the network stack's address space when required. This meant I could start a second stack instance to test the driver I was developing; the primary network stack, used by the development tools to connect to the target system, was unaffected.

For example, the following starts a second stack instance and makes it available at /sock2 in the path name space.

# io-pkt-v6-hc -i2 -ptcpip prefix=/sock2

The new driver is then attached to the new network stack:

# mount -Tio-pkt2 /path/to/driver.so

The environment variable SOCK specifies the network stack instance to use. For example, the following assigns an IP address to the new network interface and pings a remote host.

# SOCK=/sock2 ifconfig abc0 192.168.1.5 
# SOCK=/sock2 ping 192.168.1.1

If things go wrong, the io-pkt process crashes and creates a core dump file, which can be uploaded to the development host. The debugger attaches to the core dump, allowing inspection of memory and stack frames and, with a bit of luck, telling you why the crash occurred.
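As an illustration, assuming the dumper service writes its dump files under /var/dumps and the target is AArch64 (both the path and the tool prefix depend on your setup), the post-mortem might look something like this:

# ls /var/dumps
io-pkt-v6-hc.core

Then, after copying the core file and the unstripped binaries to the development host:

$ ntoaarch64-gdb io-pkt-v6-hc io-pkt-v6-hc.core
(gdb) bt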

I found writing a network driver for QNX 7 an interesting and rewarding experience. I missed the familiarity and convenience of the VxWorks kernel shell, but testing a new driver in an isolated process that did not crash the kernel made up for it.