Top 10 Ways to Make Device Drivers Unreliable
Device drivers are the foundation of many embedded systems, and they are expected to be robust and stable while also delivering high performance. However, there are many ways to ensure that your drivers are unreliable, some of which are more subtle and easier to fall foul of than you might expect. Here are some favourites based on the extensive experience of the Pebble Bay team.
1. Do not initialise the device correctly
Today’s devices are often very complex and flexible, which inevitably means the device driver has to configure them to operate in the right mode and initialise them into a known state. This can involve a lot of code, usually written by reading the device data sheet and interpreting it correctly. Get this wrong and the device will not be properly initialised, and it is likely to behave strangely if it works at all. Often the problem lies in the code you did not write: it’s surprisingly easy to forget to initialise a critical part of the device, particularly if the data sheet does not spell out all the steps in a logical order or if you are “just building a prototype” that evolves into production code.
The classic signs of this problem are drivers that work correctly after a power cycle, but not after a processor reset.
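One way to avoid that symptom is to make the driver force the device into a known state itself, rather than relying on the power-on defaults. The sketch below illustrates the idea; the register layout, bit definitions and mode value are hypothetical stand-ins for whatever your device’s data sheet actually specifies.

```c
#include <stdint.h>

/* Hypothetical register layout for an imaginary device. In a real driver
 * a dev_regs_t * would point at the device's memory-mapped base address;
 * on a host, a plain struct can stand in for it. */
typedef struct {
    volatile uint32_t ctrl;
    volatile uint32_t mode;
    volatile uint32_t status;
} dev_regs_t;

#define CTRL_SOFT_RESET  (1u << 0)   /* hypothetical soft-reset bit */
#define CTRL_ENABLE      (1u << 1)   /* hypothetical enable bit */
#define MODE_DEFAULT     0xA5u       /* hypothetical operating mode */

void dev_init(dev_regs_t *dev)
{
    /* Start from a known state: a soft reset makes behaviour after a
     * processor reset match behaviour after a power cycle. */
    dev->ctrl = CTRL_SOFT_RESET;
    /* A real driver would poll or delay here until the reset completes. */

    dev->mode = MODE_DEFAULT;   /* configure before enabling... */
    dev->ctrl = CTRL_ENABLE;    /* ...and enable last */
}
```

The ordering matters: reset first, configure next, enable last, so the device never runs in a half-configured state.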
2. Mis-configure the interrupt controller
Your device driver may need to configure an interrupt controller to respond to an interrupt from the device it is controlling. Getting this wrong can be the source of both obvious and not-so-obvious problems.
Usually there are three things to get right when configuring a specific interrupt on an interrupt controller:
- priority (if the controller supports nested interrupts);
- type: whether the interrupt is edge- or level-triggered;
- polarity: whether the interrupt is asserted on a low level (or low-going edge) or a high level (or high-going edge).
In most cases, getting these details wrong means an obvious problem such as receiving no interrupts, or continuous interrupts, neither of which is too hard to diagnose and fix. But surprisingly, it doesn’t always result in catastrophic failure – your driver might work, just not very well.
One project I worked on involved diagnosing and fixing an Ethernet interface that was very sluggish and had very low throughput. Among other things, it turned out that the interrupt from the Ethernet controller was configured to be edge-triggered but the controller actually generated a level-triggered interrupt: most of the time, when the device was interrupting to indicate that it had received packets, the interrupt controller was not passing the interrupt to the CPU!
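Packing those three settings into the controller’s configuration register is typically a few lines of bit manipulation. The sketch below shows the shape of it; the field positions and widths are hypothetical – a real interrupt controller’s data sheet defines its own encoding.

```c
#include <stdint.h>

/* Hypothetical encodings for the three settings discussed above. */
#define IRQ_TYPE_LEVEL  0u
#define IRQ_TYPE_EDGE   1u
#define IRQ_POL_LOW     0u   /* active-low, or low-going edge */
#define IRQ_POL_HIGH    1u   /* active-high, or high-going edge */

/* Compose a per-interrupt configuration word: priority in bits 0-3,
 * type in bit 4, polarity in bit 5 (an invented layout for illustration). */
uint32_t irq_config_word(uint32_t priority, uint32_t type, uint32_t polarity)
{
    return (priority & 0xFu)
         | ((type & 1u) << 4)
         | ((polarity & 1u) << 5);
}
```

Getting even one of these fields wrong – edge where the device generates level, as in the Ethernet example above – is enough to make the driver limp along rather than fail outright.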
3. Acknowledge and clear interrupts wrongly
Another potential problem with interrupt handling is exactly when and how the interrupt from a device should be acknowledged and cleared, at least insofar as this is not done automatically by the hardware. Often this is done by writing to an interrupt status register, but the details will vary from device to device. In some cases, writing to this register will acknowledge and clear all the device’s pending interrupts, which may dictate some careful software design to make sure your driver has handled them all first – otherwise you might end up “missing” an interrupt. Whatever the mechanics of clearing the interrupt, you will need to think carefully about corner cases: what happens if the device generates an interrupt at about the same time as your driver clears the interrupt(s) it has just handled? Device data sheets can vary enormously in the amount of advice they give in this area: some may give example code which can be very helpful.
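One common defence against that race is to keep re-reading the status register until it reads back zero, so a cause that arrives while you are clearing the previous ones still gets handled. A sketch of the pattern, with a plain variable standing in for the register:

```c
#include <stdint.h>

uint32_t handled_mask;   /* records which causes were serviced */

/* Loop until the status register reads back zero, so an interrupt cause
 * that is raised while we are clearing the previous ones is not missed. */
void dev_isr(volatile uint32_t *status_reg)
{
    uint32_t status;
    while ((status = *status_reg) != 0u) {
        handled_mask |= status;   /* a real driver dispatches per cause here */
        /* On write-1-to-clear hardware this would be `*status_reg = status;`.
         * The masked clear below lets a plain variable stand in for the
         * register when running this sketch on a host. */
        *status_reg &= ~status;
    }
}
```

Note that the loop clears only the causes it has just read, never a blanket “clear everything”, so a cause raised between the read and the clear survives to the next iteration.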
4. Do not bother using “volatile” for memory-mapped i/o
It’s really not optional: if you don’t use volatile to tell the compiler that (for example) memory-mapped registers can change independently of the code, your driver will not work reliably. Typically it might be fine while you are debugging it, but once you build with optimisation enabled (or increased) it may stop working altogether or – less conveniently – fail occasionally. Looking at the assembly code you will see that the compiler may have converted a polling loop into a simple “if” statement, or hoisted the value it tests on each iteration into a register, where of course it will never change. It may also have re-ordered a sequence of accesses to the device registers, with unpredictable results.
Using volatile whenever your driver accesses memory-mapped devices should be a matter of course – far better to put it in by design rather than retrofit it after tracking down a nasty bug caused by having left it out.
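The idiom is simply to make the pointer’s target volatile-qualified, which forces a fresh load on every iteration of a polling loop. A minimal sketch, with a variable standing in for a hypothetical memory-mapped status register:

```c
#include <stdint.h>

#define READY_BIT (1u << 0)

/* Stand-in for a memory-mapped register; in a real driver this would be
 * something like: #define STATUS_REG ((volatile uint32_t *)0x40001000u) */
volatile uint32_t fake_status = READY_BIT;

/* Because *reg is volatile-qualified, the compiler must re-read the
 * register on every pass rather than hoisting the load out of the loop. */
uint32_t wait_ready(volatile uint32_t *reg)
{
    while ((*reg & READY_BIT) == 0u)
        ;   /* each iteration performs a real bus read */
    return *reg;
}
```

Without the volatile qualifier, an optimising compiler is entitled to read the register once and spin on the stale value forever.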
Of course, this is not the end of the story. Even if the compiler does not re-order access to memory-mapped devices you can still get into trouble if you …
5. Ignore details of the processor’s bus interface
Most modern high-performance processor architectures go to great lengths to de-couple their external bus interface from execution of the instruction stream – this is because external bus cycles are normally very slow compared with the time to execute each instruction, so the processor would stall if it had to wait for each bus cycle to complete before executing the next instruction.
We normally take it as a given that drivers can access memory-mapped registers without worrying about the effects of a data cache (i.e. the device is mapped into non-cached region of address space) but that’s not the only potential problem to contend with. Some processors allow accesses to the external bus to be queued, possibly coalesced, and maybe completed in a different order to what’s intended by the program. All of that may be fine for “normal” data accesses, but it will play havoc with memory-mapped i/o! The key to building a reliable driver in this case is to understand the details of the bus interface and how the instruction stream can be synchronised with external bus cycles. This usually involves use of very processor-specific assembly instructions, which – if you are lucky – may be wrapped in C-callable form by your development tools. Examples include the “eieio” and “sync” instructions for PowerPC processors.
On my first encounter with the MIPS architecture, some 15 years ago, I did not take account of the CPU write buffer. This allows the CPU to group write operations (stores) to the external memory interface to create a write burst which can occur at some time after the store instructions have apparently completed. This is further complicated by the fact that a read operation (load instruction) after a write can complete before the preceding write does. Suffice to say my first device driver was not very reliable until I inserted calls to flush the write buffer at strategic places in the driver code.
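Wrapping the barrier in a small helper keeps the processor-specific part in one place. The sketch below uses the PowerPC “eieio” instruction via inline assembly where available, and falls back to the portable GCC/Clang `__sync_synchronize()` builtin elsewhere so the sketch compiles on a host; a real driver would use whichever barrier its target architecture requires.

```c
#include <stdint.h>

/* Order i/o accesses with respect to the external bus. On PowerPC this is
 * "eieio"; on MIPS it would be a write-buffer flush; the portable builtin
 * below is a stand-in so the sketch builds anywhere. */
static inline void io_sync(void)
{
#if defined(__powerpc__)
    __asm__ volatile ("eieio" ::: "memory");
#else
    __sync_synchronize();   /* full memory barrier as a stand-in */
#endif
}

/* Write a register and make sure the store reaches the bus before the
 * caller continues. */
void reg_write(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
    io_sync();
}
```

Concentrating the barrier in `io_sync()` also makes the driver far easier to port: only one function changes per architecture.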
6. Ignore cache coherency and alignment when using DMA
Direct Memory Access (DMA) is often used to transfer data to and from high-bandwidth devices such as network interfaces, USB controllers, etc. While it’s a great way to relieve the processor of the burden of transferring data, in conjunction with the ubiquitous data cache it does present some traps for the unwary.
The main one of these is cache coherency, that is to say the fact that the contents of the data cache may not always match the contents of main memory. For example, when the processor reads then writes the same location in memory, the write cycle may mean that the cached copy of the data is more up-to-date than the copy in main memory.
Since DMA hardware typically has access only to the main memory, if we were to arrange a DMA transfer to write this memory to an i/o device, it would transfer the wrong (stale) data. A similar problem can occur when DMA transfers data from the device to memory: if the cache happens to contain a copy of the data from the same area of memory, that copy will be stale, as main memory now holds the latest values.
Some hardware, including many systems based on Intel processor architecture, includes “bus snooping” which automatically ensures that cache coherency is maintained even when DMA is used. In systems that don’t have bus snooping hardware, the device driver will need to manage the cache coherency itself. This usually means flushing the relevant areas of cache to memory before using DMA to transfer data from memory to the device, and invalidating the cache after using DMA to transfer data from the device into memory.
When you are designing a DMA-based driver, understanding the cache architecture and whether or not the hardware automatically maintains cache coherency is a must. If your driver is responsible for this, you also need to take care that the buffers used for DMA transfers are aligned with the cache line boundaries (typically 32 bytes), otherwise some very tricky and hard-to-reproduce bugs can result.
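In code, the two rules – flush before device-bound DMA, invalidate after memory-bound DMA, and keep buffers on whole cache lines – look something like the sketch below. The cache-maintenance functions are stubs standing in for whatever your BSP or CPU support library provides, and the 32-byte line size is an assumption to check against your cache architecture.

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 32   /* typical line size; check your cache architecture */

/* Aligning and sizing DMA buffers to whole cache lines keeps unrelated
 * variables out of the lines that get flushed or invalidated. */
typedef struct {
    uint8_t data[256];
} __attribute__((aligned(CACHE_LINE))) dma_buffer_t;

static dma_buffer_t rx_buf;

/* Stubs standing in for the real cache-maintenance operations, which are
 * CPU- or BSP-specific. */
static void cache_flush(void *addr, size_t len)      { (void)addr; (void)len; }
static void cache_invalidate(void *addr, size_t len) { (void)addr; (void)len; }

void dma_to_device(void *buf, size_t len)
{
    cache_flush(buf, len);        /* push dirty lines to main memory first */
    /* ... then start the DMA transfer from buf ... */
}

void dma_from_device(void *buf, size_t len)
{
    /* ... start the DMA transfer into buf, wait for completion ... */
    cache_invalidate(buf, len);   /* then discard stale cached copies */
}

int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

The alignment attribute guarantees that invalidating `rx_buf` cannot clobber a neighbouring variable – exactly the failure described in the USB example below.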
For example, on one project we were debugging an intermittent problem in a USB transfer from a sensor that provided data to the host from an isochronous endpoint at high speed. Most of the time this would complete correctly but on occasions the transfer would terminate early. Eventually we discovered that the problem occurred because we had not properly aligned the buffers into which the USB host controller was DMAing data: when the driver (correctly) invalidated the cache region after the DMA transfer, it corrupted a driver control variable which happened to be in the same cache line as the first data buffer.
7. Do not use the correct data sheet for the device
It sounds pretty obvious: why would you not use the correct data sheet for the device your driver controls? Suffice to say that you need to be sure that the data sheet you’re using as a reference describes the exact version of the device on your hardware. Often it may be easier to obtain a data sheet for a supposedly software-compatible device, but that may mean you miss crucial information if the parts in question don’t behave in exactly the same way under certain conditions. Also, be sure you have a copy of the device errata notes, which can save some head-scratching if the device does not behave exactly as the data sheet describes.
In one BSP development project, we had a hard time getting the main SDRAM memory to work reliably. After double- and triple-checking all our settings for the processor’s built-in memory controller, which were correct, we belatedly looked at the errata notes: they showed an otherwise undocumented register that had to be set to a specific value so that the SDRAM interface signals were driven correctly. Once we included this within our initialisation sequence, the memory interface worked faultlessly.
8. Forget about access serialisation
Device driver code is usually executed in at least two contexts (threads) and often more, one of these contexts normally being an interrupt handler which can pre-empt the other contexts. This gives plenty of scope for re-entrancy problems where the code accesses shared resources such as state variables, device registers, buffer memory, etc.
Serialising access to these shared resources is critical to the correct operation of a device driver, otherwise there can be subtle race conditions that can cause failure or data loss. These are often very hard to debug as the problem may only occur once in a blue moon (although arranging to demonstrate your system to an important customer usually does the trick…).
The operating system your driver works with should provide methods for serialising access to shared resources. This could include mutexes to serialise between non-interrupt threads and interrupt locks or spin locks to serialise between non-interrupt and interrupt threads. A good driver design will minimise the need for serialisation, but will use these mechanisms where necessary.
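A typical pattern is a short interrupt-locked critical section around any task-level access to state that the ISR also touches. The sketch below uses hypothetical `irq_lock()`/`irq_unlock()` names for your OS’s interrupt-lock primitives; the stubs just let the sketch run on a host, where on real hardware they would disable and re-enable the device’s interrupt.

```c
#include <stdint.h>

/* Stubs standing in for the OS's interrupt-lock primitives. On a real
 * system irq_lock() disables interrupts and returns the saved state. */
static unsigned int irq_lock(void)       { return 0u; }
static void irq_unlock(unsigned int key) { (void)key; }

static volatile uint32_t rx_count;   /* shared between ISR and task level */

/* Interrupt context: count a received item. */
void rx_isr(void)
{
    rx_count++;
}

/* Task context: read and reset the counter atomically with respect to
 * the ISR, so no increment can be lost between the read and the clear. */
uint32_t rx_count_read_and_clear(void)
{
    unsigned int key = irq_lock();   /* the ISR cannot run in here */
    uint32_t n = rx_count;
    rx_count = 0u;
    irq_unlock(key);
    return n;
}
```

Without the lock, an interrupt landing between the read and the clear would be silently discarded – exactly the once-in-a-blue-moon failure described above.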
9. Assume that task-level code cannot run while an interrupt is being handled
Assuming that an interrupt service routine prevents any task-level code being executed can often simplify code that uses shared resources such as device registers and/or state variables. While this assumption is true on single-processor systems, it may not be true on multi-core systems; this is something to be wary of if you are porting an existing driver onto a multi-core system, where it may cause race conditions making the driver unreliable. The right approach in this case is to use spin locks (or other mechanisms provided by the operating system) to serialise interrupt- and task-level code.
10. Use empty loops for short delays
Back to basics here! It is so tempting to use empty loops when a delay of a few microseconds is needed, but they can be the source of many problems. For example, a reasonably good compiler may remove them altogether when optimisation is switched on (unless you pepper the code with “volatile” keywords). More worryingly, the time it takes to execute the loop is very hard to predict accurately from first principles and may vary enormously depending on obvious things such as the processor clock speed, less obvious things such as the memory bus width and speed, and fairly subtle and obscure things such as code and data alignment and interactions with the cache. Even if your loop gives the right delay on the current hardware design and software build, it is an accident waiting to happen if any part of the design changes.
A much more reliable approach is to use free-running counter/timer hardware to generate short delays. In that case the delay will depend only on the counter clock rate and the number of “ticks” you choose to wait for, so it can be predicted accurately.
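The key detail when reading a free-running counter is to compute the elapsed time with unsigned subtraction, which is correct even when the counter wraps. A minimal sketch, in which `timer_now()` is a hypothetical read of the hardware counter (simulated here by a self-incrementing variable so the sketch runs on a host):

```c
#include <stdint.h>

/* Host-side stand-in for reading a free-running 32-bit hardware counter. */
static uint32_t sim_ticks;
static uint32_t timer_now(void) { return sim_ticks++; }

/* Unsigned (modulo-2^32) subtraction gives the correct elapsed count even
 * when the counter has wrapped between the two readings. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Busy-wait for a given number of counter ticks; fine for the
 * microsecond-scale delays discussed above. */
void delay_ticks(uint32_t ticks)
{
    uint32_t start = timer_now();
    while (ticks_elapsed(start, timer_now()) < ticks)
        ;
}
```

Because the delay depends only on the counter clock rate and the tick count, it survives changes to the CPU clock, the memory system and the compiler’s optimisation level.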
One case where software timing loops may be acceptable is for quick diagnosis of (and work-around for) race conditions while debugging – so long as they get removed once a proper solution is found.
How your embedded system interacts with the real world will depend on the quality of the device drivers. It is fair to say that writing device drivers takes a clear head, a thorough understanding of the hardware and a structured approach. Hopefully, this article will help you avoid the obvious pitfalls and help you understand the source of problems should they occur.