Device Drivers and Real-Time Systems
This article originally appeared in Dr. Dobb's Journal, October 1998
In this article, I'm going to examine two radically different device
drivers, and their implementation under the QNX 4 realtime operating
system. The focus of this article is to illustrate realtime device
driver issues that come up in the real world.
When writing drivers for realtime operating systems, such as QNX 4, you
will encounter a number of challenges. We'll examine two of these
challenges: latencies and timing accuracies.
Latencies
What makes a realtime operating system "real time" is its ability to
respond to events in a deterministic (and hopefully fast!) manner. The
amount of time this response takes is called the latency. There are
several types of latencies. In this article, we'll be looking at
interrupt latency (the amount of time that elapses from the hardware
raising an interrupt, to the execution of the first instruction of the
ISR), and scheduling latency (the amount of time that elapses from a
particular process being made ready to execute, to the execution of
that process). While both latencies are important in realtime systems,
the crucial one is the interrupt latency. This is because the source of
the interrupt, the hardware, usually has no buffering. If you miss the
interrupt, the data is gone (e.g. a network card getting data from the
network). If the scheduling latency is long, this can, to a degree, be
compensated for in the ISR. In this case, however, the ISR is more
complex, because it has to effectively buffer up the data. In extreme
cases, the ISR can get *very* complex, because it may have to respond to
the ultimate source of the interrupt. For example, a network card ISR
may need to send back acknowledgements within a short amount of time,
otherwise the other end will time out. This means that the ISR must be
intimately aware of the protocol, and perhaps have access to a lot of
data structures from the corresponding process. With a short scheduling
latency, you can defer processing to the controlling process, rather
than doing it in the ISR.
Timing Accuracies
Often, the issue of "how accurate is the timing" is neglected, and comes
back to haunt the designer after the initial design has been completed.
As we will see in the article, consideration needs to be given to an
analysis of the timing requirements right up front, during the design
phase.
The Drivers
The first driver communicates with a BSR TW-523 X-10 controller in order
to provide access to various X-10 modules that I have around the house.
These modules allow you to perform functions such as controlling lights
and appliances, by using the existing 110 VAC wiring in your house.
The second driver is for a home-built sound card for my PC, which has a,
shall we say, "unique" architecture.
The X-10 Driver
Let's start with the X-10 driver. The BSR TW-523 controller ("X-10
controller") presents a simple interface to the PC -- it has four wires:
common, TX, RX, and Zero Cross, and also a 110 VAC plug. The idea is
that whenever the 110 VAC changes polarity, the Zero Cross line will
change state. Effectively, this presents the 110 VAC line as an
optically isolated square wave. According to the X-10 protocol, data
must be transmitted immediately after a zero crossing of the AC has been
detected. This data is transmitted by asserting the TX pin for one
millisecond (if transmitting a "one"), or doing nothing (if transmitting
a "zero"). When the TX pin is asserted, the X-10 controller generates a
120 kHz carrier on the AC line. Other devices listening on the AC line
synchronize their reception of this carrier to the zero crossing --
quite a clever design.
There are two main software challenges in interfacing with this device:
the response time required upon detection of a zero crossing, and the
accuracy of the 1 millisecond pulse that needs to be generated.
When I built the hardware interface for the controller, I chose to use a
standard RS-232 serial port. This was the easiest way I could think of
to get interrupts from the zero crossing line. I then tied the DTR line
to the TX pin, so that I could raise and lower it via software control.
(We'll ignore the TW-523's RX pin in this article.)
Now the design decisions. How much work should I do in the ISR versus
the process level? This is a common tradeoff -- the ISR, while it runs
with minimal latency after the hardware asserts the interrupt, is
generally a much more "sensitive" environment. This is due to a number
of reasons: ISRs generally have
access to all of the I/O ports (on x86 processors) and can wreak havoc
with other hardware devices; the amount of time spent inside of the ISR
has a direct, negative impact on process scheduling; and finally, since
the ISR isn't a real "process", it is generally limited in the number of
kernel calls that it can use. On the other hand, deferring processing
until "process" time, while avoiding the pitfalls of the ISR, can lead
to unacceptable latencies under some operating systems.
So, what to do?
Let's look at both methods.
Doing the Work in the ISR
The actual work that needs to be done in the ISR for this example looks
very minimal. After all, when the interrupt hits, we jump into the ISR,
look at a circular buffer (containing the data that some client process
wants us to transmit), and, if there is a "one" to be sent, we assert
DTR. That part is no problem. I'd be amazed if this took more than 5
lines of C. However, once we turn on DTR, we need to be able to turn it
off 1 millisecond later. Depending upon what type of operating system
you are using, this may range from a few lines of C to a few dozen lines
of C -- somehow you tell the O/S to schedule a process to run, and the
process starts a 1 millisecond timer. When the timer fires, the process
deasserts DTR. Under QNX, you request this scheduling by returning a
non-zero value from the ISR itself. The kernel picks up the ISR's return
value and schedules the process accordingly.
Let's flip over to the other method, and then we'll look at the 1
millisecond issue.
Doing the Work in the Process
By doing the work at process time, rather than within the ISR, the only
thing that's changed is when/where we do the circular buffer
management.
Since this example is interrupt driven, we still need an ISR, and we
still need to clear the source of the interrupt (on the serial chip I'm
using, this involves two I/O port reads). Then, we need to tell the
kernel to schedule a process as a result of the ISR.
Doing the work in the ISR directly is more efficient, because the ISR
(having access to the circular buffer) can determine whether or not it
needs to tell the kernel to schedule a process. The "process" method
requires that the ISR schedule the process every time, since the ISR has
no idea of what data should or should not be sent.
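The difference between the two methods can be sketched in a few lines of C. This is only an illustration, not the article's driver: the ring buffer layout, the `dtr_asserted` variable (standing in for the real modem-control port write), and the `WAKE_PROCESS` value are all hypothetical names, and the port reads that clear the interrupt are left out.

```c
#include <stddef.h>

#define RING_SIZE    64   /* hypothetical buffer size */
#define WAKE_PROCESS 1    /* stand-in for the non-zero value returned to the kernel */

/* Circular buffer of bits queued by a client for transmission. */
struct bit_ring {
    unsigned char bits[RING_SIZE];
    size_t head;          /* next slot the client writes */
    size_t tail;          /* next slot the driver reads  */
};

int dtr_asserted;         /* stand-in for the DTR line (a port write in real code) */

/* Queue one bit; returns 0 if the buffer is full. */
int ring_put(struct bit_ring *r, int bit)
{
    size_t next = (r->head + 1) % RING_SIZE;
    if (next == r->tail)
        return 0;
    r->bits[r->head] = (unsigned char)(bit != 0);
    r->head = next;
    return 1;
}

/* Fetch the next bit; returns 0 if the buffer is empty. */
int ring_get(struct bit_ring *r, int *bit)
{
    if (r->tail == r->head)
        return 0;
    *bit = r->bits[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    return 1;
}

/* Method 1: the ISR owns the buffer. It only asks for the process to be
   scheduled when a "one" was sent and DTR must be dropped 1 ms later. */
int isr_does_work(struct bit_ring *r)
{
    int bit;
    if (ring_get(r, &bit) && bit) {
        dtr_asserted = 1;
        return WAKE_PROCESS;
    }
    return 0;
}

/* Method 2: the ISR knows nothing about the data, so it must ask for
   the process to be scheduled on every zero crossing. */
int isr_defers(void)
{
    return WAKE_PROCESS;
}

/* Method 2, process side: runs after each wake-up and does the buffer work. */
void process_on_wakeup(struct bit_ring *r)
{
    int bit;
    if (ring_get(r, &bit) && bit)
        dtr_asserted = 1;
}
```

With method 1, a queued "zero" costs no process wake-up at all; with method 2, every zero crossing costs one, whether or not there is anything to send.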
So why would I do it in the process? Simple -- because I can. It is
*MUCH* easier to debug things in the process: I can use source-level
debugging and profiling tools (though my personal favourite is the
printf() debugger). Also, in this case, I'm only getting interrupted at
120 Hz (once per zero crossing), and at this low rate the cost of
scheduling the process on every interrupt is not an issue.
The real issue is, "How long does it take to get there?" Under QNX 4,
running on a Pentium 100 MHz processor, it takes 1.8 µs to run the
first line of the ISR, and another 4.7 µs after the ISR has exited to
run the first line of the "process". These numbers were obtained
directly from QNX Software Systems, and are stated as being "typical".
So, can I afford 6.5 µs of delay? Well, where's the 110 VAC sine wave,
6.5 µs after the zero crossing? Only about 0.4 V higher (or lower), or
about 0.12% of the peak-to-peak range. Probably not significant.
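The arithmetic behind that estimate can be checked with a short sketch. It assumes a 60 Hz line (consistent with the 120 Hz interrupt rate) and an ideal sine wave; the function names are mine, and the sine is approximated by two Taylor terms, which is more than accurate enough for such a tiny angle and avoids linking the math library.

```c
#define PI    3.14159265358979323846
#define SQRT2 1.41421356237309504880

/* sin(x) from its first two Taylor terms; fine for the very small
   angles involved here (x is about 0.0025 radians). */
static double small_sin(double x)
{
    return x - (x * x * x) / 6.0;
}

/* Volts above (or below) zero, t_sec seconds after a zero crossing,
   for a sine wave of the given RMS voltage and frequency. */
double volts_after_zero_cross(double v_rms, double line_hz, double t_sec)
{
    double v_peak = v_rms * SQRT2;
    return v_peak * small_sin(2.0 * PI * line_hz * t_sec);
}

/* The same excursion as a fraction of the full peak-to-peak swing. */
double fraction_of_swing(double v_rms, double line_hz, double t_sec)
{
    double v_peak_to_peak = 2.0 * v_rms * SQRT2;
    return volts_after_zero_cross(v_rms, line_hz, t_sec) / v_peak_to_peak;
}
```

Calling `volts_after_zero_cross(110.0, 60.0, 6.5e-6)` gives roughly 0.38 V, and the same call with 120.0 gives roughly 0.42 V. Either way, `fraction_of_swing()` comes out at about 0.12%, since the fraction does not depend on the line voltage.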
What do you mean by "hard" and "soft"?
I quoted two numbers: the 1.8 µs ISR latency, and the 4.7 µs
"scheduling latency". While both numbers are equal in importance, one
number is a little more equal than the other :-) The ISR latency time
will ONLY be affected by one of two things: a process or ISR that has
interrupts disabled; or a higher (in terms of hardware priority) ISR.
Since most realtime architectures (and programmers, for that matter) try
to disable interrupts for the smallest possible amounts of time, and run
the ISRs for the least amount of time, statistically speaking we should
have a good success rate with this 1.8 µs number.
Now, what about the 4.7 µs number? This number is, first of all,
AFTER the ISR has completed execution, and secondly, is affected by the
priority of the process. The ultimate decision as to whether this is
good or bad depends upon the system designer -- whoever decided at what
priority things should run. If you NEED to attain the 4.7 µs number,
then the process should run at a higher priority than other processes,
period.
The 1 Millisecond Issue
Regardless of where the DTR pin was asserted (in the ISR or in the
process), most operating systems require you to do timing functions
within a process. (We certainly don't want to spend one millisecond in
an ISR!)
There are, of course, some design issues associated with this. Since
the kernel receives periodic interrupts from some hardware clock, and
indeed bases all of its timing on those interrupts, you cannot delay for
a period of time whose granularity is finer than the base clock tick
rate. For example, if the kernel gets periodic interrupts every ten
milliseconds, you certainly CANNOT reliably delay for anything less than
ten milliseconds. However, it's not as simple as boosting the hardware
clock's rate. Even if we boosted the rate to, let's say, one
millisecond, the next issue that hits us is the fact that the hardware
clock is asynchronous to the process. If the hardware clock has just
interrupted the kernel, and we now tell the kernel that we want to sleep
for one millisecond, we'll get pretty close to a one millisecond delay.
However, if the kernel clock is ABOUT TO interrupt the kernel, and we
schedule our one millisecond delay, we will be woken up much too early.
The best that you can do in this case is to boost the hardware clock
rate so that the "jitter" (the amount of variability in the delay time)
is "acceptable". Even if we boosted the clock rate to one tick every
100 µs, we would still only be able to reliably sleep for between just
over 900 µs and just under 1000 µs (1 millisecond). And, of course, we
can't just boost the hardware clock to an arbitrary rate, like one tick
every 1 µs, because the kernel wouldn't be able to handle the interrupts
at that rate.
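The jitter described here can be put into numbers with a small model. This is a simplification I am assuming for illustration, not QNX's actual timer code: the kernel only notices time on clock ticks and wakes the sleeper on the Nth tick after the request, where N is the requested delay divided by the tick period, rounded up. (Some kernels add an extra tick so that a sleep is never shorter than requested; that variant is not modelled.) The function names are hypothetical.

```c
/* How long a sleep actually lasts under the simplified tick model.
   requested_us: delay asked for
   tick_us:      period of the kernel's hardware clock
   phase_us:     time since the last tick when the request is made
                 (0 <= phase_us < tick_us) */
unsigned long actual_delay_us(unsigned long requested_us,
                              unsigned long tick_us,
                              unsigned long phase_us)
{
    /* Number of ticks the kernel will count, rounded up. */
    unsigned long ticks = (requested_us + tick_us - 1) / tick_us;

    /* The first tick arrives (tick_us - phase_us) after the request,
       and each later tick a full period after the one before. */
    return ticks * tick_us - phase_us;
}

/* Spread between the longest and shortest possible delay for a
   one-tick sleep, at 1 us resolution: just under one tick period. */
unsigned long jitter_us(unsigned long tick_us)
{
    return actual_delay_us(tick_us, tick_us, 0)
         - actual_delay_us(tick_us, tick_us, tick_us - 1);
}
```

For a 1000 µs request with a 1000 µs tick, a request made 999 µs after the last tick returns after only 1 µs. With a 100 µs tick the same request always lands between 901 and 1000 µs in this model, and with a 500 µs tick between 501 and 1000 µs.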
As a side note, I have personally found a hardware clock period of 500
µs to work just fine for the X-10 application -- as it turns out, the
timing length isn't THAT sensitive.
In this case, however, there is a slightly more elegant solution. Since
what we want is a time source that is SYNCHRONOUS with the assertion of
the DTR pin, why not use the serial port chip's TX pin instead? You
could program the serial port for 9 kbaud, and send it one byte with all
of the bits set to zero. The serial port chip will send out one start
bit followed by 8 data bits, all at the same level (9 bits in all),
before the stop bit returns the line to idle. At 9 kbaud, those 9 bits
will be extremely close to 1 millisecond! Tying the TX pin to the
TW-523's TX pin means that the hardware has effectively generated a
nice, clean 1 millisecond pulse for you. (Of course, this occurred to me
AFTER I built the hardware and got it running.)
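As a rough check on the pulse width, here is the arithmetic for nine bit times. The article does not say which serial chip is involved, so the divisor calculation below assumes a PC-style UART whose baud rate is 115200 divided by an integer divisor (a 1.8432 MHz clock divided by 16); the function names are mine.

```c
/* Assumption: PC-style UART, baud rate = 115200 / divisor. */
#define UART_BASE_BAUD 115200UL

/* Integer divisor that gets closest to the requested baud rate. */
unsigned long nearest_divisor(unsigned long target_baud)
{
    return (UART_BASE_BAUD + target_baud / 2) / target_baud;
}

/* Pulse width, in microseconds, of bit_times consecutive bits at an
   exact baud rate. */
double ideal_pulse_width_us(unsigned long baud, unsigned bit_times)
{
    return (double)bit_times * 1e6 / (double)baud;
}

/* Pulse width, in microseconds, of bit_times consecutive bits when the
   UART is programmed with the given divisor. */
double pulse_width_us(unsigned long divisor, unsigned bit_times)
{
    return (double)bit_times * (double)divisor * 1e6
         / (double)UART_BASE_BAUD;
}
```

At exactly 9000 baud, nine bit times last 1000 µs. Under the divisor assumption the closest setting is 13 (about 8862 baud), which stretches the pulse to about 1016 µs, and a divisor of 12 (9600 baud) shortens it to 937.5 µs. Whether either is close enough depends on how tolerant the receiving hardware is.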
Can you rely on this as an "external" synchronous timing source?
Certainly, but it depends on your willingness to modify the hardware
such that the TX pin is looped back to a modem status pin that can
generate an interrupt, such as CD.
The Audio Driver
Let's look at something completely different, to illustrate some other
timing issues. About 6 years ago, when sound cards weren't very good
(and were somewhat pricey), I managed to wangle some samples of
digital-audio quality A/D and D/A parts. These parts worked with a
serial data stream, so I designed an IBM-PC/AT compatible ISA card with
4 FIFO chips on it, along with some serial/parallel and parallel/serial
conversion logic. I wasn't quite sure how to work with the hardware
interrupt system, so I left it off for what I thought was the initial
test. To my surprise, the board worked! (And the interrupt circuitry
has stayed off of the board.) So what does this have to do with
realtime?
Let's examine how a FIFO chip works. A FIFO chip has two "sides". In
my case (for the D/A portion), one side is connected to the ISA bus (the
writer side), and the other side is connected to the parallel/serial
conversion logic (the reader side). The reader side is driven by a
steady 44.1 kHz clock -- that's the sampling rate that the card and D/A
converter operate at. This means that the parallel/serial conversion
logic is reading data out of the FIFO at a fixed rate (44.1 kHz). Since
the FIFOs are 512 bytes deep (and there are two of them, to make,
effectively, a single 512 word FIFO), this means that the FIFO will go
from full to empty in 512/44100 seconds (11.6 milliseconds).
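The drain time is a one-line calculation, sketched here so that other FIFO depths or sample rates can be tried. The function names are hypothetical, and the model assumes the reader side removes exactly one word per sample clock.

```c
/* Time, in milliseconds, for a full FIFO to drain completely when the
   reader side removes words_per_sec words every second. */
double fifo_drain_ms(unsigned depth_words, unsigned words_per_sec)
{
    return (double)depth_words * 1000.0 / (double)words_per_sec;
}

/* How many whole polling intervals of poll_ms fit into one drain time. */
unsigned polls_per_drain(unsigned depth_words, unsigned words_per_sec,
                         unsigned poll_ms)
{
    return (unsigned)(fifo_drain_ms(depth_words, words_per_sec)
                      / (double)poll_ms);
}
```

`fifo_drain_ms(512, 44100)` gives about 11.6 ms, and `polls_per_drain(512, 44100, 1)` gives 11, which is the window the software has between refills.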
I realized that I didn't *need* interrupts, and could get away with just
polling the FIFO's "FULL/EMPTY" flag!
All I had to do was fill the FIFO completely, and then I had 11.6
milliseconds where I could do whatever other processing was required.
Here's what I mean. This is the main polling loop in my audio driver:
    while (!done) {
        // if FIFO is full, go to sleep
        if (inpw (FIFOport) & S_FIFOFull) {
            delay (1);
        } else {
            // FIFO is not full, write data
            outpw (FIFOport, buf [bufptr++]);
            if (bufptr >= BlockSize) {
                done = 1;
            }
        }
    }
The decision to call delay (1) (which sleeps for 1 millisecond) as
opposed to calling it with a number closer to 11.6 was made for two
reasons. First of all, I didn't feel comfortable with sleeping until
the FIFO was almost empty -- what if something caused me to oversleep?
Then there would be a "click" in the audio stream as the parallel/serial
logic sucked zeros out of the FIFO! Also, and more importantly, I
didn't want to be hogging the CPU at a high priority for the entire time
that it took to fill the FIFO from a near-empty state. It's much better
to fill it in tiny bursts, as this allows other, lower priority
processes to run more often. This second point may appear moot --
except for the fact that at one point I contemplated buying 16 kbyte
FIFOs instead of the "wimpy" 512 byte FIFOs I had, until I found out
that they were about $50 EACH.
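To get a feel for the tradeoff, the polling loop can be run against a toy model of the FIFO. This is a sketch under simplifying assumptions of my own: the sleep lasts exactly as long as requested, writes into the FIFO are instantaneous, and the run starts with the FIFO already full. The structure and function names are hypothetical.

```c
#define FIFO_DEPTH  512U      /* words, from the article */
#define SAMPLE_RATE 44100UL   /* words drained per second, from the article */

struct sim_result {
    unsigned sleeps;      /* how many times the loop slept */
    unsigned max_burst;   /* most words written between two sleeps */
    unsigned min_level;   /* lowest FIFO fill level seen on wake-up */
};

/* Run the polling loop until block_words words have been written,
   sleeping sleep_ms milliseconds whenever the FIFO is full. */
struct sim_result simulate_polling(unsigned block_words, unsigned sleep_ms)
{
    struct sim_result r = { 0, 0, FIFO_DEPTH };
    unsigned level = FIFO_DEPTH;   /* start in steady state: FIFO full */
    unsigned written = 0, burst = 0;
    unsigned long acc = 0;         /* drained words, in units of 1/1000 word */

    while (written < block_words) {
        if (level == FIFO_DEPTH) {
            /* FIFO full: sleep while the hardware drains it. */
            unsigned drained;
            acc += SAMPLE_RATE * sleep_ms;
            drained = (unsigned)(acc / 1000);
            acc %= 1000;
            if (drained > level)
                drained = level;   /* underrun: the FIFO ran dry */
            level -= drained;
            if (level < r.min_level)
                r.min_level = level;
            r.sleeps++;
            burst = 0;
        } else {
            /* FIFO not full: write one word. */
            level++;
            written++;
            burst++;
            if (burst > r.max_burst)
                r.max_burst = burst;
        }
    }
    return r;
}
```

In this model, `simulate_polling(4410, 1)` (100 ms of audio with a 1 ms sleep) never writes more than 45 words in a burst and never sees the FIFO below 467 words. `simulate_polling(4410, 11)` sleeps only 10 times, but wakes with as few as 26 words left, which is well under a millisecond of audio and leaves almost no margin for oversleeping.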
Another consequence of not using an interrupt is that I'm avoiding the
context switch of entering an ISR and scheduling a process.
Note that the code presented above is just an extract; the actual code
has multiple buffers for "buf" that are fetched from disk during the
"idle" time when the FIFO is full.
Real Realtime Timing
I hope that this discussion has enlightened you about some of the issues
that arise during the design of a driver that has to deal with realtime
devices. The key things to keep in mind are: how good is your kernel's
clock granularity; how fast are the context switch times (both into the
ISR, the "Interrupt Latency" number, and from the ISR to the process,
the "Scheduling Latency" number); and finally, are there any good tricks
that you can do in the hardware to offload the software's timing burden?
About the Author
Robert Krten is an independent consultant specializing in realtime
systems design and development work. He has written three books on the
QNX Realtime Operating System, as well as several articles.
He is the president of "PARSE Software Devices", a consulting company
specializing in QNX and realtime projects.