PARSE Software Devices


Tips, Tricks and Techniques to Tame Timer Trouble


First published on the PARSE Software Devices website September 18th, 2006 © Copyright 2006 by Robert Krten, all rights reserved.


Synopsis

This article discusses several cases of timers as used in a large QNX Neutrino project, and highlights how the author's use of timers evolved over the course of that project.

It seems so simple...

The project is a control program for a piece of hardware. The main program is an event-driven set of state machines that performs actions based on the various events (such as I/O points changing, timers expiring, or other modules completing tasks) arriving at the main state machine.

It's in this context that I designed the initial timer functionality for this control program. Three simple calls were provided:

  • arm a timer in milliseconds (as a one-shot or as an auto-reload),
  • cancel a timer, and
  • tell the timing system to evaluate its timers.
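The three calls above might look something like the following minimal C sketch. The names tmr_arm(), tmr_cancel() and tmr_evaluate(), the fixed-size table, and the callback that stands in for the pulse sent back to the control program are all hypothetical; the article does not show the project's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TMR_MAX 16   /* hypothetical fixed table size */

typedef struct {
    bool     active;
    uint32_t id;            /* caller-supplied 32-bit timer ID          */
    int32_t  remaining_ms;  /* time left before the timer fires          */
    int32_t  reload_ms;     /* 0 = one-shot, else the auto-reload period */
} tmr_t;

static tmr_t tmr_table[TMR_MAX];   /* zero-initialized: all inactive */

/* Arm (or re-arm) timer `id` to fire `ms` milliseconds from "now".
 * Returns false if the table is full. */
bool tmr_arm(uint32_t id, int32_t ms, bool auto_reload)
{
    tmr_t *slot = NULL;
    for (size_t i = 0; i < TMR_MAX; i++) {
        if (tmr_table[i].active && tmr_table[i].id == id) {
            slot = &tmr_table[i];          /* already armed: re-arm it */
            break;
        }
        if (!tmr_table[i].active && slot == NULL)
            slot = &tmr_table[i];          /* remember first free slot */
    }
    if (slot == NULL)
        return false;
    slot->active       = true;
    slot->id           = id;
    slot->remaining_ms = ms;
    slot->reload_ms    = auto_reload ? ms : 0;
    return true;
}

/* Cancel timer `id`.  Returns false if it was not armed. */
bool tmr_cancel(uint32_t id)
{
    for (size_t i = 0; i < TMR_MAX; i++) {
        if (tmr_table[i].active && tmr_table[i].id == id) {
            tmr_table[i].active = false;
            return true;
        }
    }
    return false;
}

/* Charge `elapsed_ms` against every armed timer.  For each one that
 * expires, `notify` (if non-NULL) is called with its ID; it stands in
 * for the pulse sent back to the control program.  Returns the number
 * of timers that fired. */
int tmr_evaluate(int32_t elapsed_ms, void (*notify)(uint32_t id))
{
    int fired = 0;
    for (size_t i = 0; i < TMR_MAX; i++) {
        tmr_t *t = &tmr_table[i];
        if (!t->active)
            continue;
        t->remaining_ms -= elapsed_ms;
        if (t->remaining_ms > 0)
            continue;                          /* not yet */
        if (notify != NULL)
            notify(t->id);
        fired++;
        if (t->reload_ms > 0)
            t->remaining_ms += t->reload_ms;   /* auto-reload */
        else
            t->active = false;                 /* one-shot: disarm */
    }
    return fired;
}
```

In this sketch, calling tmr_evaluate(10, notify) from every 10ms heartbeat corresponds to the first implementation described below, where each tick is assumed to be exactly 10ms.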

The basic idea was that when the control program required a timer (say for determining that a piece of hardware did not get to a desired state within a certain period of time), it would simply arm a timer, giving it the number of milliseconds from "now" when it should fire, and a 32-bit timer ID. Then, the control program would return to its blocking point (a MsgReceive() call).

The control program had a regular heartbeat "tick" that would generate a pulse every 10ms or so, and that was used for tickling the software watchdog process, as well as alerting the timer subsystem that it was time to check the timer chains to see if any timers had expired.
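A periodic tick like this is typically armed with the POSIX timer_create() and timer_settime() calls. The sketch below is an assumption about how such a heartbeat might be set up, not the project's code; heartbeat_start() is a hypothetical helper. It uses SIGEV_NONE so it runs on any POSIX system; on QNX Neutrino the sigevent would instead be initialized with SIGEV_PULSE_INIT() so that each expiry delivers a pulse to the channel the control program blocks on in MsgReceive().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Start a timer that expires every `period_ms` milliseconds.
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
int heartbeat_start(timer_t *tid, long period_ms)
{
    struct sigevent ev;
    memset(&ev, 0, sizeof ev);
    ev.sigev_notify = SIGEV_NONE;   /* no notification in this portable demo */

    if (timer_create(CLOCK_MONOTONIC, &ev, tid) == -1)
        return -1;

    struct itimerspec its;
    its.it_value.tv_sec  = period_ms / 1000;              /* first expiry  */
    its.it_value.tv_nsec = (period_ms % 1000) * 1000000L;
    its.it_interval      = its.it_value;                  /* then periodic */

    /* flags = 0: it_value is relative to "now" */
    if (timer_settime(*tid, 0, &its, NULL) == -1) {
        timer_delete(*tid);
        return -1;
    }
    return 0;
}
```

Setting it_interval equal to it_value makes the timer reload itself automatically, which is what produces the steady 10ms heartbeat.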

If no timers in the chain had expired, nothing happened. If a timer had expired, a pulse would be generated back to the control program, and would eventually be handled the next time the control program hit its MsgReceive() rendezvous point.

It seems so simple, and yet there were many problems with this.

In this article, I'll discuss two scenarios involving timers and blocking calls, and examine each problem in depth, along with its solution.

The First Implementation and Problem

The first implementation of this timing system was the simplest. Every time the control program's "heartbeat tick" occurred, we assumed that 10ms had gone by -- after all, the heartbeat tick was driven by a timer programmed to 10ms with timer_settime(). (The minor quibble about whether the kernel's timing system would give us exactly 10ms, and not something like 9.999ms or 10.001ms, was irrelevant -- the things we were timing were in the range of hundreds of milliseconds to tens of seconds, so a few microseconds either way was unimportant.)
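The quibble can be put in numbers. Assuming the worst case, where every tick is off by 0.001ms in the same direction, a timeout of length T spans N ticks and accumulates the following error:

```latex
\begin{aligned}
N &= \frac{T}{10\,\text{ms}} \\
|\Delta T| &\le N \times 0.001\,\text{ms} = 10^{-4}\,T \\
T = 30\,\text{s} &\;\Rightarrow\; N = 3000,\quad |\Delta T| \le 3\,\text{ms}
\end{aligned}
```

A relative error of 0.01% is indeed negligible for timeouts of this size.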

Well, for a long time, this worked, or at least seemed to.

Occasionally, there were unexplained events in which it looked like the hardware had failed -- from the control program's point of view, the hardware hadn't reached a particular state within its allotted time, because its associated timer had "popped" (timed out).

These events were rare, and so were prioritized at the bottom of the work queue. When we finally got around to analyzing the problem, it turned out that time was "running too fast": timers were expiring before their programmed interval had actually elapsed. This was a real head-scratcher. Surely, every time the timer ticked, 10ms had gone by, so simply subtracting 10ms from each timer in the timer chain would be the correct thing to do, no ifs, ands, or buts about it, right? Well, that's the way it was designed, but not the way it turned out to work "in system".

Consider the following sequence of operations:

  1. display message to operator
  2. arm timer for 2 seconds
  3. trigger hardware
  4. go back to MsgReceive()

One of the assumptions of the system is that all function calls are virtually non-blocking. This means that when we issue the function call to display a message to the operator, it might block for a few hundred microseconds, but it certainly wouldn't go away for "a long time" (e.g., tens of milliseconds). However, like most cookbook :-) implementations using QNX Neutrino, this was a deeply blocking system (for example, A sent a message to B and blocked; B passed the request on to C and blocked; C performed the work, then unblocked B, which in turn unblocked A. Effectively, A was "deeply blocked" on C).

The practical impact of this is that after we called the function to display a message to the operator, we then armed the timer for 2 seconds (which really only put the value 2000 into the timer's timeout field). But because of the way it was implemented, the display function could in fact block for a long time (without getting into the gory details, there's a serial protocol involved, with timeouts and retries).
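The effect can be reproduced with a small deterministic C simulation of the four-step sequence above. It assumes (hypothetically) that the display call blocks for blocked_ms, that every 10ms heartbeat pulse firing meanwhile is queued until the next MsgReceive(), and that triggering the hardware takes no time. For contrast, it also computes the remaining time when the timer is stored as an absolute deadline read from a clock at arm time; that is one common way to sidestep the problem, and not necessarily the fix this article goes on to describe. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_MS 10   /* heartbeat period */

/* Simulated state; all times in milliseconds. */
typedef struct {
    int32_t now_ms;        /* simulated clock                            */
    int32_t next_tick_ms;  /* when the next heartbeat pulse fires        */
    int32_t queued_ticks;  /* pulses waiting for the next MsgReceive()   */
} sim_t;

/* The program is blocked for `ms`; heartbeat pulses that fire in the
 * meantime are queued rather than handled. */
static void sim_block(sim_t *s, int32_t ms)
{
    s->now_ms += ms;
    while (s->next_tick_ms <= s->now_ms) {
        s->queued_ticks++;
        s->next_tick_ms += TICK_MS;
    }
}

/* Run the four-step sequence and return how much of `timeout_ms` the
 * timer subsystem believes is left once the queued ticks are drained.
 * use_deadline == false: naive countdown, 10 ms subtracted per tick.
 * use_deadline == true:  absolute deadline compared with the clock. */
int32_t remaining_after_drain(int32_t timeout_ms, int32_t blocked_ms,
                              bool use_deadline)
{
    sim_t s = { .now_ms = 0, .next_tick_ms = TICK_MS, .queued_ticks = 0 };

    sim_block(&s, blocked_ms);                  /* 1. display message       */
    int32_t countdown = timeout_ms;             /* 2. arm timer (naive)     */
    int32_t deadline  = s.now_ms + timeout_ms;  /*    ...or as a deadline   */
                                                /* 3. trigger hardware (0ms) */
    while (s.queued_ticks > 0) {                /* 4. back to MsgReceive()  */
        s.queued_ticks--;
        countdown -= TICK_MS;                   /* "10ms must have passed"  */
    }
    return use_deadline ? deadline - s.now_ms : countdown;
}
```

With blocked_ms = 500, the naive countdown reports 1500ms left even though no time has passed since the timer was armed; with blocked_ms = 2000 it reports 0, so the timer "pops" the moment the backlog is drained, which looks exactly like a hardware failure. The deadline variant reports the full 2000ms in both cases.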

... continued on page 2...

This page was updated on Tue Jan 6 00:15:04 EST 2009 © 2000-2008 by Robert Krten.
All rights reserved.