Extending RCU for Realtime and Embedded Workloads
This past year has seen significant increases in RCU's realtime capabilities, particularly the ability to preempt RCU read-side critical sections. There have even been some cases where use of RCU improved realtime latency (and performance and scalability as well), in contrast to earlier implementations, which seemed only to get in the way of realtime response. That said, there is still considerable room for improvement, including:
- lower-overhead rcu_read_lock() and rcu_read_unlock() primitives,
- more scalable grace-period detection,
- better balance of throughput and latency for RCU callback invocation,
- lower per-structure memory overhead, and
- priority boosting of RCU read-side critical sections.
This last item is needed because a low-priority task that is preempted for too long while in an RCU read-side critical section blocks grace periods from completing, which can in turn result in out-of-memory events.
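The need for priority boosting can be illustrated with a toy userspace model. This is only a sketch under stated assumptions, not the kernel's implementation: the structure toy_task and every toy_* function are hypothetical names invented for illustration, larger prio values are assumed to mean more urgent, and the model waits for all current readers rather than only those that predate the grace period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a task; none of these names are kernel APIs. */
struct toy_task {
    int prio;         /* assumed convention: larger value = more urgent */
    int rcu_nesting;  /* depth of nested read-side critical sections */
    bool preempted;   /* true while the task is preempted off its CPU */
};

/* Enter a read-side critical section (may nest). */
static void toy_rcu_read_lock(struct toy_task *t)
{
    t->rcu_nesting++;
}

/* Leave a read-side critical section. */
static void toy_rcu_read_unlock(struct toy_task *t)
{
    assert(t->rcu_nesting > 0);
    t->rcu_nesting--;
}

/*
 * Simplification: the grace period may end only when no task is inside
 * a read-side critical section. One preempted reader is enough to stall
 * it, so memory waiting on the grace period cannot be freed.
 */
static bool toy_grace_period_can_end(const struct toy_task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].rcu_nesting > 0)
            return false;
    return true;
}

/*
 * Raise every preempted reader to at least boost_prio so that it can run
 * and leave its critical section. Returns the number of tasks boosted.
 */
static size_t toy_boost_blocked_readers(struct toy_task *tasks, size_t n,
                                        int boost_prio)
{
    size_t boosted = 0;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].rcu_nesting > 0 && tasks[i].preempted &&
            tasks[i].prio < boost_prio) {
            tasks[i].prio = boost_prio;
            boosted++;
        }
    }
    return boosted;
}
```

In this model, a low-priority task that calls toy_rcu_read_lock() and is then preempted keeps toy_grace_period_can_end() false indefinitely. Calling toy_boost_blocked_readers() raises its priority so that it can run again, reach toy_rcu_read_unlock(), and allow the grace period to end.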
This paper describes ongoing work to address these five issues, including some interesting failures in addition to a number of unexpected successes. The ultimate goal of providing a single RCU implementation that covers all workloads is tantalizingly close, but is not yet within our grasp. It is safe to say that the very wide variety of workloads supported by Linux provides substantial challenges to the design and implementation of synchronization primitives like RCU!