The kernel lock validator
Ingo Molnar has announced the first release of his "lock dependency correctness validator" kernel debugging feature.
From the announcement:
The lock validator "observes" and maps all locking rules as they occur dynamically (as triggered by the kernel's natural use of spinlocks, rwlocks, mutexes and rwsems).
Whenever the lock validator subsystem detects a new locking scenario, it validates this new rule against the existing set of rules. If this new rule is consistent with the existing set of rules then the new rule is added transparently and the kernel continues as normal. If the new rule could create a deadlock scenario then this condition is printed out.
When determining validity of locking, all possible "deadlock scenarios" are considered: assuming arbitrary number of CPUs, arbitrary irq context and task context constellations, running arbitrary combinations of all the existing locking scenarios. In a typical system this means millions of separate scenarios. This is why we call it a "locking correctness" validator - for all rules that are observed the lock validator proves it with mathematical certainty that a deadlock could not occur (assuming that the lock validator implementation itself is correct and its internal data structures are not corrupted by some other kernel subsystem).
Furthermore, this "all possible scenarios" property of the validator also enables the finding of complex, highly unlikely multi-CPU multi-context races via single single-context rules, increasing the likelihood of finding bugs drastically. In practical terms: the lock validator already found a bug in the upstream kernel that could only occur on systems with 3 or more CPUs, and which needed 3 very unlikely code sequences to occur at once on the 3 CPUs. That bug was found and reported on a single-CPU system (!). So in essence a race will be found piecemeal, triggering all the necessary components for the race, without having to reproduce the race scenario itself! In its short existence the lock validator found and reported many bugs before they actually caused a real deadlock.
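To make the idea of validating each new rule against the existing set more concrete, here is a minimal sketch in Python. It is a toy model and not the actual lockdep implementation: the class `LockOrderValidator`, the helper `run_sequence` and the lock names "A", "B" and "C" are all made up for illustration, and it ignores irq contexts entirely. It records every "lock X was held while lock Y was taken" rule it observes and refuses a new rule if the reverse ordering already follows from the recorded ones.

```python
from collections import defaultdict


class LockOrderValidator:
    """Toy lock-order checker: records 'held -> acquired' rules, rejects cycles."""

    def __init__(self):
        self.after = defaultdict(set)  # lock -> locks seen acquired while it was held
        self.held = []                 # locks held by the current (single) context
        self.reports = []              # descriptions of the rules that were rejected

    def _reachable(self, start, goal):
        # Depth-first search over the ordering rules recorded so far.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def acquire(self, lock):
        for holder in self.held:
            if lock in self.after[holder]:
                continue  # rule already known and validated
            # New rule "holder before lock": unsafe if "lock before holder"
            # already follows from the existing rules.
            if self._reachable(lock, holder):
                self.reports.append(f"possible deadlock: {holder} -> {lock}")
            else:
                self.after[holder].add(lock)
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)


def run_sequence(validator, first, second):
    """Take two locks in order, then drop them, as one code path would."""
    validator.acquire(first)
    validator.acquire(second)
    validator.release(second)
    validator.release(first)


validator = LockOrderValidator()
run_sequence(validator, "A", "B")
run_sequence(validator, "B", "C")
run_sequence(validator, "C", "A")  # closes the cycle A -> B -> C -> A
print(validator.reports)
```

The three sequences run one after another in a single context and never actually block each other, yet the third one is reported because it closes a cycle in the recorded ordering. This is the sense in which a three-CPU deadlock can be caught on a single-CPU system: each code path only has to run once, not all at the same time.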
If you would like to help test the kernel for lock ordering bugs, the patchset can be downloaded from http://people.redhat.com/mingo/lockdep-patches/.
In the meantime, the patchset has also been included in Andrew Morton's -mm patch series (starting with 2.6.17-rc5-mm1).

And the winner is...
====================================
[ BUG: possible deadlock detected! ]
------------------------------------
kseriod/133 is trying to acquire lock:
(&ps2dev->cmd_mutex){--..}, at: [<7846b4e8>] mutex_lock+0x8/0x10
but task is already holding lock:
(&ps2dev->cmd_mutex){--..}, at: [<7846b4e8>] mutex_lock+0x8/0x10
which could potentially lead to deadlocks!
other info that might help us debug this:
4 locks held by kseriod/133:
#0: (serio_mutex){--..}, at: [<7846b4e8>] mutex_lock+0x8/0x10
#1: (&serio->drv_mutex){--..}, at: [<7846b4e8>] mutex_lock+0x8/0x10
#2: (psmouse_mutex){--..}, at: [<7846b4e8>] mutex_lock+0x8/0x10
#3: (&ps2dev->cmd_mutex){--..}, at: [<7846b4e8>] mutex_lock+0x8/0x10
stack backtrace:
<78105572> show_trace+0x12/0x20 <78105599> dump_stack+0x19/0x20
<7813920e> __lockdep_acquire+0x54e/0xe00 <78139f2a> lockdep_acquire+0x7a/0xa0
<7846b3c9> __mutex_lock_slowpath+0x49/0x160 <7846b4e8> mutex_lock+0x8/0x10
<7834497b> ps2_command+0x3b/0x3c0 <7834ad22> psmouse_sliced_command+0x22/0x70
<7834f471> synaptics_pt_write+0x21/0x50 <78344736> ps2_sendbyte+0x46/0x120
<78344a29> ps2_command+0xe9/0x3c0 <7834ae8d> psmouse_probe+0x1d/0xa0
<7834c537> psmouse_connect+0x137/0x200 <78341649> serio_connect_driver+0x29/0x50
<783419b6> serio_driver_probe+0x16/0x20 <782a8fb4> driver_probe_device+0x44/0xd0
<782a9048> __device_attach+0x8/0x10 <782a8563> bus_for_each_drv+0x63/0x90
<782a90a6> device_attach+0x56/0x60 <782a868e> bus_attach_device+0x1e/0x40
<782a7763> device_add+0x113/0x180 <7834299d> serio_thread+0x1cd/0x2bb
<781322c6> kthread+0xc6/0xca <78101005> kernel_thread_helper+0x5/0xb
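For readers wondering why the same lock name appears on both sides of the report: the validator tracks locks by class rather than by individual instance, so taking a second lock of a class that is already held gets flagged. The sketch below is a toy model of that check in Python, not the validator's real code. The `Lock` type and `check_acquire` function are hypothetical, and the "child"/"parent" naming is only an assumption based on `synaptics_pt_write` sitting between the two `ps2_command` frames in the backtrace.

```python
class Lock:
    """A lock instance tagged with the class (name) it was initialised under."""

    def __init__(self, lock_class):
        self.lock_class = lock_class


def check_acquire(held, new_lock):
    """Return a warning if a lock of the same class is already held, else None."""
    for lock in held:
        if lock.lock_class == new_lock.lock_class:
            return f"possible deadlock: already holding {lock.lock_class}"
    return None


# Two distinct mutex instances that share one class (assumed: child and parent port).
child = Lock("&ps2dev->cmd_mutex")
parent = Lock("&ps2dev->cmd_mutex")

# The four locks listed as held in the report, in acquisition order.
held = [
    Lock("serio_mutex"),
    Lock("&serio->drv_mutex"),
    Lock("psmouse_mutex"),
    child,
]

warning = check_acquire(held, parent)
print(warning)
```

In this model the two mutexes are different objects, so the nested acquisition would not necessarily deadlock by itself; the warning only says that nothing recorded so far tells the checker which of two same-class locks must be taken first.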
The kernel lock validator
Does this still happen with -mm2? If so, can you mail this to lkml or to Ingo and me (Arjan)?