the problem is that if the buffer was locked on entry to this code sequence
(due to in-progress I/O), ll_rw_block() will not wait, and start new I/O. So
this code will wait on the _old_ I/O, and will then continue execution,
leaving the buffer dirty.
It turns out that all callers were only writing one buffer, and they were all
waiting on that writeout. So I added a new sync_dirty_buffer() function: