Write Caching & Journaling
For performance reasons, many newer drives enable write-back caching by default. This means the drive reports a write as completed before the data is actually on the media; the block is still sitting in the drive's cache, where writes can be reordered. If that happens, metadata changes might be written before the log commit blocks, leading to corruption if the machine loses power. It is very important to disable write-back caching on both IDE and SCSI drives.
Some hardware RAID controllers provide a battery-backed write-back cache that preserves the cache contents if the system loses power. These should be safe to use, but the cache battery should be checked often. A dramatic performance increase can be seen with these write caches, especially for log-intensive applications like mail servers.
Basically, the problem with buffering writes on a journaled filesystem without battery backup for the buffer/cache is that you can still end up in an inconsistent state.
The simplified premise is as follows:
If power fails mid-write, then when the system comes back up it sees a transaction without a completed commit record and rolls the change back.
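That rollback logic can be sketched in a few lines of Python. This is a toy model, not MFS internals: the `Journal`/`recover` names and the record format are my own illustration of the general write-ahead-logging idea.

```python
# Toy journal: "data" records describe writes, a "commit" record marks a
# transaction as complete. None of this is real MFS code.
class Journal:
    def __init__(self):
        self.entries = []  # records that made it onto the media

    def log(self, txn_id, block, data):
        self.entries.append(("data", txn_id, block, data))

    def commit(self, txn_id):
        self.entries.append(("commit", txn_id, None, None))

def recover(journal):
    """Replay only transactions whose commit record reached the disk."""
    committed = {e[1] for e in journal.entries if e[0] == "commit"}
    disk = {}
    for kind, txn, block, data in journal.entries:
        if kind == "data" and txn in committed:
            disk[block] = data  # replay the committed write
        # writes from uncommitted transactions are rolled back (ignored)
    return disk

j = Journal()
j.log(1, "blk0", "A"); j.commit(1)  # committed before the crash
j.log(2, "blk1", "B")               # power failed before the commit record
print(recover(j))                   # {'blk0': 'A'} -- txn 2 rolled back
```

The whole scheme depends on the commit record genuinely being on the media before the drive says so, which is exactly what a lying write-back cache breaks.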
The problem occurs if the drive says "hey, I wrote the journal entry" (CacheCard) but then doesn't actually write it. Depending on which region of the disk you are caching, you might end up writing out video, or worse, data/metadata (like the stream or the disk-space usage map), and put MFS in an inconsistent state. This example is "probably" fixable by fsfix/mfscheck, but would probably cause a temporary GSoD.
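Here is that failure mode as a toy simulation: a write-back cache acknowledges both writes, the drive flushes them out of order, and power fails with the journal entry still in volatile cache. The class and block names are illustrative only.

```python
class WriteBackCache:
    """Toy write-back cache: acknowledges writes before they hit the platter."""
    def __init__(self):
        self.cache = {}    # volatile -- lost on power failure
        self.platter = {}  # what is actually on the media

    def write(self, block, data):
        self.cache[block] = data  # "completed!" -- but only in cache
        return True

    def flush(self, block):
        self.platter[block] = self.cache[block]

    def power_fail(self):
        self.cache.clear()  # volatile cache contents are gone

c = WriteBackCache()
c.write("journal", "commit txn 7")     # drive reports success immediately
c.write("usage_map", "mark blocks")    # metadata write, also "succeeds"
c.flush("usage_map")                   # drive reorders: metadata lands first
c.power_fail()                         # journal entry never reached media
print(c.platter)                       # metadata on disk with no journal entry
```

The platter ends up holding metadata for a transaction the journal knows nothing about, which is the inconsistent state fsfix/mfscheck would have to repair.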
My recommendation would be to use write-through if you are going to do any caching of data written to the disks, and to only allow cache hits after you are sure the disk has returned from a successful write.
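That policy is simple to express in code: don't populate the cache, and don't report success, until the media write has returned. A minimal sketch (my own names, plain dicts standing in for the disk):

```python
class WriteThroughCache:
    """A block becomes hittable only after the disk confirms the write."""
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def write(self, block, data):
        self.disk[block] = data   # wait for the media write first...
        self.cache[block] = data  # ...then allow cache hits
        return True               # ...and only now report success

    def read(self, block):
        if block in self.cache:
            return self.cache[block]  # safe: disk already has this data
        return self.disk.get(block)

disk = {}
c = WriteThroughCache(disk)
c.write("blk0", "A")
print(disk["blk0"])  # A -- on media before write() ever returned
```

With this ordering, a power failure can cost you the contents of the cache but never the consistency of what's on disk, since the cache never holds anything the disk doesn't.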
Assuming you don't reorder the writes, I believe this would end up "OK", but I have to think through a couple more scenarios before I'm convinced it's completely safe.
Reordering would definitely be more problematic. I can think of a few scenarios where, if it wasn't properly handled, you'd attempt a write, have the CacheCard buffer it and report it complete, get a cache hit on the area you thought you wrote to, go to perform another write to the same region, and have that cached as well; if the two got reordered, you could end up with the wrong checksum finally written as metadata, thus making the FS inconsistent.
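The checksum mismatch in that scenario is easy to demonstrate. This sketch just uses CRC32 as a stand-in checksum; I don't know what checksum MFS actually uses, so treat it purely as an illustration of the reordering hazard.

```python
import zlib

def checksum(data):
    """Stand-in block checksum (CRC32 here; the real algorithm may differ)."""
    return zlib.crc32(data.encode())

# Two writes to the same region, issued in order: "v1" then "v2".
# With in-order completion the region ends up holding "v2", and the
# checksum recorded in metadata is computed from that:
expected = checksum("v2")

# A cache that reorders flushes can land the stale write last:
region_on_disk = "v1"          # "v2" was flushed first, then "v1"
print(checksum(region_on_disk) != expected)  # True -- metadata mismatch
```

On the next read the on-disk data no longer matches the metadata checksum, which is exactly the "FS inconsistent" flag described above.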
Obviously, proper caching code would handle the above (i.e., flush cached data for pending writes, and don't reorder writes to the same block).
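One common way to get that "don't reorder same-block writes" property is to key pending writes by block and coalesce in place, so a second write to a block replaces the pending one rather than queueing behind it. This is a sketch of that idea under my own names, not anything from the CacheCard driver:

```python
from collections import OrderedDict

class OrderedWriteQueue:
    """Pending writes keyed by block: a later write to the same block
    overwrites the pending one in place, so same-block writes can never
    be reordered, and reads of a pending block return the newest data."""
    def __init__(self):
        self.pending = OrderedDict()

    def write(self, block, data):
        self.pending[block] = data  # coalesce in place, keep queue order

    def read(self, block, disk):
        # Read through the queue: pending data wins over stale disk data.
        return self.pending.get(block, disk.get(block))

    def flush(self, disk):
        for block, data in self.pending.items():
            disk[block] = data  # issue in arrival order
        self.pending.clear()

q = OrderedWriteQueue()
q.write("blk0", "v1"); q.write("blk0", "v2")  # coalesced, never reordered
disk = {}
q.flush(disk)
print(disk)  # {'blk0': 'v2'}
```

Because `OrderedDict` keeps a key's original position when its value is overwritten, two writes to the same block collapse into one entry and there is nothing left to reorder.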
It may still be possible to hit trouble if you are performing a write across multiple blocks: if you cache and reorder, you might end up in a weird state where pending writes have been reordered, but a checksum needs to be computed before all the writes have occurred. That checksum would differ from the expected one, thus flagging the FS as inconsistent. (Again, proper cache code would probably deny cache hits, or read through, for pending writes.)
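The multi-block case can be sketched the same way: the checksum covers the whole logical write, but verification happens while one block is still pending in the cache. Again, CRC32 and the block contents are stand-ins of my own, not real on-disk structures.

```python
import zlib

blocks = ["aaa", "bbb", "ccc"]                   # one logical 3-block write
expected = zlib.crc32("".join(blocks).encode())  # checksum of the full write

# A reordering cache has flushed blocks 0 and 2, but is still holding
# block 1 when the checksum is verified against the media:
on_media = ["aaa", "old", "ccc"]                 # middle block still pending
actual = zlib.crc32("".join(on_media).encode())
print(actual != expected)  # True -- the FS would be flagged inconsistent
```

Denying cache hits (or reading through the pending queue) during such a write avoids this, because the verifier then sees the intended data rather than the partially flushed media.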
Just my random thoughts.