Normally we do use write-back cache on the RAID (it is battery-backed, so it should be OK...),
but in order to verify this isn't the cause we switched to write-through, and it didn't help.
The best behaviour I've seen so far was when I configured DATA="", but even then it's all very timing-sensitive and I still get writes/closes which finish OK but whose data is not on the disk later on...
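One thing that can be ruled out on the application side: a write() or close() that returns success only means the data reached the page cache, not the disk. A minimal sketch of a write that is explicitly forced out with fsync is below. The helper name `durable_write` is hypothetical and not from our application, and even a successful fsync only helps if the RAID/FC layers below honour the flush.

```python
import os

def durable_write(path, data):
    """Write bytes to path and ask the kernel to push them to stable storage.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Raises OSError (e.g. EIO) if the kernel could not flush the data.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Also flush the containing directory so a newly created name is on disk.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the fsync call raises, the write should be treated as failed even though the earlier write itself returned without error.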
And I've yet to see a real explanation/coverage of how journalling filesystems in general, and EXT3 specifically, handle disk failure situations like power loss or FC cable disconnection.
Currently our direction is to monitor the loop state (via /proc) and initiate a killall on the application and umount once we see a loop down indication.
A brute-force test of this mechanism seems to work...
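A rough sketch of that monitoring idea, for illustration only. The state file path, the "down" marker text, and all function and parameter names here are assumptions; the real /proc entry and its format depend on the FC driver in use. The `run` parameter exists only so the commands can be swapped out in a dry run.

```python
import subprocess
import time
from pathlib import Path

def loop_is_down(state_text, down_marker="down"):
    # Assumption: the driver's state text contains this marker when the loop drops.
    return down_marker.lower() in state_text.lower()

def watch_loop(state_file, app_name, mount_point,
               poll_seconds=1.0, run=subprocess.call, max_polls=None):
    """Poll state_file; on a loop-down indication, kill the app and unmount.

    Returns True if the shutdown sequence was triggered, False if max_polls
    was reached without seeing a loop-down indication.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            text = Path(state_file).read_text()
        except OSError:
            # State file unreadable: treat as "unknown" and keep polling.
            text = ""
        if loop_is_down(text):
            run(["killall", app_name])    # stop the writers first
            run(["umount", mount_point])  # then unmount the filesystem
            return True
        time.sleep(poll_seconds)
    return False
```

The polling interval bounds how long the application can keep writing after the loop goes down, so this narrows the window rather than closing it.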
Anyhow, thanks for the suggestion; I will be happy to continue experimenting.