We recently set up a three-node RHEL 5.4 cluster for a high-availability Moodle website, with the sessions and data shared on a GFS2 volume.
We found that, while read performance has been consistently good, write performance degrades under certain conditions. We think this may be related to our backup procedure:
We export the GFS2 volume over NFS from one of the nodes so that a Veritas backup client can back it up externally every night. The next morning, write performance has dropped so much that the system is practically unusable for some large files and for cloning an existing course (zipping the course data and unzipping it into a new folder). After some experiments with writes and clone operations, we found a workaround that mitigates the issue, but we think there should be a better way. We added the following entry to the crontab on every node:
0-59/10 * * * * sync; echo 3 > /proc/sys/vm/drop_caches
This clears the caches, including the lock caches, every ten minutes. We haven't noticed it hurting overall system performance, and it does improve write performance somewhat, at least enough to make the system usable.
Do you think this is a viable option? Do you have a better explanation for this behaviour? Any other ideas about what we could do?
Since then we have had problems with the Apache service, which occasionally stops on one of the nodes. I suspect this could be related to the periodic cache dropping, but I'm not sure.
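One way to check whether the cache drops line up with other problems on a node is to log every run with a timestamp. What follows is a minimal sketch, assuming a POSIX sh on each node. The script name drop_caches_logged.sh, the log path /var/log/drop_caches.log and the DROP_CACHES_FILE/LOG_FILE overrides are hypothetical and not part of our current setup. It does the same sync and echo 3 as the crontab entry, and records MemFree before and after:

```shell
#!/bin/sh
# Hypothetical wrapper for the cache-drop cron job. It flushes dirty data
# with sync, writes 3 to drop_caches (page cache plus dentries and inodes),
# and appends a timestamped line with MemFree before and after to a log file.

# Both paths can be overridden from the environment. LOG_FILE is a
# hypothetical location; writing the real drop_caches file requires root.
DROP_CACHES_FILE=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}
LOG_FILE=${LOG_FILE:-/var/log/drop_caches.log}

# Print the MemFree value from /proc/meminfo, in kB.
mem_free_kb() {
    awk '/^MemFree:/ { print $2 }' /proc/meminfo
}

# Flush, drop the caches and log the result; returns non-zero on failure.
drop_caches_logged() {
    before=$(mem_free_kb)
    sync
    if ! echo 3 > "$DROP_CACHES_FILE"; then
        echo "$(date '+%F %T') failed to write $DROP_CACHES_FILE" >> "$LOG_FILE"
        return 1
    fi
    after=$(mem_free_kb)
    echo "$(date '+%F %T') dropped caches: MemFree ${before} kB -> ${after} kB" >> "$LOG_FILE"
}

# Run only when executed under its own (hypothetical) file name,
# so the functions can also be sourced without side effects.
if [ "${0##*/}" = "drop_caches_logged.sh" ]; then
    drop_caches_logged
fi
```

The crontab entry would then call the script instead of the inline command, for example 0-59/10 * * * * /usr/local/sbin/drop_caches_logged.sh (hypothetical path). Writing 3 keeps the current behaviour. The kernel also accepts 1 (page cache only) and 2 (dentries and inodes only).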