
[Linux-cluster] GFS failover and resilience



Hi,

 

 

Some questions about GFS behavior during a node failure in a two-node cluster:

 

  1. Upon a node failure, the remaining node is held off from any metadata changes until the failed node is fenced and its journal is replayed.
    1. The remaining node won’t be able to open or create any new files, correct?
    2. Would the remaining node be able to read from files opened before the failure?
      i. What if the files were also open on the failed node at the time of the failure?
      ii. Would the answers above change if the files were opened with O_DIRECT?
    3. Would the remaining node be able to write to files opened before the failure?
      i. What if the files were also open on the failed node?
      ii. What if the data to be written is bigger than a GFS “resource group”? Would the node be able to get a new resource group to continue writing?
      iii. Would fsync on the file work?
      iv. Would fdatasync on the file work?
      v. Would the answers above change if the files were opened with O_DIRECT?
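
The read, write and sync cases above could also be checked empirically on the surviving node. Below is a minimal sketch, not a definitive test: the function name `probe`, the payload size and the result layout are all hypothetical choices made for illustration. It uses only standard Python `os` calls and assumes a Linux host. An operation that is held off during recovery may simply block rather than return an error, so the elapsed time per step is recorded as well. The O_DIRECT variants are not covered, since they need aligned buffers.

```python
import os
import time


def probe(path, payload=b"x" * 4096):
    """Try write, fsync, fdatasync and read on `path`.

    Returns a dict mapping step name to (ok, seconds, error_message).
    `probe` and its result layout are illustrative, not part of GFS.
    """
    results = {}

    def step(name, fn):
        # Time each call; a blocked call shows up as a long duration.
        start = time.monotonic()
        try:
            fn()
            results[name] = (True, time.monotonic() - start, None)
        except OSError as exc:
            results[name] = (False, time.monotonic() - start, str(exc))

    # On a real test the file would be opened BEFORE the peer node fails,
    # and the steps run afterwards; here everything runs back to back.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        step("write", lambda: os.write(fd, payload))
        step("fsync", lambda: os.fsync(fd))
        step("fdatasync", lambda: os.fdatasync(fd))
        step("read", lambda: os.pread(fd, len(payload), 0))
    finally:
        os.close(fd)
    return results
```

To use it for the failure case, the open and the individual steps would have to be split so that the file is opened first, the other node is then failed, and the steps are run while fencing and journal replay are still pending.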

 

  2. I am concerned about how long GFS might take to run fsck. We have experienced very long fsck times.
    1. Is GFS fsck journal-based?
    2. How long is it expected to take?
    3. Do you have any information about GFS fsck and ways to tune it?

 

Thanks a lot,

 

Samuel.

