I was fighting a very similar issue today. I am not familiar with the fencing you are using, but I would guess your fence device is not working properly. If a node fails and fencing doesn't succeed, all GFS activity is halted. If clustat shows both nodes and the quorum disk online but no rgmanager, try running fence_tool leave followed by fence_tool join on both nodes. That worked for me today.
Starting one node while the other node is down fails because fenced tries to fence every node that is not present before proceeding. I am testing clean_start="1" in cluster.conf, and it has worked well so far. I would definitely read the fenced man page section on clean_start before using it, because it does carry some risks.
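For reference, this is roughly where the setting goes in cluster.conf. Treat it as a sketch, not a tested config: the cluster name and config_version below are placeholders I made up, and the rest of your file (nodes, fence devices, quorum disk, resources) stays as it is. The only relevant part is the clean_start attribute on the fence_daemon element.

```xml
<?xml version="1.0"?>
<!-- name and config_version are placeholders; keep your own values
     and bump config_version when you change the file -->
<cluster name="mycluster" config_version="2">
  <!-- clean_start="1" makes fenced assume absent nodes are in a clean
       state at startup, so it skips startup fencing (default is 0) -->
  <fence_daemon clean_start="1"/>
  <!-- existing clusternodes, fencedevices, quorumd and rm sections
       go here unchanged -->
</cluster>
```

The risk, as I understand it, is that if the node that is absent is actually still up and has GFS mounted, nothing will fence it before this node proceeds, so only use it when you are sure the other node is really down.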