
[libvirt] [nicsysco.com] Weird Libvirt Behavior

Hi folks,

First-time contributor here, but I think what I discovered is (probably) a very rare situation.

I’m running a CentOS server (my only Linux deployment) to which customers all over the U.S. connect to run their micro-lending businesses. It hosts several VMs, among them one called a2, which runs the fortress system. In the beginning its .raw file was about 10GB, roughly five times the capacity we needed at the time.

For years we had no problems, and the CentOS box would tick over day after day without so much as a hiccup.

About three months ago a2 started to slow down, almost to the point of timing out when applications and users logged on. The band-aid was to copy an earlier a2.raw backup over the current one on a regular basis, which would rectify the problem for a while. At first, applying this band-aid on Sunday nights sufficed. Later we had to increase it to twice a week, and these last couple of weeks we had to do it almost every night. The system also sent alerts that a “Degraded Array event had been detected on md device /dev/md1”. Inspecting the drives showed no crisis.
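
For anyone wanting to double-check a degraded-array alert like that one, the kernel's own view is in /proc/mdstat. Below is a minimal Python sketch that flags arrays whose `[total/active]` member counts differ. The sample text is invented for illustration and is not output from this server; the function name `degraded_arrays` is likewise just a placeholder.

```python
import re

def degraded_arrays(mdstat_text):
    """Return names of md arrays reporting fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # Array header line, e.g. "md1 : active raid1 sda2[0]"
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line, e.g. "... blocks [2/1] [U_]" (expected/active members)
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            total, active = int(counts.group(1)), int(counts.group(2))
            if active < total:
                degraded.append(current)
            current = None
    return degraded

# Hypothetical sample, NOT taken from the server described in this post.
sample = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      488279488 blocks [2/1] [U_]

unused devices: <none>
"""

print(degraded_arrays(sample))
```

On a live system the same function could be fed `open("/proc/mdstat").read()` instead of the sample string.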

Today it folded completely and brought the system down, leaving clients to tell customers walking into their stores that “our computers are down”. Restarting the box only brought a2 up in a paused state, from which it never recovered. We had to use killall to get rid of it.

Having nowhere else to go with it, I decided to rebuild a2 on a separate drive, to at least address the degraded array alerts. As I edited the .xml file, I saw the following:

<source file='/var/lib/libvirt/:machines/a2/a2-disk1.raw'/>

What the hell was that colon doing there? I checked the size of the .raw file: it had grown to over 96GB. As a sanity check, I looked at the other VMs’ .xml files and, as I expected, none of them had a colon.
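
That manual comparison could also be scripted. Here is a rough Python sketch that pulls the `<source file=...>` paths out of a domain definition and flags any that contain a colon or do not exist on the host. Only the `<source>` line in the example is from the a2 definition above; the surrounding elements and the helper names `disk_sources` and `suspicious` are assumptions made up for illustration.

```python
import os
import xml.etree.ElementTree as ET

def disk_sources(domain_xml):
    """Return the file paths of all file-backed disks in a domain XML string."""
    root = ET.fromstring(domain_xml)
    return [
        src.get("file")
        for src in root.findall("./devices/disk/source")
        if src.get("file")
    ]

def suspicious(path):
    """Return a list of reasons why a disk path looks wrong (empty if none)."""
    reasons = []
    if ":" in path:
        reasons.append("contains a colon")
    if not os.path.exists(path):
        reasons.append("does not exist on this host")
    return reasons

# Only the <source> element is from the post; the rest is an assumed skeleton.
example = """
<domain type='kvm'>
  <name>a2</name>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/:machines/a2/a2-disk1.raw'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

for path in disk_sources(example):
    print(path, suspicious(path))
```

To run it against real guests, the XML string could come from `virsh dumpxml <name>` or from the definition files on disk.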

I removed the colon and virsh-started a2, which fired up immediately, with the rest of the system following suit. No doubt that “:” was the culprit!

My question is: Would that colon cause an append-action to the .raw file? We have no idea when it got in there or how. We haven’t worked on that xml file for a long time. Why would a2 even fire up at all?

It would be great to hear what the gurus think about that…

Thanks

Nico van Niekerk

Agoura Hills, CA 91301
