
Re: [libvirt] [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility



Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
> Ok, thanks.  I'm currently testing on your git branch referenced earlier
> in this thread
> 
> git://xenbits.xen.org/people/iwj/xen.git#wip.enumerate-pids-v2.1

Great.  That's the one.  My current version is pretty much identical -
some unused variables deleted and comments edited.

> >  * You need to fix the timer deregistration arrangements in the
> >    libvirt/libxl driver to avoid the crash you identified the other day.
> 
> Yes, I'm testing a fix now.

Great.

> >  * Something needs to be done about the 20ms slop in the libvirt event
> >    loop (as it could cause libxl to lock up).  If you can't get rid of
> >    it in the libvirt core, then adding 20ms to every requested
> >    callback time in the libvirt/libxl driver would work for now.
> >   
> 
> The commit msg adding the fuzz says
> 
>     Fix event test timer checks on kernels with HZ=100
>    
>     On kernels with HZ=100, the resolution of sleeps in poll() is
>     quite bad. Doing a precise check on the expiry time vs the
>     current time will thus often think the timer has not expired
>     even though we're within 10ms of the expected expiry time. This
>     then causes another pointless sleep in poll() for <10ms. Timers
>     do not need to have such precise expiration, so we treat a timer
>     as expired if it is within 20ms of the expected expiry time. This
>     also fixes the eventtest.c test suite on kernels with HZ=100

I think this is a bug in the kernel.  poll() is allowed to sleep longer
than the requested timeout, but not shorter.
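
To make the effect of the fuzz concrete, here is a minimal sketch of the
two kinds of expiry check.  The function names and the millisecond
representation are my own assumptions for illustration; this is not the
code in daemon/event.c.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration; not the code in daemon/event.c. */

/* Precise check: expired only once the clock has reached the expiry time. */
static bool timer_expired_precise(long long now_ms, long long expiry_ms)
{
    return now_ms >= expiry_ms;
}

/* Fuzzy check: also treat the timer as expired when it is within
 * fuzz_ms of its expiry time, so the callback may run early. */
static bool timer_expired_fuzzy(long long now_ms, long long expiry_ms,
                                long long fuzz_ms)
{
    return now_ms + fuzz_ms >= expiry_ms;
}
```

With a 20ms fuzz, a timer due at t=1000ms is reported as expired from
t=980ms onwards, which is how a callback can be delivered before the
time the caller asked for.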

>     * daemon/event.c: Add 20ms fuzz when checking for timer expiry
> 
> I could handle this in the libxl driver as you say, but doing so makes
> me a bit nervous.  Potentially locking up libxl makes me nervous too :).

I was going to say that the code in libxl_osevent_occurred_timeout
checked the current time against the requested time and would ignore
the event (thinking it was stale) if it arrived too early.

But now that I have read the code, that is not true.  I think it will
work OK (modulo some things happening too soon).  So the upshot is
that I still think this is a bug in libvirt, but I don't think it's
critical to fix it.
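
If you do want to guard against early callbacks anyway, the padding
workaround I mentioned would amount to something like the sketch below.
The helper name and the slop parameter are hypothetical, and this is not
taken from the libvirt/libxl driver; it only assumes that the driver is
handed an absolute struct timeval and has to turn it into a relative
delay in milliseconds for the event loop.

```c
#include <sys/time.h>

/* Hypothetical helper: convert the absolute time requested by libxl into
 * a relative delay in milliseconds, padded by slop_ms so that an event
 * loop which fires up to slop_ms early still does not fire before the
 * requested time. */
static long long padded_delay_ms(struct timeval abs, struct timeval now,
                                 int slop_ms)
{
    long long diff_us = (long long)(abs.tv_sec - now.tv_sec) * 1000000LL
                      + (long long)(abs.tv_usec - now.tv_usec);

    if (diff_us < 0)
        diff_us = 0;                    /* requested time already passed */

    /* Round up to whole milliseconds so the delay is never shortened. */
    long long delay_ms = (diff_us + 999) / 1000;

    return delay_ms + slop_ms;
}
```

The cost is that every timeout is delivered up to 20ms late, which libxl
tolerates; the padding only prevents delivery before the requested time.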

Sorry to cause undue alarm.

> Yes.  I've been running my tests for about 24 hours now with no problems
> noted.  The tests include starting/stopping a persistent VM,
> creating/stopping a transient VM, rebooting a persistent VM,
> saving/restoring a transient VM, and getting info on all of these VMs.
> 
> I should probably add saving/restoring a persistent VM to the mix since
> the associated libxl_ctx is never freed.  Only when a persistent VM is
> undefined is the libxl_ctx freed.

Right.  Great.

Thanks,
Ian.

