[Ovirt-devel] [R&D] Breaking the Browser

mark wagner mwagner at redhat.com
Mon Jul 7 02:26:43 UTC 2008


Note: this is a combined reply to several responses in this thread.


Jeff Schroeder wrote:
> On Fri, Jul 4, 2008 at 7:33 PM, mark wagner <mwagner at redhat.com> wrote:
>>
>> Jeff Schroeder wrote:
>>> On Thu, Jul 3, 2008 at 2:14 PM, Jason Guiditta <jguiditt at redhat.com>
>>> wrote:


>> So if your job is to monitor the 25 hosts in your pool and take immediate and
>> effective action to mitigate any performance or catastrophic issues in said
>> pool within one minute and 26 secs (3 nines, assuming this is the only issue
>> that day) of the event, how are you going to monitor your pool?
> 
> That is what monitoring software is for. If you expect someone to _look_ at
> a webgui 24/7 you are approaching the problem from absolutely the wrong angle.
> 
>> If we don't provide the ability to notify immediately when a host goes down,
>> you are rapidly eating into time specified to resolve a problem.
> You are agreeing with me on this one. This is what monitoring software is for,
> not a website. Something like nagios could call a pager or run a script to do
> something much faster than a human could.
>
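
For reference, the "one minute and 26 secs" figure quoted above is consistent with a 99.9% ("3 nines") availability target measured over a single day. A minimal sketch of that arithmetic follows; the function name and the choice of a one-day window are assumptions made for illustration, not anything defined by ovirt.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def downtime_budget_seconds(availability: float,
                            period_seconds: int = SECONDS_PER_DAY) -> float:
    """Return the downtime allowed in the period for a given availability.

    availability is a fraction, e.g. 0.999 for "3 nines".
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_seconds


budget = downtime_budget_seconds(0.999)
minutes, seconds = divmod(budget, 60)
print(f"{int(minutes)} min {seconds:.1f} s")  # 1 min 26.4 s
```

Measured over a longer window (a month or a year) the same percentage gives a proportionally larger budget, so the per-day figure is the strict reading of the SLA.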

So I guess the question to people is: why not just make the decision to switch
over to Nagios and something like Cacti now?

We all seem to agree that we need Nagios-type functionality, and Cacti or
something similar should be able to cover our graphing requirements as well.


>> We need to look at making the nav bar have near realtime capabilities.
>> I need to know if a system is getting close to capacity (change the color
>> of the icon?) or is offline.
> Changing the color of the icon would be good, but the admin should be able
> to set thresholds. When X reaches Y%, send alert to foo at bar.com or run
> script /usr/local/bin/foo-alert.sh
> 
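
The admin-set thresholds described above could be sketched roughly as follows. This is only an illustration: the ThresholdRule type, the metric names, and the "email:" / "script:" action strings are hypothetical, and nothing here is an existing ovirt or Nagios interface. The sketch only decides which actions fire; actually sending mail or running the script is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ThresholdRule:
    metric: str           # hypothetical metric name, e.g. "cpu"
    limit_percent: float  # threshold chosen by the admin
    action: str           # hypothetical action string: "email:..." or "script:..."


def triggered_actions(readings: Dict[str, float],
                      rules: List[ThresholdRule]) -> List[str]:
    """Return the action of every rule whose metric is at or above its limit."""
    actions = []
    for rule in rules:
        value = readings.get(rule.metric)
        # Metrics with no reading are skipped rather than treated as alerts.
        if value is not None and value >= rule.limit_percent:
            actions.append(rule.action)
    return actions


rules = [
    ThresholdRule("cpu", 90.0, "email:foo at bar.com"),
    ThresholdRule("memory", 85.0, "script:/usr/local/bin/foo-alert.sh"),
]
print(triggered_actions({"cpu": 93.5, "memory": 40.0}, rules))
```

The same rule list could also drive the icon color in the nav bar, so the alerting path and the display share one set of thresholds.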

>> How do the admins get notified as quickly as possible?
>> Do they sit in front of a terminal and hit refresh on their browser?
>> Set their mail clients to fetch every 10 secs?
>> This is clearly a case where time is money, if you are in a big NOC and
>> miss your SLA it could cost big bucks, not to mention a job or two.
> 
> Mark, without causing any bad blood, this conversation is basically over. You
> agree with me that notification should happen as soon as possible. You disagree
> with me on how it should be done. For true autonomy, the human aspect
> needs to be taken out of the picture as much as possible. This is why I'd
> argue a script should automatically do alerting and NOT a human. What
> the human does with that alert is up to the business.
> 

Jeff, no bad blood thoughts here; this is an open, professional discussion
aimed at making ovirt as good as it can be.  My goal is to provoke discussion
to make sure that we have considered all the angles we can, and the
ramifications of any decision.

I agree that the human interaction needs to be minimized as quickly and as
much as possible.  However, I don't know that anyone is even considering
how to add that at this point in time. (someone jump in if I'm wrong here)

I still think there will be cases where things are not automated and some
interaction needs to get done by hand.  Thus, even though we agree on the
functionality we need, I'm not sure that the discussion should be over yet.

For instance, I think that we should see what it takes to update the
NavBar as soon as possible with state changes.  If I start a guest or host
manually, I would want to see when it is available ASAP w/o the need for
my pager to go off or an email to arrive. The logical place to me is in the
NavBar.

My logic is that if I'm starting it via the WUI, there is a reasonable
chance I'm going to do something else with it. In the case of the host,
I may move some guests to it. As a user, if there is no immediate notification,
I'll be hitting refresh to see when it's available so I can continue my work.

I will admit that I do not work as an admin. I do sometimes play one when
running performance tests, etc. Our solution is very low-tech: we poll
until the systems are up and then monitor the tests. If "industry standard
practice" for a NOC is to use pagers then my argument may be irrelevant
to our "target audience" and us lab weenies can wear out the refresh button :)
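
As an illustration of the low-tech "poll until the systems are up" approach mentioned above, here is a minimal sketch. The wait_until_up and tcp_port_open names, the choice of an SSH port probe as the "is it up" check, and the interval and timeout defaults are all assumptions; this is not the actual lab tooling.

```python
import socket
import time
from typing import Callable, Iterable, Set


def tcp_port_open(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Hypothetical liveness check: can we open a TCP connection to the host?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_up(hosts: Iterable[str],
                  is_up: Callable[[str], bool] = tcp_port_open,
                  interval_seconds: float = 5.0,
                  timeout_seconds: float = 300.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> Set[str]:
    """Poll until every host reports up or the timeout passes.

    Returns the set of hosts that were still down when polling stopped
    (an empty set means everything came up).
    """
    pending = set(hosts)
    deadline = clock() + timeout_seconds
    while pending:
        # Re-check only the hosts that have not come up yet.
        pending = {host for host in pending if not is_up(host)}
        if not pending or clock() >= deadline:
            break
        sleep(interval_seconds)
    return pending
```

A caller would pass the host names it just started, for example wait_until_up(["host1", "host2"]) with placeholder names, and then begin the tests once the returned set is empty. This is the scripted equivalent of hitting refresh, which is why pushing state changes to the NavBar would be the nicer answer for interactive use.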

-mark



