[linux-lvm] Backup costs (was: LVM reimplementationre)

Jesus Manuel NAVARRO LOPEZ jesus_navarro at promofinarsa.es
Thu Feb 7 07:54:02 UTC 2002


Petro wrote:

> On Thu, Feb 07, 2002 at 11:34:08AM +0100, Jesus Manuel NAVARRO LOPEZ wrote:
> 
>>Hi, Petro:
>>Petro wrote:
>>
>>>On Thu, Feb 07, 2002 at 10:00:45AM +0100, Jesus Manuel NAVARRO LOPEZ wrote:
>>>
>>[...]
>>
>>>>Well... let's consider all aspects.  I'm a sysadmin of the BOFH kind, so
>>>>late in the evening I usually find myself a bit overloaded on beer.
>>>>Especially on Friday, if I have to stay at work past 5PM I have the
>>>>irresistible temptation to go to the closet and piss*1 on the
>>>>disk cabinet.
>>>>
>>>   Good way to guarantee you won't have kids.
>>>
>>Fair enough.  I *don't* have children... but I tend to consider my PFY
>>a bastard of my bastardness; does that count?
>>
>     
>     That was intended to be humorous. Urinating on live electrical
>     components tends to be a shocking experience.
> 


Yep.  Me too.  I have neither children nor a PFY, so it really doesn't matter.


> 
>>>>For a backup policy you *must* keep the media apart from the on-line
>>>>data to be protected.  Having all your backup media in a single place is
>>>>a *BAAAAAAAD* idea (TM).
>>>>
>>>   Depends on your needs.
>>>
>>Yes, it depends on my needs: if I need to recover any data, having *all*
>>your backup media in a single place is a *BAAAAAAAD* idea (TM).  If this
>>
> 
>     No, when you need to recover stuff, it's a *great* idea to have it
>     in one place.
> 


Well, but this is not what I said.  I purposely bolded *all*, for
that's the key.  It's a great idea to have *some* backups at hand and
grouped (if only for the chance of your boss asking you to recover
that porn... err... that paper he accidentally deleted).  But not *all*
your media.  Obviously it's not too practical to have all my backup media
in Dallas (I'm in Spain, so it's not cost-effective).


>     When you suffer an environmental calamity, it's a bad idea. 
> 
> 
>>place is the same as (or near) the place where the production data is
>>stored, that is simply unadjectivable.
>>
> 
>     Ever tried to push tens of gigabytes over a WAN? 
> 


Ever heard of "Never underestimate the bandwidth of a station wagon full
of tapes"?
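
To put rough numbers on that saying, here is a minimal sketch in Python.
Every figure in it (tape count, tape capacity, transit time, link speed)
is a made-up assumption for illustration, not something taken from this
thread, and the function names are hypothetical.

```python
def shipment_bandwidth_mbit_s(tapes, gb_per_tape, transit_hours):
    """Effective throughput, in Mbit/s, of physically moving tapes."""
    total_bits = tapes * gb_per_tape * 1e9 * 8  # decimal gigabytes -> bits
    return total_bits / (transit_hours * 3600) / 1e6


def wan_transfer_hours(total_gb, link_mbit_s):
    """Hours needed to push total_gb over a link of link_mbit_s."""
    total_bits = total_gb * 1e9 * 8
    return total_bits / (link_mbit_s * 1e6) / 3600


# Assumed figures: 20 tapes of 40 GB each, 2 hours by car,
# compared with a 2 Mbit/s leased line carrying the same 800 GB.
print(shipment_bandwidth_mbit_s(20, 40, 2))
print(wan_transfer_hours(20 * 40, 2))
```

With those assumed figures the car works out to roughly 889 Mbit/s, while
the leased line needs roughly 889 hours for the same data.  Latency is
another matter, of course.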


> 
>>>   Depends on your needs.
>>>   I have one database that changes fast enough that if it's 36 hours
>>>   old, we're basically just recovering it for the table structures. 
>>>
>>In case of disaster, if your backup media is in the proximity of your
>>production database (define "proximity" as needed) you won't be able
>>to recover even table structures.  What I am talking about is *where*;
>>deciding *what* to back up, and how, is a *completely different* issue.
>>
> 
>     That wasn't my point. 
> 
>     My point was that for some stuff, 36 hour old data is useless, and


Then its *value* past 36 hours is... nil.  You said (implicitly) that
your database schema was of some value even if it was more than 36
hours old.


>     even a normal tape rotation schedule can put data out of reach for
>     10 or 12 hours minimum. 
> 


Yes, that's *potentially* true.


> 
>>>   I've got another one that changes fast enough that it's not worth
>>>   backing up. If it's more than 2 hours old, starting from scratch is
>>>
>>So, the amount of change per unit of time is your key to deciding what
>>you *need* to back up and what you don't?
>>Sounds *extremely* odd to me.  I would say it should be *the value* of the
>>material (this includes the cost to recreate it anew too, obviously), not
>>its change rate.
>>
> 
>     Not the necessity of backing up, but the cost. If the data is
>     changing that fast, it could easily be that by the time it's on
>     tape, it's out of date and effectively useless. 
> 


Again: it's not its change rate but its *value*.  The more it is worth,
the more you can spend to "insure" it (part of your insurance policy
specifies *within a time frame*).


> 
>>>   In the first case, off site backups don't make sense, so we have 2
>>>   backup hosts (separated by about 10 feet currently, less in a day or
>>>   two) that get backups on an alternating (daily) basis.
>>>
>>Whether they make sense depends on their *cost*, not on the change rate.
>>
> 
>     No, it would be a lot cheaper to dump the dbs to tape, and carry the
>     tapes offsite, but (1) recovery time is almost tripled, and (2) 
>  


...and its *value* once recovered will be lower than having no data at
all.  Again, *if* you manage to find a method so the value of the
recovered data is higher than the cost of having that method in place,
your job (if that's your job, of course) is to point it out and implement it.


> 
>     And yes, we know the problems with this. It's a calculated risk. We
>     can't afford geographically separated facilities right now.
> 


*Value* again.  On that subject, I recently learned of a multinational
company (so not just a one-site company) whose main office was in the
Twin Towers.  It could have recovered from the loss of its people
(though *many* of upper management died), but it did not recover from
the loss of its data and facilities.


> 
>>>   Again, your backup strategy depends on your needs, your budget, and
>>>   your risk tolerance.


It depends only on your needs.  Your needs can include not exceeding a
certain budget, but it definitely has nothing to do with "risk
tolerance".  "Risk tolerance" is either a winning bet or a misinformation
issue.

 
>>>   It doesn't make sense to spend $10k for a backup solution for $20k of
>>>   data. It does make sense to spend $10k to back up $100k.
>>>


Plainly true... except for the last value: to back up data worth $Y it
makes sense to spend at most $X = $Y*p, where p is the probability of
losing that data (i.e. if the probability is 1, so you're certain to
lose the data, you can spend up to $100k to insure it, within the time
frame in which that data produces $100k in revenue).
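
The rule above (spend at most the data's value times the probability of
losing it) can be sketched as follows.  This is only an illustration: the
function names are hypothetical, and the 5% loss probability in the last
call is an assumed figure, not one from this thread.

```python
def max_backup_budget(data_value, loss_probability):
    """Upper bound on sensible backup spending: value times probability of loss."""
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    return data_value * loss_probability


def worth_spending(backup_cost, data_value, loss_probability):
    """True when the backup costs no more than the expected loss it covers."""
    return backup_cost <= max_backup_budget(data_value, loss_probability)


# Certain loss (p = 1): the whole $100k is at stake, so $10k is justified.
print(worth_spending(10_000, 100_000, 1.0))
# Assumed 5% chance of loss: the expected loss is about $5k, so $10k is not.
print(worth_spending(10_000, 100_000, 0.05))
```

Both the value and the probability have to be taken over the same time
frame, namely the period during which the data actually produces that
revenue.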


>>Of course.  Your backup strategy surely depends... *on the data value*
>>and only on this.  From the very beginning I stated that for "home data"
>>
> 
>     No, it doesn't. It depends on several things:
> 
>     (1) Value of data.


Plain data value

 
>     (2) Cost of downtime.


Data value too (in terms of lost revenue for the time the data is not
accessible).

 
>     (3) Rate of change. (If your data set is completely worthless after
>     24 hours, but worth several million for the first hour, offsite
>     backups don't necessarily make sense etc.) 
> 


Data value too (in terms of how the value of data evolves with time).


>     And probably some more I haven't thought of. 
> 


Probably, and they will all be expressible in terms of data value or will
have no significance at all.
-- 
SALUD,
Jesús
***
jesus_navarro at promofinarsa.es
***




