
Re: Disaster recovery, network based tape storage



On Thu, 30 Sep 2004, Ed Wilts wrote:

> > Sigh, what I would like is something a bit more instructional.  I just 
> > hate the idea of using full system backups, yes you can recover but  
> > you have no idea as to the steps required to get RHEL back to its present 
> > state.  I accept that backups are required for "live data" i.e. databases, 
> > non static config data etc... But everything else really should be solved 
> > using an automated installation / configuration engine like kickstart. 
> 
> Unfortunately, in the real world, the live data is what counts a *lot*
> more than the "everything else" you mention.  Installing the OS and
> configuring the apps is fairly straightforward these days - heck, RHCEs
> do it (and more) in under 3 hours to pass the exam.  Putting back all
> the user files, crontabs, passwd and group data (I know, I need ldap!)
> and applying custom configurations for some of the packages (httpd, ftpd,
> etc) take the bulk of the work.

Your real world is obviously different from mine.  Data that changes on 
our servers is more or less a known quantity, and when it's not known, the 
specific location of the data is well known.  A combination of targeted 
backups (e.g. /home, /opt, etc.) with a complete system install/configuration 
routine provided via PXE + kickstart allows us to recover a 
server fully in 20 minutes (limited by anaconda).  The kickstart scripts 
handle the "applying custom configurations" as well as the restore of 
live data.  The main difference is that if your policy 
is to rely on full-image recovery, the history and the knowledge of 
each bit that was twiddled are obscured.  

If you have many RHCEs at a company whose sole job is to administer a 
Linux/Unix environment, this may not seem like a big deal.  Nor will it if 
your company is OK not knowing how a device came to its existing 
configuration as long as a "disaster-recoverable image is handy (and off 
site)".  However, I dare say that the "real world" isn't full of companies 
with hordes of RHCEs, or of people with enough skill to replicate some 
random Unix/Linux server from scratch without breaking a sweat.

In my environment we have 150 IS staff supporting around 600-700 servers 
(not all Unix/Linux by any means) and around 10,000 end-user nodes (health 
care).  Perhaps 5 of them have the skill to rebuild one of 
our Linux servers by hand within a few days, and not all of those 
are in the same "group".

In other words, having a documented and scripted procedure that defines what 
each Linux server is and how it is built makes Linux much more accessible 
to, and supportable by, the "real world" IS personnel where I work.

For a quick recovery they simply:

1) Plug replacement hardware into the network
2) Boot hardware
3) At the PXE boot prompt type: "restore ks=http://dist_server/servername.ks"
4) 20 minutes later the server reboots and is fully restored.
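For anyone who hasn't built one, here is a minimal sketch of what a servername.ks along these lines might contain.  It is not our actual file: it assumes RHEL 3/4-era kickstart syntax, an install tree and per-server files published over HTTP on dist_server, and tarballs of /home and /opt produced by the targeted backups.  All paths, file names and packages are hypothetical, and directives a real file needs (rootpw, timezone, auth and so on) are left out for brevity.

```
# servername.ks -- hypothetical sketch, RHEL 3/4-era syntax assumed.
# Newer anaconda releases also expect a %end line closing each % section.
# rootpw, timezone, auth and other required directives omitted for brevity.
install
url --url http://dist_server/rhel/
lang en_US.UTF-8
keyboard us
network --bootproto dhcp
bootloader --location=mbr
clearpart --all --initlabel
autopart
reboot

%packages
@base
httpd
wget

%post
# Runs chrooted into the freshly installed system after packages are in.
# Assumes dist_server resolves here; some installs need an IP address instead.

# "Applying custom configurations": fetch this server's config files.
wget -q -O /etc/httpd/conf/httpd.conf http://dist_server/servername/httpd.conf
chkconfig httpd on

# Restore live data from the targeted backups.  Assumes the archives were
# created relative to / (e.g. "tar czf home.tar.gz -C / home").
wget -q -O - http://dist_server/backups/servername/home.tar.gz | tar xzf - -C /
wget -q -O - http://dist_server/backups/servername/opt.tar.gz | tar xzf - -C /
```

Because the whole build lives in one readable file plus the data tarballs, the same file serves as the recovery procedure and as the server's documentation.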

If someone needs to figure out everything a server does, they merely check 
servername.ks and its support scripts.  If a new staff member comes on 
board and wants to get up to speed on everything involved with our Linux 
farm, he/she can read the kickstart scripts in a day.  More 
importantly, those who do know Linux quite well are freed up to work on 
new engineering tasks, while management notices that the "non-Linux 
experts" can handle recovery and simple changes (to the scripts) just 
fine... Linux isn't that hard after all.

If you want an example of an OS that got this right, take a look at Cisco 
IOS.  Granted, the configuration and applications of their OSes are simple 
compared to Linux, but you can't beat uploading a .cfg file and calling it 
good.

-- 
"Given enough time, all legal battles in the tech industry will invoke the 
DMCA. This generally means that all constructive arguments have ended." 
					-NialScorva

