Re: Disaster recovery, network based tape storage
- From: shane stixrud org
- To: "Discussion of Red Hat Enterprise Linux 3 (Taroon)" <taroon-list redhat com>
- Subject: Re: Disaster recovery, network based tape storage
- Date: Thu, 30 Sep 2004 13:08:22 -0700 (PDT)
On Thu, 30 Sep 2004, Ed Wilts wrote:
> > Sigh, what I would like is something a bit more instructional. I just
> > hate the idea of using full system backups, yes you can recover but
> > you have no idea as to the steps required to get RHEL back to its present
> > state. I accept that backups are required for "live data" i.e. databases,
> > non static config data etc... But everything else really should be solved
> > using an automated installation / configuration engine like kickstart.
>
> Unfortunately, in the real world, the live data is what counts a *lot*
> more than the "everything else" you mention. Installing the OS and
> configuring the apps is fairly straightforward these days - heck, RHCEs
> do it (and more) in under 3 hours to pass the exam. Putting back all
> the user files, crontabs, passwd and group data (I know, I need ldap!)
> and applying custom configurations for some of the packages (httpd, ftpd,
> etc) take the bulk of the work.
Your real world is obviously different from mine. The data that changes on
our servers is more or less a known quantity, and when it's not known, the
specific location of the data is well known. A combination of targeted
backups (/home, /opt, etc.) with a complete system install/configuration
routine provided via pxe+kickstart allows us to recover a server fully in
20 minutes (limited by anaconda). The kickstart scripts handle the
"applying custom configurations" as well as the restore of live data. The
main difference is that if your policy is to rely on a full image
recovery, the history and the knowledge of each bit that was twiddled are
obscured.
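To make the "targeted backups" half concrete, here is a minimal sketch in
Python. It is not our actual backup tooling; the function name, the list of
directories and the archive location are all assumptions made up for
illustration. It only shows the idea of archiving a short, known list of
paths instead of imaging the whole system.

```python
import tarfile
from pathlib import Path


def backup_paths(paths, archive_path):
    """Write the given directories into one gzip-compressed tar archive.

    `paths` is the short, known list of locations that hold live data
    (for example /home and /opt). Paths that do not exist are skipped.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in paths:
            p = Path(p)
            if p.exists():
                # Store names relative to / so the archive can later be
                # unpacked with / as the target directory.
                tar.add(str(p), arcname=str(p).lstrip("/"))
    return archive_path


if __name__ == "__main__":
    # Hypothetical usage: the target file name is an assumption.
    backup_paths(["/home", "/opt"], "/var/tmp/servername-live-data.tar.gz")
```

Everything not in that list is expected to come back from the install and
configuration scripts rather than from the archive.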
If your company has many RHCEs whose sole job is to administer a
Linux/Unix environment, this may not seem like a big deal. The same goes
if your company is OK with not knowing how a device came to its existing
configuration, as long as a "disaster recoverable image" is handy (and
off site). However, I dare say that the "real world" isn't full of
companies with hordes of RHCEs or people with enough skill to replicate
some random unix/linux servers from scratch without breaking a sweat.
In my environment we have 150 IS staff supporting around 600-700 servers
(not all unix/linux by any means) and around 10,000 end user nodes (health
care). Perhaps 5 of them have the skill to rebuild one of our Linux
servers by hand within a few days, and not all of these are in the same
"group".
In other words, having a documented, scripted procedure that defines what
each Linux server is and how it is built makes Linux much more accessible
to, and supportable by, the "real world" IS personnel where I work.
For a quick recovery they simply:
1) Plug replacement hardware into the network
2) Boot hardware
3) At the pxe boot screen type: "restore ks=http://dist_server/servername.ks"
4) 20 minutes later the server reboots and is fully restored.
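For anyone wondering what sits behind step 3, here is a rough sketch in
Python of how a per-server servername.ks could be generated from a short
description of the server. This is not our real tooling. The function
name, the URLs under dist_server and the package list are assumptions for
illustration, and the output is deliberately incomplete: a real kickstart
file also needs rootpw, bootloader, timezone and so on, and the restore
line assumes wget and tar are available inside %post.

```python
def render_kickstart(server, install_url, packages, post_commands, restore_archives):
    """Return the text of a skeleton kickstart file for one server."""
    lines = [
        "# Kickstart for " + server + " (illustrative skeleton, not complete)",
        "install",
        "url --url " + install_url,
        "lang en_US.UTF-8",
        "keyboard us",
        "network --bootproto dhcp",
        "clearpart --all --initlabel",
        "part / --fstype ext3 --size 1 --grow",
        "reboot",
        "",
        "%packages",
    ]
    lines += packages
    lines += ["", "%post"]
    # Custom configuration steps run first, in the order given.
    lines += post_commands
    # Then each targeted backup is pulled over HTTP and unpacked at /.
    for url in restore_archives:
        lines.append("wget -q -O - " + url + " | tar -xzpf - -C /")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical server definition; every value here is an assumption.
    print(render_kickstart(
        "servername",
        "http://dist_server/rhel3",
        ["@ Base", "httpd"],
        ["chkconfig httpd on"],
        ["http://dist_server/backups/servername/home.tar.gz"],
    ))
```

The point is that the server's definition is a handful of readable lists,
so the generated file doubles as documentation of what the box is.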
If someone needs to figure out what a server does, they merely check
the servername.ks and its support scripts. If new staff come on board and
want to get up to speed on everything involved with our Linux farm, they
can read the kickstart scripts in a day. More importantly, those who do
know Linux quite well are freed up to work on new engineering tasks,
while management notices that the "non linux experts" can manage recovery
and simple changes (to the scripts) just fine... Linux isn't that hard
after all.
If you want an example of an OS that got this perfect, take a look at
Cisco IOS. Granted, the configuration and applications of their OSes are
simple compared to Linux, but you can't beat uploading a .cfg file and
calling it good.
--
"Given enough time, all legal battles in the tech industry will invoke the
DMCA. This generally means that all constructive arguments have ended."
-NialScorva