Re: Is there room for improvement in rescue mode? (was Re: Goodbye, Fedora)
- From: "Keith G. Robertson-Turner" <fedora-gmane 00003 genesis-x nildram co uk>
- To: fedora-devel-list redhat com
- Subject: Re: Is there room for improvement in rescue mode? (was Re: Goodbye, Fedora)
- Date: Thu, 22 Feb 2007 21:34:24 +0000
Verily I say unto thee, that Jeff Spaleta spake thusly:
> In an effort to chart a new course of constructive discussion... is
> it worth brainstorming a bit about how to make rescue mode better or
> more accessible?
The current rescue mode is certainly sufficient for experienced
admins; however, it would be a good idea to implement some helper
scripts, and possibly even a fluxbox minimal environment. The latter
would be especially useful to facilitate administering LVM via
system-config-lvm, as I must admit the lvm command syntax is still a
mystery to me.
The logical procedure should be: identify (as far as possible) what
*can* go wrong, think about how *you* would fix it, see if there's any
way to (semi)automate that process with helper scripts, and compare
that with what's currently available in the rescue environment.
Off the top of my head, I'd suggest:
1) Enable installing an immutable rescue partition, and add as a grub
2) Add a minimal graphical environment.
3) Add a "Rescue Install" to Anaconda.
4) Add the various system-config-* helpers.
5) Have a dedicated RPM rescue tool, since this is a special
case. I.e. is rpm + all deps correctly installed, are there stale
locks, sanity check on the database, etc.
6) Anaconda suggests a backup partition, or asks for a network backup
location, and sets up a cron job (SafeKeep?). I.e. push hard to
make backup mandatory(ish). I'd also suggest Disk Druid, etc.,
pushes the suggestion of LVM *and* a snapshot partition, which is
You could do some checks to see if the default root system is
bootable, etc., then automatically fall back to rescue mode if not
(GRUB patch?), rather than allow the init to proceed then fail. This
is essential on a headless server, where it's "stuck" and you can't
ssh in to see why.
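To make the "is it bootable?" check a bit more concrete, here is a rough sketch of one piece of it: confirming that the kernel and initrd named by the default GRUB entry actually exist and are not empty. It assumes a legacy grub.conf layout (`default=N`, `title`, `kernel`, `initrd`) and parses it only loosely; the function names are made up for illustration, and the actual fall-back to rescue mode would still need support in the boot loader itself.

```python
import os


def default_entry_files(grub_conf_text):
    """Return the kernel and initrd paths named by the default GRUB stanza."""
    default_index = 0
    entries = []  # one list of file paths per "title" stanza
    for raw_line in grub_conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("default"):
            # Accepts both "default=0" and "default 0".
            value = line[len("default"):].lstrip("= \t")
            if value.isdigit():
                default_index = int(value)
        elif line.startswith("title"):
            entries.append([])
        elif entries and line.split()[0] in ("kernel", "initrd"):
            parts = line.split()
            if len(parts) > 1:
                entries[-1].append(parts[1])
    if default_index >= len(entries):
        raise ValueError("default entry %d not found" % default_index)
    return entries[default_index]


def missing_boot_files(grub_conf_text, boot_dir):
    """List files from the default stanza that are absent or empty in boot_dir."""
    missing = []
    for path in default_entry_files(grub_conf_text):
        # Drop an optional GRUB device prefix such as "(hd0,0)".
        if path.startswith("(") and ")" in path:
            path = path.split(")", 1)[1]
        candidate = os.path.join(boot_dir, path.lstrip("/"))
        if not os.path.isfile(candidate) or os.path.getsize(candidate) == 0:
            missing.append(path)
    return missing
```

A non-empty result would be the cue to boot the rescue entry instead. This says nothing about whether the kernel has the right drivers or whether the root filesystem will mount; it only catches the "file is simply gone" case.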
If the idea of a GUI doesn't appeal to you (and for network admins it
probably doesn't), I'd suggest the implementation of an ncurses
interface for some of the helper tools (long term).
As a side note, though not directly related to "rescue", I advocate
that yum should be patched to enable partial-failure, i.e. "update as
much as possible, root notify failures". I understand it is not a
popular theory, but broken deps/repos break automatic updates
completely, rather than partially, which could be a problem, e.g. on a
large network (like mine) where an essential security update (and all
other updates) is not deployed, simply because of *one* broken, and
non-essential, package. This just doesn't make any logical sense, and
could be an issue for those relying on automated mass system updates.
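As a sketch of what "update as much as possible, notify root of failures" could look like in outline: the loop below attempts each update on its own and records failures instead of aborting the whole run. This is not how yum works internally; `update_one` is a hypothetical stand-in for whatever actually applies a single update, and the sketch assumes updates are independent of each other, which real dependency chains often are not.

```python
def update_as_much_as_possible(packages, update_one):
    """Try every update independently; collect failures instead of aborting."""
    updated = []
    failed = {}  # package name -> error message
    for name in packages:
        try:
            update_one(name)  # hypothetical callable that applies one update
        except Exception as exc:
            failed[name] = str(exc)
        else:
            updated.append(name)
    return updated, failed


def failure_report(failed):
    """Format the failures as plain text suitable for mailing to root."""
    lines = ["%d update(s) failed:" % len(failed)]
    for name in sorted(failed):
        lines.append("  %s: %s" % (name, failed[name]))
    return "\n".join(lines)
```

With this shape, one broken and non-essential package ends up in the report while the security update still gets deployed.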
Anyway, back on topic, let's say *I* ask *you*, my sysadmin, to fix
the following. What would you (i.e. the script) need to do to
(semi)automate this? Not all of these *have* solutions that can be
implemented in software, but even the *hardware* issues could be given
more verbose notification/suggestions:
1) swapon ... won't activate, because the swap drive is dead, but
this is a low memory system set to automatically boot into X.
2) root filesystem mount failure.
3) Missing/corrupt initrd/bzimage.
4) Missing/non-functional SCSI/IDE drivers in an *updated* kernel, so
cannot mount root filesystem (but previous kernel works).
5) service <foobar> segfaults and halts init.
6) service <foobar> has (missing files | other problem) and waits
forever (does not detach to daemon).
(hint for 5 and 6 - watchdog timer)
7) Initscripts are b0rked, typo, non-fatal error, etc. (I recently
caught one, still unresolved, nfs mountd problem). Why is this
needed for rescue mode? Because not all startup errors are noticed
by the (unobservant | people who blink a lot). :) A way of running
through ($chroot)/init.d in rescue mode looking for non-zero return
codes, and suggesting updates/workarounds etc., would be handy. But
maybe this is stretching "rescue mode" a little too far.
8) RPM is b0rked. How do I reinstall RPM ... without RPM??? Cyclic
dependency error 101: Arrrrggghh!
9) Again, maybe stretching "rescue" too far, but how about fslint in
rescue mode, to clean up all those "#PRELINK", "foobar~", and other
junk. Especially on a monolithic install (all under /) where /tmp
10) Only other thing I can think of is, SMART disk health checks,
however, according to Google's recent report (they did a massive
test), SMART is next to useless at actually predicting failure.
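For the watchdog hint on items 5 and 6 (and the non-zero return codes in item 7), something along these lines might serve as a starting point. It runs a command with a time limit and sorts the outcome into a few buckets. The function name and the bucket labels are made up for this sketch, and it assumes a POSIX system, where a negative return code from Python's `subprocess` module means the process was killed by a signal (a segfault being one such case).

```python
import subprocess


def run_with_watchdog(command, timeout_seconds):
    """Run command with a time limit and classify how it ended."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return "hung"
    except OSError:
        # The command could not be started at all (missing file, no permission).
        return "not-started"
    if result.returncode < 0:
        # Negative return code on POSIX: terminated by a signal.
        return "crashed"
    if result.returncode != 0:
        return "failed"
    return "ok"
```

A rescue helper could loop over the init scripts inside the chroot with this and print anything that does not come back "ok", rather than letting one stuck service hold up the whole boot. Choosing a sensible timeout per service is left open here.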
I'm sure 99% of the above is useless, but hey ... that's why they call
it brainstorming :)
http://slated.org - Slated, Rated & Blogged
| "Future archaeologists will be able to identify a 'Vista Upgrade
| Layer' when they go through our landfill sites" - Sian Berry, the
| Green Party.
Fedora Core release 5 (Bordeaux) on sky, running kernel 2.6.19-1.2288.fc5
21:32:25 up 3 days, 8:57, 2 users, load average: 0.26, 0.31, 0.27