
Re: forced fsck (again?)



At the risk of adding complexity, what about having the SNAPSIZE be
automatically determined?  Most users would have no idea what to set
it to, and we should be able to guess some reasonable values.  For
example, the fsck time can probably be estimated by looking at the
number of inodes, how full the filesystem is, etc.  Alternatively, we
could just allocate all the free space in the volume group.
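
Something like the following is roughly what I have in mind.  It is
only a sketch: guess_snapsize is a made-up name, and the 10% figure and
the 100 MB floor are guesses on my part, not anything I have measured.
It takes the LV size and the free space in the volume group (both in
whole megabytes) and prints a value in the same form as
DEFAULT_SNAPSIZE.

```sh
# Hypothetical helper: guess a snapshot size for lvcreate -L.
#   $1 = size of the origin LV, in whole MB
#   $2 = free space left in the volume group, in whole MB
# Heuristic (an assumption, not measured): 10% of the LV, but at
# least 100 MB, and never more than the VG has free.
guess_snapsize() {
        lv_mb="$1"
        free_mb="$2"
        want=$(( lv_mb / 10 ))

        if [ "$want" -lt 100 ] ; then
                want=100
        fi
        if [ "$want" -gt "$free_mb" ] ; then
                want="$free_mb"
        fi

        echo "${want}m"
}
```

The two inputs would have to come from lvs and vgs, trimmed down to
whole megabytes first, which I have left out here.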

I also have a newbie question: does the fsck of a snapshot really
catch everything that might be wrong with the drive, or are there
other failure modes that only a real fsck would catch?  I'm wondering
if it's still a good idea to do an occasional full fsck.

Damian

2008/1/23 Bryan Kadzban <bryan kadzban is-a-geek net>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
> Andreas Dilger wrote:
> > On Jan 23, 2008  09:08 -0500, Theodore Tso wrote:
> >> (We could sneak some of that information into the options field of
> >> fstab, since the kernel and other programs that parse that field
> >> just take what they need and ignore the rest, but.... ick, ick,
> >> ick.  :-)
> >
> > I agree - adding email to fstab is icky and I wouldn't go there.  I
> > don't see a problem with just emailing it to "root@" by default and
> > giving the user the option to change it to something else.
>
> Since the email address is not per-filesystem, it's fine by me to put it
> into a config file somewhere.  Forcing the interval to be global is
> probably also OK, although I wouldn't want to be forced to set the
> snapshot size globally.  I do think that fstab is the best place for
> per-filesystem options, though.
>
> But it's not too difficult to parse out a custom SNAPSIZE option, and
> even have a DEFAULT_SNAPSIZE in the config file if no SNAPSIZE option is
> present on any LV, if the script is going to parse fstab anyway.  (Or
> should the option's name be lowercase?  Either will work.)
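
For what it's worth, accepting the option name in either case looks
easy enough if the options field is split on commas first.  A rough
sketch, not taken from the attached script: fstab_opt_value is a name
I made up.  It prints the value of a NAME=value option and fails if
the option is missing.

```sh
# Hypothetical helper: print the value of NAME=value from a
# comma-separated fstab options string, matching NAME in either case.
#   $1 = options field, e.g. "defaults,noatime,SNAPSIZE=200m"
#   $2 = option name to look for
# The body runs in a subshell, so the IFS and "set -f" changes do not
# leak into the caller.
fstab_opt_value() (
        name="$(echo "$2" | tr '[:lower:]' '[:upper:]')"
        IFS=','
        set -f          # no filename globbing while splitting
        for opt in $1 ; do
                case "$opt" in
                        *=*) ;;               # has a value, look closer
                        *) continue ;;        # bare flag, ignore
                esac
                key="$(echo "${opt%%=*}" | tr '[:lower:]' '[:upper:]')"
                if [ "$key" = "$name" ] ; then
                        echo "${opt#*=}"
                        exit 0
                fi
        done
        exit 1
)
```

check_fs could then fall back to the default with something like:

        snapsize="$(fstab_opt_value "$opts" SNAPSIZE)" || snapsize="$DEFAULT_SNAPSIZE"
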
>
> >> Also, I could imagine that a user might not want to check all of
> >> the filesystems in fstab.
> >
> > Similarly, a config file which disables checking on some LV if
> > specified seems reasonable.
>
> That does seem reasonable, but I haven't done it in the script that's
> attached.  Maybe support for a SKIP (or skip, or e2check_skip, or
> skip_e2check, or whatever) option in fstab's options field?
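
A skip flag could be tested with a plain case pattern.  This is only
a sketch: wants_skip is a made-up name, I picked e2check_skip
arbitrarily from the candidates above, and I am assuming mount
tolerates the extra option, which I have not checked.

```sh
# Hypothetical helper: succeed if the fstab options field contains
# the bare flag e2check_skip.
#   $1 = options field, e.g. "defaults,e2check_skip"
# Wrapping the field in commas means the flag only matches as a whole
# option, not as part of a longer option name.
wants_skip() {
        case ",$1," in
                *,e2check_skip,*) return 0 ;;
                *) return 1 ;;
        esac
}
```

In the fstab loop that would be used as

        wants_skip "$OPTIONS" && continue

before the filesystem is handed to check_fs.
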
>
> Regarding the idea of having this support multiple filesystems -- that's
> a good idea, I think, but the current script is highly specific to ext2
> or ext3.  Use of tune2fs (to reset the last-check time) and dumpe2fs (to
> find the last-check time), in particular, will be problematic on other
> FSes.  I haven't done that in this script, though it may be possible.
>
> Anyway, here's a second version.  I've changed it to parse up fstab,
> and added an option for what to do if AC status can't be determined.
> Kernel-style changelog entry, etc., below:
>
> - -------
>
> Create a script to transparently run e2fsck in the background on any LVM
> logical volumes listed in /etc/fstab, as long as the machine is on AC
> power, and that LV has been last checked more than a configurable number
> of days ago.  Also create a configuration file to set various options in
> the script.
>
> Signed-Off-By: Bryan Kadzban <bryan kadzban is-a-geek net>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.7 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFHl/OXS5vET1Wea5wRA/UaAJwIE27W6qasI7Gm/uvZm/pY1rcBtwCcDXYq
> cc3qE/uOEqm4ksYHlI6+IJU=
> =7Lf3
> -----END PGP SIGNATURE-----
>
> #!/bin/bash
> #
> # e2check
>
> # Released under the GNU General Public License, either version 2 or
> #  (at your option) any later version.
>
> # Overview:
> #
> #  Run this from cron each night.  If the machine is on AC power, it
> #  will run the checks; otherwise they will all be skipped.  (If the
> #  script can't tell whether the machine is on AC power, a setting in
> #  the configuration file (/etc/e2check.conf) decides whether it will
> #  continue or abort.)
> #
> #  The script will then decide which filesystems in /etc/fstab are on
> #  logical volumes, and can therefore be checked via an LVM snapshot.
> #  Each of these filesystems will be queried to find its last check
> #  day, and if that was more than $INTERVAL days ago (where INTERVAL
> #  is set in the configuration file as well), then the script will
> #  take an LVM snapshot of the filesystem and run e2fsck on the
> #  snapshot.  The snapshot's size can be set via either the SNAPSIZE
> #  option in the options field in /etc/fstab, or the DEFAULT_SNAPSIZE
> #  option in /etc/e2check.conf -- but make sure it's set large enough.
> #  After e2fsck finishes, the snapshot is destroyed.
> #
> #  Any filesystem that passes e2fsck will have its last-check time
> #  updated (in the real superblock, not the snapshot); for any
> #  filesystem that fails, an email notification is sent to a
> #  configurable address ($EMAIL).  Setting $EMAIL is optional but
> #  highly recommended, since any filesystem that fails will need to
> #  be checked manually offline.
>
> function on_ac_power() {
>         local any_known=no
>
>         # try sysfs power class first
>         if [ -d /sys/class/power_supply ] ; then
>                 for psu in /sys/class/power_supply/* ; do
>                         if [ -r "${psu}/type" ] ; then
>                                 type="`cat "${psu}/type"`"
>
>                                 # ignore batteries
>                                 [ "${type}" = "Battery" ] && continue
>
>                                 online="`cat "${psu}/online"`"
>
>                                 [ "${online}" = 1 ] && return 0
>                                 [ "${online}" = 0 ] && any_known=yes
>                         fi
>                 done
>
>                 [ "${any_known}" = "yes" ] && return 1
>         fi
>
>         # else fall back to AC adapters in /proc
>         if [ -d /proc/acpi/ac_adapter ] ; then
>                 for ac in /proc/acpi/ac_adapter/* ; do
>                         if [ -r "${ac}/state" ] ; then
>                                 grep -q on-line "${ac}/state" && return 0
>                                 grep -q off-line "${ac}/state" && any_known=yes
>                         elif [ -r "${ac}/status" ] ; then
>                                 grep -q on-line "${ac}/status" && return 0
>                                 grep -q off-line "${ac}/status" && any_known=yes
>                         fi
>                 done
>
>                 [ "${any_known}" = "yes" ] && return 1
>         fi
>
>         if [ "$AC_UNKNOWN" == "CONTINUE" ] ; then
>                 return 0   # assume on AC power
>         elif [ "$AC_UNKNOWN" == "ABORT" ] ; then
>                 return 1   # assume on battery
>         else
>                 echo "Invalid value for AC_UNKNOWN in the config file" >&2
>                 exit 1
>         fi
> }
>
> function check_fs() {
>         local vg="$1"
>         local lv="$2"
>         local opts="$3"
>         local snapsize="${DEFAULT_SNAPSIZE}"
>
>         case "$opts" in
>                 *SNAPSIZE=*)
>                 # parse out just the SNAPSIZE option's value
>                 snapsize="${opts##*SNAPSIZE=}"
>                 snapsize="${snapsize%%,*}"
>                 ;;
>         esac   # else leave it at DEFAULT_SNAPSIZE
>
>         [ -z "$snapsize" ] && return 1
>
>         local tmpfile=`mktemp -t e2fsck.log.XXXXXXXXXX`
>         trap "rm $tmpfile ; trap - RETURN" RETURN
>
>         local start="$(date +'%Y%m%d%H%M%S')"
>
>         lvcreate -s -L "${snapsize}" -n "${lv}-snap" "${vg}/${lv}"
>
>         if nice logsave -as $tmpfile e2fsck -p -C 0 "/dev/${vg}/${lv}-snap" && \
>                         nice logsave -as $tmpfile e2fsck -fy -C 0 "/dev/${vg}/${lv}-snap" ; then
>                 echo 'Background scrubbing succeeded!'
>                 tune2fs -C 0 -T "${start}" "/dev/${vg}/${lv}"
>         else
>                 echo 'Background scrubbing failed! Reboot to fsck soon!'
>                 tune2fs -C 16000 -T "19000101" "/dev/${vg}/${lv}"
>
>                 if test -n "$EMAIL"; then
>                         mail -s "E2fsck of /dev/${vg}/${lv} failed!" $EMAIL < $tmpfile
>                 fi
>         fi
>
>         lvremove -f "${vg}/${lv}-snap"
> }
>
> set -e
>
> # pull in configuration -- don't bother with a parser, just use the shell's
> . /etc/e2check.conf
>
> # check whether the machine is on AC power: if not, skip the e2fsck
> on_ac_power || exit 0
>
> # parse up fstab
> grep -v '^#' /etc/fstab | grep -v '^$' | awk '$6!=0 {print $1,$3,$4;}' | \
> while read FS FSTYPE OPTIONS ; do
>         # Use of tune2fs in check_fs, and dumpe2fs below, means we can
>         #  only handle ext2/ext3 FSes
>         [ "$FSTYPE" != "ext3" ] && [ "$FSTYPE" != "ext2" ] && continue
>
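>         # get the volume group (or an error message); "|| true" keeps
>         #  "set -e" from aborting the loop when $FS is not an LV
>         VG="`lvs --noheadings -o vg_name "$FS" 2>&1 || true`"
>
>         # skip non-LVM devices (hopefully LVM VGs don't have spaces)
>         [ "`echo "$VG" | awk '{print NF;}'`" -ne 1 ] && continue
>
>         # get the VG and LV names, stripping the padding lvs adds
>         VG="`echo "$VG" | awk '{print $1;}'`"
>         LV="`lvs --noheadings -o lv_name "$FS" | awk '{print $1;}'`"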
>
>         # get the last check time plus $INTERVAL days
>         check_date=`dumpe2fs -h "/dev/${VG}/${LV}" 2>/dev/null | grep 'Last checked:' | \
>                 sed -e 's/Last checked:[[:space:]]*//'`
>         check_day=`date --date="${check_date} $INTERVAL days" +"%Y%m%d"`
>
>         # get today's date, and skip LVs that don't need to be checked yet
>         today=`date +"%Y%m%d"`
>         [ "$check_day" -gt "$today" ] && continue
>
>         # else, check it
>         check_fs "$VG" "$LV" "$OPTIONS"
> done
>
>
> #!/bin/sh
>
> # e2check configuration variables (/etc/e2check.conf):
> #
> #  EMAIL
> #   Address to send failure notifications to.  If empty,
> #   failure notifications will not be sent.
> #
> #  INTERVAL
> #   Days to wait between checks.  All LVs use the same
> #   INTERVAL, but the "days since last check" value can
> #   be different per LV, since that value is stored in
> #   the ext2/ext3 superblock.
> #
> #  DEFAULT_SNAPSIZE
> #   Default snapshot size to use if none is specified
> #   in the options field in /etc/fstab (using the custom
> #   SNAPSIZE=xxx option) for any LV.  Valid values are
> #   anything that the -L option to lvcreate will accept.
> #
> #  AC_UNKNOWN
> #   Whether to run the e2fsck checks if the script can't
> #   determine whether the machine is on AC power.  Laptop
> #   users will want to set this to ABORT, while server and
> #   desktop users will probably want to set this to
> #   CONTINUE.  Those are the only two valid values.
>
> EMAIL='root'
> INTERVAL=30
> DEFAULT_SNAPSIZE=100m
> AC_UNKNOWN="ABORT"
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users redhat com
> https://www.redhat.com/mailman/listinfo/ext3-users
>



-- 
http://www.uiuc.edu/~menscher/

