latest rawhide kernel 2.6.9-1.640 and suspend to disk

Per Bjornsson perbj at stanford.edu
Tue Oct 26 20:56:37 UTC 2004


On Tue, 2004-10-26 at 11:10, Dave Jones wrote:

> My comments on swsusp are largely unprintable, but in short, this
> kind of risk (where end-users can lose data) isn't justifiable.
> Upstream, swsusp is still very much in flux, with a large out-of-tree
> patchset to 'improve' upon it.

Is swsusp2 (I presume this is the "out-of-tree patchset" you mentioned)
as ugly from a code point of view? I thought it might be a bit better.

By the way, what is the data loss scenario - pretty much just that
whatever wasn't saved is lost if the computer doesn't wake up
(basically the same as hitting the hard reset button at a random
moment), or random wreckage all over the hard disk? That makes a huge
difference!

> I'm not optimistic we'll see this working reliably any time soon.
> Read some of the comments from the upstream maintainer on linux-kernel.
> There are numerous cases of swsusp failing where the end-user has
> been given the brush-off "oh well, your hardware is crap".

Yes, looking at LKML it sure looks like some of the people involved
sadly don't really care about how safe it is, as long as it kind of
works with some luck.

My understanding was that Pat Mochel seemed to have a bit more of a
grasp of the fundamentals involved, so I had some hope that the re-merge
of swsusp and pmdisk would improve things a bit. That's not the case?

> Comments about swsusp code quality aside, there also exists a larger
> problem -- lots of devices still don't handle wakeup from deepsleep state.
> The code just isn't written, and a lot of maintainers of various device
> drivers don't particularly care enough about suspend to make it work.
> Unless someone who does care submits patches, those drivers remain
> in the state they're in today.

Interesting; I thought that all the devices were actually turned off and
the computer was completely booted from scratch with the current Linux
implementation of suspend-to-disk in any case. In that case, shouldn't
all the devices get initialized just as they would be in a regular
boot sequence? On my dual-boot notebook I can run Windows, "hibernate"
which is the Windows version of suspending to disk, turn it back on and
boot into Linux, reboot and get back into the Windows session where I
left off. That certainly indicates to me that at least the Windows
suspend-to-disk state is in fact turned off, or at least the functional
equivalent. I can't see how any state could be preserved across running
another OS on the computer!

If this is the case, isn't it actually S3 (suspend to RAM) that would be
the more difficult state to get devices back from? Coming back from that
state, it seems that lots of devices are not reinitialized by the BIOS the
way they are on a regular boot. Or rather, Linux must know what to do with
these devices on boot, since they normally work!

> Still think this is something worth taking a risk for ?

Well, I've had a system hosed when I tried to get suspend-to-RAM working
a few months ago: massive filesystem corruption, with lots of critical
binaries gone so the system wouldn't even boot any longer. (I needed it
back so I just reinstalled; unfortunately I didn't have time to
investigate further.) So it seems that the Red Hat kernels already
do some things sleeping-wise that can be construed to be risky. Is
non-functional suspend-to-disk more likely to corrupt data than
non-functional suspend-to-RAM (which is to a great extent what we have
now)? I guess the judgment call is yours to make, but I wouldn't be
totally averse to seeing this turned on if the risk level is similar to
that involved in S3 suspend.

If S3 suspend actually worked and was less risky, I wouldn't care
particularly about S4/suspend to disk; however, I've certainly seen the
sentiment that S3 is actually harder to do right (in terms of device
initialization etc) than S4 so reliable S3 is unlikely to ever happen on
a lot of hardware... Do you believe that reliable S3 suspend has a
greater likelihood of happening than reliable suspend-to-disk?

(My problem seems similar to
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=122604 although I
have a Fujitsu S2020, I'll try to investigate more when I have time to
deal with potential file system corruption again. Of course, that report
doesn't imply file system corruption; it might have been that I forgot
to fsck between failed attempts though - but I am using ext3 so I
thought journaling should help me a bit. I only saw it once though, I
got too scared to continue after that.)

/Per

-- 
Per Bjornsson <perbj at stanford.edu>
Ph.D. Candidate, Department of Applied Physics, Stanford University




More information about the fedora-devel-list mailing list