[Crash-utility] Using FAULT_ON_ERROR in readmem calls

Tue Aug 28 14:28:16 UTC 2012

----- Original Message -----
> 
> Hi Dave
> 
> I would like to discuss the usage of FAULT_ON_ERROR in readmem calls.
> I have now seen a number of situations where this prevents Crash from
> producing appropriate results when some memory is corrupted.
> 
> The last problem I saw a few days ago was in kernel.c, in function
> dumplog
> 
> readmem(log_buf, KVADDR, buf, log_buf_len, "log_buf contents", FAULT_ON_ERROR);
> 
> The problem was that log_buf_len contained a very large value (a memory
> overwrite?), so the readmem failed because of the size. Of course this
> meant that the log could not be printed, but since this function is
> called during Crash startup, it also caused Crash to terminate during
> startup. Simply by changing FAULT_ON_ERROR to RETURN_ON_ERROR and
> returning if the readmem failed, I could use Crash to investigate this
> vmcore file, except for printing the log.

Right -- in fact for the new Linux 3.5 variable length record log buffer
format, I do use RETURN_ON_ERROR.  But the older format that you reference
can be changed to RETURN_ON_ERROR as well.

> A second place where I have made some patches in Crash is in function
> arm_uvtop (arm.c). In the readmem calls in this function I have
> changed FAULT_ON_ERROR to RETURN_ON_ERROR and just made a "return
> FALSE;" if the readmem fails. Unfortunately I do not remember the
> details of why I made this change, but I think there was a case where
> Crash terminated during startup, and with these changes it was
> possible to investigate the vmcore file.

Right, from time to time when these situations come up, they get handled
on a case-by-case basis *if* it's possible to safely continue.
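A sketch of the kind of change described for arm_uvtop(): every table read uses RETURN_ON_ERROR, and an unreadable entry makes the translation fail with FALSE for that one address. The two-level table layout, index widths, stub readmem(), and the name uvtop_sketch() are all hypothetical simplifications, not the real ARM descriptor format or crash code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long ulong;
#define TRUE            1
#define FALSE           0
#define KVADDR          1
#define FAULT_ON_ERROR  0x1
#define RETURN_ON_ERROR 0x2

/* Simulated memory (hypothetical layout): words 0-15 are a first-level
 * table, words 16-31 a single second-level table. */
#define FAKE_BASE 0x1000UL
static uint32_t fake_mem[32] = {
    [0]  = FAKE_BASE + 16 * 4,  /* L1[0] -> L2 table at word 16 */
    [1]  = 0xdead0000,          /* L1[1] -> corrupted, unreadable */
    [16] = 0x80000000,          /* L2[0] -> page frame 0x80000000 */
};

/* Stub readmem(); memtype is ignored in this model. */
static int readmem(ulong addr, int memtype, void *buffer, long size,
                   const char *type, ulong error_handle)
{
    ulong off = addr - FAKE_BASE;

    assert(error_handle == FAULT_ON_ERROR || error_handle == RETURN_ON_ERROR);
    (void)memtype;
    if (size <= 0 || addr < FAKE_BASE || off > sizeof(fake_mem) ||
        (ulong)size > sizeof(fake_mem) - off) {
        fprintf(stderr, "readmem: invalid address %lx for %s\n", addr, type);
        if (error_handle == FAULT_ON_ERROR)
            exit(1);
        return FALSE;
    }
    memcpy(buffer, (const char *)fake_mem + off, (size_t)size);
    return TRUE;
}

/* Simplified virtual-to-physical walk: a failed read or an empty entry
 * returns FALSE for this address instead of aborting. */
static int uvtop_sketch(ulong pgd, ulong vaddr, ulong *paddr)
{
    uint32_t l1, l2;
    ulong l1_idx = (vaddr >> 16) & 0xf;
    ulong l2_idx = (vaddr >> 12) & 0xf;

    if (!readmem(pgd + l1_idx * 4, KVADDR, &l1, sizeof(l1),
                 "first-level entry", RETURN_ON_ERROR) || !l1)
        return FALSE;
    if (!readmem(l1 + l2_idx * 4, KVADDR, &l2, sizeof(l2),
                 "second-level entry", RETURN_ON_ERROR) || !l2)
        return FALSE;
    *paddr = l2 + (vaddr & 0xfff);
    return TRUE;
}
```

Whether a caller can safely continue after such a FALSE is exactly the case-by-case judgment described above.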

> 
> Another situation I have seen is in helper functions such as
> fill_vma_cache and fill_file_cache. When I use these functions in
> extensions, the commands fail and terminate immediately if a
> readmem call fails. In several cases I could easily have handled
> such a failure, and the command could still have produced a lot of
> relevant results.

Right -- but if a legitimate vm_area_struct or file struct address is 
unreadable, then something is clearly wrong with the dumpfile.

If your extension module has the capability of passing a bogus
vm_area_struct or file structure address, then perhaps you should
call "accessible(vaddr)" first?  Or perhaps you're calling some other
function that in turn calls one of them?  If that's true, then you
should definitely use accessible() first...
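A sketch of the accessible()-first pattern in an extension, assuming stubbed versions of accessible() and fill_vma_cache() over a simulated dump. The fake vm_area_struct layout, FAKE_BASE, and the helper show_vma() are hypothetical; the stub fill_vma_cache() models the "fail on a bad address" behavior with exit().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long ulong;
#define TRUE  1
#define FALSE 0

/* Hypothetical vm_area_struct images in a simulated dump at FAKE_BASE. */
struct fake_vma { ulong vm_start, vm_end, vm_flags; };
#define FAKE_BASE 0xffff0000UL
static struct fake_vma fake_mem[2] = {
    { 0x400000, 0x401000, 0x5 },
    { 0x600000, 0x602000, 0x3 },
};

/* Stub accessible(): TRUE only for addresses inside the simulated dump. */
static int accessible(ulong kva)
{
    return kva >= FAKE_BASE && kva - FAKE_BASE < sizeof(fake_mem);
}

/* Stub fill_vma_cache(): a bad address ends the command (exit() here). */
static char *fill_vma_cache(ulong vma)
{
    if (!accessible(vma)) {
        fprintf(stderr, "fill_vma_cache: invalid address %lx\n", vma);
        exit(1);
    }
    return (char *)fake_mem + (vma - FAKE_BASE);
}

/* Hypothetical extension helper: check accessible() first, so a bogus
 * vma address is reported and skipped rather than fatal. */
static int show_vma(ulong vma, ulong *flags_out)
{
    struct fake_vma v;

    assert(flags_out != NULL);
    if (!accessible(vma)) {
        fprintf(stderr, "%lx: vm_area_struct not accessible, skipping\n", vma);
        return FALSE;
    }
    memcpy(&v, fill_vma_cache(vma), sizeof(v));
    printf("%lx: %lx-%lx flags %lx\n", vma, v.vm_start, v.vm_end, v.vm_flags);
    *flags_out = v.vm_flags;
    return TRUE;
}
```

Note that accessible() only vouches for the address it is given; a structure straddling into unreadable memory could still fail inside the helper.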

> 
> In the plugins I write, I use RETURN_ON_ERROR practically everywhere
> and then handle the error situations myself. I have done this to
> avoid situations such as the ones described above.

As you should... 
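As an illustration of handling errors yourself in an extension, a sketch of a list walk where every read uses RETURN_ON_ERROR: an unreadable node stops the walk, but the partial results are kept. The node layout, stub readmem(), walk_list(), and the max_nodes bound (a simple guard against corruption-induced cycles, not full cycle detection) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long ulong;
#define TRUE            1
#define FALSE           0
#define KVADDR          1
#define FAULT_ON_ERROR  0x1
#define RETURN_ON_ERROR 0x2

/* Hypothetical list nodes in a simulated dump; the last next pointer is
 * corrupted and points outside readable memory. */
struct fake_node { ulong next; ulong value; };
#define FAKE_BASE 0x10000UL
#define NODE(i)   (FAKE_BASE + (i) * sizeof(struct fake_node))
static struct fake_node fake_mem[3] = {
    { NODE(1), 10 },
    { NODE(2), 20 },
    { 0xbad00000UL, 30 },
};

/* Stub readmem() over the simulated dump. */
static int readmem(ulong addr, int memtype, void *buffer, long size,
                   const char *type, ulong error_handle)
{
    ulong off = addr - FAKE_BASE;

    assert(error_handle == FAULT_ON_ERROR || error_handle == RETURN_ON_ERROR);
    (void)memtype;
    if (size <= 0 || addr < FAKE_BASE || off > sizeof(fake_mem) ||
        (ulong)size > sizeof(fake_mem) - off) {
        fprintf(stderr, "readmem: invalid address %lx for %s\n", addr, type);
        if (error_handle == FAULT_ON_ERROR)
            exit(1);
        return FALSE;
    }
    memcpy(buffer, (const char *)fake_mem + off, (size_t)size);
    return TRUE;
}

/* Sums node values from head; returns the number of nodes read. */
static int walk_list(ulong head, int max_nodes, ulong *sum)
{
    struct fake_node n;
    ulong addr = head;
    int count = 0;

    *sum = 0;
    while (addr && count < max_nodes) {
        if (!readmem(addr, KVADDR, &n, sizeof(n), "list node",
                     RETURN_ON_ERROR)) {
            fprintf(stderr, "walk_list: stopped at unreadable node %lx "
                    "after %d nodes\n", addr, count);
            break;
        }
        *sum += n.value;
        count++;
        addr = n.next;
    }
    return count;
}
```

Stopping at the first bad node is one choice; skipping ahead is only possible when something other than the corrupted pointer says where the next node is.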

> 
> I am not asking you to remove most uses of FAULT_ON_ERROR, as I
> realize the size and risks of such changes. However, I would like
> to bring up this question and hear your views. When working with
> vmcore files that contain minor memory corruption, FAULT_ON_ERROR
> limits the usability of Crash.

As I mentioned above, the FAULT_ON_ERROR cases are meant to protect
you from continuing down a path which is doomed.  But certainly,
in cases where the session can be continued with confidence,
especially during initialization where a FAULT_ON_ERROR failure kills
the session, those cases should be addressed.
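A toy model of the trade-off, not crash source (all names hypothetical): FAULT_ON_ERROR unwinds with longjmp() past everything after the failed read. During a command that only ends the command; during startup there is no prompt to return to yet, so in this model the same unwind ends the session. RETURN_ON_ERROR hands the failure back to the caller, which can skip just the unreadable piece.

```c
#include <assert.h>
#include <setjmp.h>

#define FALSE           0
#define FAULT_ON_ERROR  0x1
#define RETURN_ON_ERROR 0x2

enum outcome { CONTINUED, COMMAND_ABORTED, SESSION_KILLED };

static jmp_buf recovery_env;  /* unwind target for FAULT_ON_ERROR */
static int steps_after_read;  /* work completed after the failed read */

/* Stub read that always fails, to show what each mode does on failure. */
static int failing_readmem(int error_handle)
{
    assert(error_handle == FAULT_ON_ERROR || error_handle == RETURN_ON_ERROR);
    if (error_handle == FAULT_ON_ERROR)
        longjmp(recovery_env, 1);  /* abandon everything after this read */
    return FALSE;                  /* RETURN_ON_ERROR: caller decides */
}

/* A command body or startup step. */
static void do_work(int error_handle)
{
    (void)failing_readmem(error_handle);  /* real code would skip this output */
    steps_after_read++;                   /* the remaining work still runs */
}

/* Runs do_work() as a command or as a startup step. */
static enum outcome run(int during_startup, int error_handle)
{
    steps_after_read = 0;
    if (setjmp(recovery_env) != 0)
        return during_startup ? SESSION_KILLED : COMMAND_ABORTED;
    do_work(error_handle);
    return CONTINUED;
}
```

Only the startup-with-FAULT_ON_ERROR combination loses the whole session, which is why those cases are the first candidates for conversion.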

On the other hand, making wholesale changes to handle "minor memory
corruptions" is dangerous.  In fact, what is a "minor memory corruption"?
If the crash utility gets tripped up because the kernel has corrupted
its own memory, then you could argue that crash is doing its job.

But again, I would certainly consider any changes to RETURN_ON_ERROR
on a case-by-case basis.  The cmd_log() example is a good one -- I'll
fix that for crash-6.1.0.

Thanks,
  Dave