
RE: UP2000 crashes -- update



> I found out that the guys running that big number
> crunching program on the UP2000, which regularly caused the
> machine to commit suicide with "Too many files open", always set
> the stack size to unlimited. Now I tried setting the limit to
> 16M et voilà: the sucker's running stable for days now...
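
For anyone wanting to try the same limit change, here is a minimal sketch. It assumes the job is launched from Python and that "16M" means 16 MiB; the helper names are made up for illustration. From bash, `ulimit -s 16384` (the value is in kilobytes) before starting the program should have the same effect.

```python
import resource
import subprocess

STACK_BYTES = 16 * 1024 * 1024  # assumption: "16M" in the message means 16 MiB


def capped_stack_limit(wanted, hard):
    """Return a soft limit that never exceeds the hard limit."""
    # RLIM_INFINITY is a sentinel value, so it cannot be compared with min().
    if hard == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard)


def run_with_stack_limit(argv, wanted=STACK_BYTES):
    """Run argv as a child process with a finite soft stack limit."""

    def apply_limit():
        # Runs in the child just before exec, so the parent keeps its own limit.
        _, hard = resource.getrlimit(resource.RLIMIT_STACK)
        soft = capped_stack_limit(wanted, hard)
        resource.setrlimit(resource.RLIMIT_STACK, (soft, hard))

    return subprocess.run(argv, preexec_fn=apply_limit).returncode
```

Usage would be something like `run_with_stack_limit(["./cruncher"])`, where `./cruncher` is a hypothetical stand-in for the number-crunching program.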

When it crashes, do you get these exceptions on the console and in
/var/log/messages:

RUC10_10.c0006.: Exception at [<fffffc0000317068>] (fffffc000031706c)
RUC10_10.c0006.: Exception at [<fffffc0000317068>] (fffffc000031706c)
RUC10_10.c0006.: Exception at [<fffffc0000317088>] (fffffc000031708c)

If you look in /var/log/messages it gives a better clue:

Dec 24 21:56:08 c21 kernel: RUC10_10.c0006.: Exception at [setup_sigcontext+40/736] (fffffc0000316e2c)
Dec 24 21:56:08 c21 kernel: RUC10_10.c0006.: Exception at [setup_sigcontext+48/736] (fffffc0000316e34)
Dec 24 21:56:08 c21 kernel: RUC10_10.c0006.: Exception at [setup_sigcontext+64/736] (fffffc0000316e44)
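
To make the bracketed part easier to read, here is a small sketch that decodes it. It assumes the notation is symbol+offset/size with both numbers in decimal bytes, which is consistent with the addresses logged above; the function and variable names are my own.

```python
import re

# The three log entries from the message, with the wrapped pieces rejoined.
LOG_LINES = [
    "Dec 24 21:56:08 c21 kernel: RUC10_10.c0006.: Exception at "
    "[setup_sigcontext+40/736] (fffffc0000316e2c)",
    "Dec 24 21:56:08 c21 kernel: RUC10_10.c0006.: Exception at "
    "[setup_sigcontext+48/736] (fffffc0000316e34)",
    "Dec 24 21:56:08 c21 kernel: RUC10_10.c0006.: Exception at "
    "[setup_sigcontext+64/736] (fffffc0000316e44)",
]

# Assumption: [symbol+offset/size] (address), offset and size in decimal.
PATTERN = re.compile(r"\[(\w+)\+(\d+)/(\d+)\]\s+\(([0-9a-f]+)\)")


def decode(line):
    """Return (symbol, offset, size, address, implied_start) for one entry."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("no symbol+offset/size pattern in: " + line)
    symbol = match.group(1)
    offset = int(match.group(2))
    size = int(match.group(3))
    address = int(match.group(4), 16)
    # Subtracting the offset gives where the function would have to start.
    return symbol, offset, size, address, address - offset


decoded = [decode(line) for line in LOG_LINES]
implied_starts = {entry[4] for entry in decoded}

for symbol, offset, size, address, start in decoded:
    print(f"{symbol}: fault at {address:#x}, offset {offset} of {size}, "
          f"implied start {start:#x}")
```

All three entries imply the same start address, 0xfffffc0000316e04, so the decimal reading holds together. If the console output came from the same kernel image, fffffc0000317068 would be 612 bytes past that start, which is still inside the 736-byte function.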

Basically, setup_sigcontext is getting a pointer with illegal alignment.

I would guess that this frame sits on the stack?

Apparently the resulting memory clobber trashes the kernel's count of open
files. The files aren't actually open; you can cd /proc ; ls */fd | wc -l
and see that they don't really exist...
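
The same check can be done per process. This is a minimal sketch, assuming a Linux-style /proc where each numeric directory has an fd subdirectory; the function name is made up.

```python
import os


def count_open_fds(proc_root="/proc"):
    """Map each PID under proc_root to the number of entries in its fd dir."""
    counts = {}
    for name in os.listdir(proc_root):
        # Only numeric directory names are processes; skip self, sys, etc.
        if not name.isdigit():
            continue
        fd_dir = os.path.join(proc_root, name, "fd")
        try:
            counts[int(name)] = len(os.listdir(fd_dir))
        except OSError:
            # The process exited in the meantime, or we lack permission.
            continue
    return counts
```

Calling `sum(count_open_fds().values())` as root gives a total to hold against the "Too many files open" complaint; a small total would support the idea that the counter is corrupted rather than the files really being open.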

-- greg


