Ending strace on a process causes hung network connection

Yong Huang yong321 at yahoo.com
Wed Oct 21 17:46:36 EDT 2009


> This may be Oracle specific. I have
> Oracle 11gR2 installed on RHEL 5.3 (kernel
> 2.6.18-92.1.22.el5, x86 64-bit). From client, I can run
> tnsping to do the basic Oracle SQL*Net connection test, and
> can use sqlplus to connect into Oracle through SQL*Net, even
> while I'm running strace -p <TNS listener> on the
> server. But if I press Control-C to end strace on the
> server, tnsping or sqlplus hangs. It's reproducible.
> 
> When the listener does not hang, `telnet <listener
> IP> 1521' (1521 is the listener port number) allows you
> to type a few (4?) keystrokes and you're back to your own
> prompt (unless you happen to send a message according to the
> undocumented SQL*Net protocol). But when it hangs, telnet to
> it and you can type anything and it never closes the
> connection.
> 
> Question: What could cause strace to have this disruptive
> effect on a running process?
> 
> Yong Huang

The problem I reported earlier is not specific to Oracle; it looks like an OS or strace problem. On this box, quitting strace with Control-C leaves the traced process stopped. Here's a test:

$ strace -V
strace -- version 4.5.18
$ uname -a
Linux <my hostname> 2.6.18-92.1.22.el5 #1 SMP Fri Dec 5 09:28:22 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
$ sleep 100 &
[1] 20379
$ grep ^State /proc/20379/status
State:  S (sleeping)
$ strace -p 20379
Process 20379 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...> <unfinished ...>
Process 20379 detached

[1]+  Stopped                 sleep 100
$ grep ^State /proc/20379/status
State:  T (stopped)
$ kill -CONT 20379
$ grep ^State /proc/20379/status
State:  S (sleeping)

As you can see, after I press Control-C to quit strace, the shell reports that the sleep process is stopped, and /proc/<pid>/status confirms it (State: T). The process can be resumed with kill -CONT.
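For anyone who wants to script the check and the workaround, here is a minimal sketch. It assumes a Linux /proc filesystem, and the function names proc_state and resume_if_stopped are made up for illustration; they are not part of strace or any standard tool. It only automates the manual grep and kill -CONT steps shown above and does nothing about the underlying cause.

```shell
#!/bin/sh
# Hypothetical helpers, not from any existing tool.

# Print the one-letter state code (S, T, R, ...) of a PID,
# read from the "State:" line of /proc/<pid>/status.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# If the process is stopped (state T), send it SIGCONT,
# the same way it was resumed by hand in the transcript.
resume_if_stopped() {
    if [ "$(proc_state "$1")" = "T" ]; then
        kill -CONT "$1"
    fi
}
```

After sourcing this, running resume_if_stopped <pid> once strace has detached should put a process that was left in state T back to sleeping, and it leaves a process in any other state alone.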

I found another box running the same Red Hat release and the same strace, with a slightly different kernel version (2.6.18-128.1.10.el5). The behavior there is slightly different: Control-C can't quit strace, so I have to kill it. But once strace is killed, the sleep process remains in the sleeping state. (Likewise, the problem box doesn't show the problem if strace is killed instead of quit with Control-C.)

Can anybody test this on their end? Is there any pattern we can find?

Yong Huang