Re: pthread_kill is racy: probably needs kernel change

On Mon, 4 Nov 2002, Luca Barbieri wrote:

> pthread_kill takes the tid from the struct pthread and passes it to the
> kernel in sys_tkill.

this is a fundamental property of signals and the PID space: there's no
way for userspace to 'open' a PID/TID, i.e. to hold a reference that
keeps the kernel from releasing the referenced thread while the ID is in
use.

so userspace has to be sure that:

  1) the PID/TID it is passing to sys_kill()/sys_tkill() is valid
  2) the target process/thread does not exit prematurely

> However, between the time userspace reads the tid and the time the
> kernel finds the task_struct, the thread might have exited and the tid
> reused, resulting in killing the wrong process.
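The race being described can be sketched in C as two separate steps: copy the TID out of the thread descriptor in user space, then hand it to the kernel. `struct racy_thread` and `racy_thread_kill()` are hypothetical stand-ins, not glibc's actual internals; only `syscall(SYS_tkill, ...)` is the real system call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the library's private thread descriptor.
   Only the kernel thread ID matters for this sketch. */
struct racy_thread {
    pid_t tid; /* 0 once the thread has exited */
};

/* Racy pattern: the TID is read in user space and only later looked up
   by the kernel. If the thread exits and its TID is recycled between the
   two steps, the signal goes to whatever task now owns that number.
   Returns 0 or a positive errno value. */
static int racy_thread_kill(const struct racy_thread *t, int sig)
{
    pid_t tid = t->tid;                      /* step 1: read the TID */
    /* window: the thread may exit and its TID be reused right here */
    if (syscall(SYS_tkill, tid, sig) == -1)  /* step 2: kernel lookup */
        return errno;
    return 0;
}
```

Note that a TID of 0 makes `tkill` itself fail with `EINVAL`, which is the error the message mentions further down.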

i suspect other problems can occur as well if a struct pthread is freed
while it's still in use - isn't that a programming bug?

> The fact that the tid can be immediately reused is a consequence of
> using CLONE_DETACHED, that however should IMHO not be removed since it
> avoids a syscall to free the zombie (and also avoids SIGCHLD).

it's not a consequence of CLONE_DETACHED: it's a fundamental property of
the PID/TID space. A TID is a resource that gets freed, and once freed it
can be reused.

or is there some other race i'm missing?

> The fix that I propose is to change sys_tkill so that a pointer to the
> tid is passed. The kernel can then get the value and find the task while
> holding tasklist_lock, thus protecting from task_release resulting from
> an eventual thread exit.
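The ordering argument behind this proposal can be modelled in user space, with a mutex playing the role of `tasklist_lock` and a single descriptor standing in for the task list. Everything here (`model_tasklist_lock`, `struct model_thread`, `model_thread_exit()`, `model_tkill_ptr()`) is hypothetical. It only shows why reading the TID under the same lock as the exit path closes the window; it says nothing about how a kernel would read user memory while holding that lock.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>

/* Hypothetical user-space model: this mutex stands in for tasklist_lock. */
static pthread_mutex_t model_tasklist_lock = PTHREAD_MUTEX_INITIALIZER;

struct model_thread {
    pid_t tid; /* cleared on exit, under the lock */
};

/* Exit path: the published TID is cleared while holding the lock, so no
   lookup can observe a TID that is about to be released. */
static void model_thread_exit(struct model_thread *t)
{
    pthread_mutex_lock(&model_tasklist_lock);
    t->tid = 0;
    /* the real kernel would release the task (and its TID) here */
    pthread_mutex_unlock(&model_tasklist_lock);
}

/* Proposed tkill variant: receives a pointer to the TID and dereferences
   it only after taking the lock, so "read TID" and "find task" happen
   atomically with respect to exit. Returns 0 or a positive errno value. */
static int model_tkill_ptr(const struct model_thread *t, int sig)
{
    int ret;
    (void)sig; /* signal delivery is not modelled */
    pthread_mutex_lock(&model_tasklist_lock);
    ret = (t->tid > 0) ? 0 : ESRCH;
    pthread_mutex_unlock(&model_tasklist_lock);
    return ret;
}
```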

there's one thing that can be done, which i proposed a few weeks ago:
pass both the PID and the TID to tkill(), so the kernel can double-check
that the thread really is in the intended thread group. The kernel
guarantees that the 'thread group ID' (i.e. the PID) is not reused until
all threads in the group have exited.
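Later Linux kernels (2.5.75 onward) provide this PID+TID check as the `tgkill()` system call. A minimal sketch of using it for a pthread_kill-style helper follows; the wrapper name `checked_thread_kill()` is hypothetical, while `SYS_tgkill`, `getpid()` and `syscall()` are the real interfaces.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Send sig to thread tid, but only if tid still belongs to our own thread
   group (getpid() returns the thread group ID). If the TID was recycled
   by an unrelated process, the kernel's group check fails and it returns
   ESRCH instead of signalling the wrong task.
   Returns 0 or a positive errno value. */
static int checked_thread_kill(pid_t tid, int sig)
{
    if (syscall(SYS_tgkill, getpid(), tid, sig) == -1)
        return errno;
    return 0;
}
```

The check only guards against cross-process reuse: a TID recycled by another thread of the same process would still pass.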

> A related problem is that if the tid is 0, pthread_kill returns EINVAL,
> while according to SUSv3 it should return 0.

this is a bug.
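A sketch of the fix, following the quoted reading of SUSv3 that signalling a thread which has exited but is not yet joined should succeed: treat a zeroed TID as "already exited" and return 0 without calling the kernel. `struct demo_thread` and `demo_pthread_kill()` are hypothetical; this is not glibc's actual code, and it keeps the plain `tkill` call, so the reuse race above is not addressed here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical thread descriptor: tid is set to 0 when the thread exits,
   while the descriptor stays valid until the thread is joined. */
struct demo_thread {
    pid_t tid;
};

/* pthread_kill-style wrapper. An exited-but-unjoined thread reports
   success (the signal is simply dropped) instead of passing TID 0 to the
   kernel, which would fail with EINVAL.
   Returns 0 or a positive errno value, like pthread_kill(). */
static int demo_pthread_kill(const struct demo_thread *t, int sig)
{
    pid_t tid = t->tid;
    if (tid == 0)
        return 0; /* thread already exited: nothing to deliver */
    if (syscall(SYS_tkill, tid, sig) == -1)
        return errno;
    return 0;
}
```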

> BTW, can the sys_tkill ABI be broken or would a new syscall be needed?

the polite thing is to create a new syscall: sys_tkill() exists in the
2.4 kernel as well, and NGPT uses it.

	Ingo
