Netlink Socket Problem

Steve Grubb sgrubb at redhat.com
Mon Feb 28 22:48:28 UTC 2005


Hi,

I'm still working on some bugs that I found over the weekend for libaudit. I 
modified pam and passwd to log events to the audit netlink connection. As a 
result, I ran into a problem. The problem is probably best illustrated by 
showing in auditctl.c how to reproduce it.

If you open auditctl.c, look for reset_vars(). In that function is 
audit_open(). Add a second call to audit_open so that it looks like this:

static int reset_vars(void)
{
        list_requested = 0;
        syscalladded = 0;
        add = 0;
        del = 0;
        action = 0;
        memset(&rule, 0, sizeof(rule));
        audit_open();  // this is added; we don't care what the return value is
        if ((fd = audit_open()) < 0) {
                fprintf(stderr, "Cannot open netlink audit socket\n");
                return 1;
        }
        return 0;
}

This makes the application open two netlink connections to the audit
system. Compile it and run ./auditctl -s. Using strace, this is what I get
(with my annotations):

rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
_sysctl({{CTL_KERN, KERN_VERSION}, 2, 0xbfec898c, 31, (nil), 0}) = 0
socket(PF_NETLINK, SOCK_RAW, 9)         = 3  
<- first open ->

bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
socket(PF_NETLINK, SOCK_RAW, 9)         = 4
 <- second open ->

bind(4, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
sendto(4, "\20\0\0\0\350\3\1\0gE\213k\0\0\0\0", 16, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 16
<- send request, now get answer ->

recvfrom(4, 0xbfec7ed0, 1216, 64, 0xbfec7e60, 0xbfec7e5c) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "Error receiving netlink packet ("..., 65Error receiving netlink packet (Resource temporarily unavailable)) = 65
write(2, "\n", 1
)                       = 1
<- error? ->

nanosleep({0, 100000000}, NULL)         = 0
recvfrom(4, 0xbfec7ed0, 1216, 64, 0xbfec7e60, 0xbfec7e5c) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "Error receiving netlink packet ("..., 65Error receiving netlink packet (Resource temporarily unavailable)) = 65
write(2, "\n", 1
<- error? ->
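
For anyone reading the trace, the 16 bytes handed to sendto() are just a
bare netlink message header. The sketch below decodes them by hand. It
assumes the little-endian byte order of the i386 box the trace came from;
the helper names (le16, le32, decode_nlmsghdr) are made up for this
example and are not part of libaudit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/netlink.h>   /* struct nlmsghdr, NLM_F_REQUEST */
#include <linux/audit.h>     /* AUDIT_GET */

/* The sendto() payload from the trace:
 * "\20\0\0\0\350\3\1\0gE\213k\0\0\0\0" */
static const unsigned char traced_request[16] = {
    0x10, 0x00, 0x00, 0x00,   /* nlmsg_len   */
    0xe8, 0x03,               /* nlmsg_type  */
    0x01, 0x00,               /* nlmsg_flags */
    0x67, 0x45, 0x8b, 0x6b,   /* nlmsg_seq   */
    0x00, 0x00, 0x00, 0x00    /* nlmsg_pid   */
};

/* Read little-endian integers byte by byte, so the result does not
 * depend on the byte order of the machine running this code. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: turn 16 raw bytes into a struct nlmsghdr. */
static struct nlmsghdr decode_nlmsghdr(const unsigned char *buf)
{
    struct nlmsghdr h;

    assert(buf != NULL);
    h.nlmsg_len   = le32(buf);
    h.nlmsg_type  = le16(buf + 4);
    h.nlmsg_flags = le16(buf + 6);
    h.nlmsg_seq   = le32(buf + 8);
    h.nlmsg_pid   = le32(buf + 12);
    return h;
}
```

Calling decode_nlmsghdr(traced_request) gives a length of 16 (header only,
no payload), type 1000 (AUDIT_GET), flags NLM_F_REQUEST and pid 0, which
is the status request one would expect from auditctl -s.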

As you can see, the error message scrolls repeatedly because recvfrom keeps
returning EAGAIN. This is a real problem right now, and I'm not sure how
best to solve it short of making a request, closing the descriptor, and
re-opening it for each communication with the kernel.

What happens in real life is that passwd opens a socket to log some data to
the audit system and then collects the passwords. If everything is OK, it
passes the passwords to pam for the authentication token update. Pam decides
that it needs to do some logging of its own and opens its own descriptor to
the audit system. Its requests fail like above, with EAGAIN.

Do any of you kernel hackers know why apps are limited to one netlink socket
connection? Can someone else verify the problem?

I think I can fix the problem by constantly closing and opening connections, 
but that is ugly and not efficient. This "bug/feature" is holding up the 
release of the next version of audit and patched trusted programs.
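
For reference, the close-and-reopen workaround could look roughly like the
sketch below: open a socket, send one message, close it again. This is an
assumption-laden illustration, not libaudit code. The function name
audit_send_oneshot and the ONESHOT_MAX_PAYLOAD cap are invented here, the
helper does not wait for any reply, and whether it actually avoids the
EAGAIN problem on the affected kernel has not been verified.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>   /* struct nlmsghdr, NETLINK_AUDIT (protocol 9) */
#include <linux/audit.h>     /* AUDIT_* message types */

#define ONESHOT_MAX_PAYLOAD 1024   /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: one socket per message, fire and forget.
 * Returns 0 if the whole message was sent, -1 with errno set otherwise. */
static int audit_send_oneshot(uint16_t type, const void *data, size_t len)
{
    struct {
        struct nlmsghdr hdr;
        char payload[ONESHOT_MAX_PAYLOAD];
    } req;
    struct sockaddr_nl kernel;
    ssize_t sent;
    int fd, saved_errno;

    /* Reject bad arguments before touching any socket. */
    if (len > ONESHOT_MAX_PAYLOAD) {
        errno = EMSGSIZE;
        return -1;
    }
    if (len > 0 && data == NULL) {
        errno = EINVAL;
        return -1;
    }

    memset(&req, 0, sizeof(req));
    req.hdr.nlmsg_len   = (uint32_t)NLMSG_LENGTH(len);
    req.hdr.nlmsg_type  = type;
    req.hdr.nlmsg_flags = NLM_F_REQUEST;
    /* Sanity check: the payload must start right after the header. */
    assert(NLMSG_DATA(&req.hdr) == (void *)req.payload);
    if (len > 0)
        memcpy(req.payload, data, len);

    memset(&kernel, 0, sizeof(kernel));
    kernel.nl_family = AF_NETLINK;   /* nl_pid 0 addresses the kernel */

    fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT);
    if (fd < 0)
        return -1;

    sent = sendto(fd, &req, req.hdr.nlmsg_len, 0,
                  (struct sockaddr *)&kernel, sizeof(kernel));
    saved_errno = errno;
    close(fd);                       /* never keep the descriptor around */
    errno = saved_errno;

    return sent == (ssize_t)req.hdr.nlmsg_len ? 0 : -1;
}
```

A caller would use something like audit_send_oneshot(AUDIT_GET, NULL, 0)
for a header-only request. The cost is a socket()/close() pair per
message, which is exactly the inefficiency complained about above.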

Thanks,
-Steve Grubb



