[Linux-cluster] Options other than reboot to stop DP processes that can't be killed -9

Colin Simpson Colin.Simpson at iongeo.com
Mon Aug 15 09:16:35 UTC 2011


Probably not a cluster issue, just a pure kernel question.  It sounds
like the device or its driver has locked up, so any process attached to
it hangs inside the kernel and cannot act on a signal, not even kill -9.
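
One way to check whether this is what has happened is to look for
processes in uninterruptible sleep (state D in ps), which is the state a
task sits in while blocked inside a driver. A minimal sketch, assuming a
procps-style ps that accepts -o pid,stat,wchan:32,comm; the function
names are made up for illustration:

```shell
#!/bin/sh
# Sketch: list processes in uninterruptible sleep (STAT starting with "D")
# together with the kernel function they are waiting in (WCHAN).

# Reads "PID STAT WCHAN COMMAND" lines on stdin and keeps the D-state ones.
# The ps header line is dropped naturally because "STAT" does not start with D.
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Assumes a procps-style ps; the wchan:32 width suffix may not exist elsewhere.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}

# Usage: list_d_state
```

If the stuck vbda processes show up here, the WCHAN column gives a hint
as to which driver call they are waiting in.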

To be honest, I've had similar problems on pretty much all Unixes for
many years, and I've never found a good way out of it. It may not be an
option with your setup and application, but I guess this is why most
people run their backup systems on separate dedicated boxes, so they can
be rebooted without affecting production systems.

I wish there was a way of saying to the kernel something like: I want
to forcibly unload the driver for this device, and you can kill any
processes attached to it. Then you could reinitialise the driver and
restart the processes.

Resetting the physical device might work (it has for me in the past),
but I'd guess it could equally panic the kernel.
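
For a SCSI tape drive, a software-side version of that reset is to
remove the device through sysfs and rescan its host. This is only a
sketch under assumptions: it presumes a 2.6-style /sys layout, the
H:C:T:L address (1:0:4:0 below) is a placeholder, and the function name
and the APPLY and SYSFS_ROOT variables are invented for illustration. If
the driver is truly wedged, the write itself may block as well. By
default it only prints what it would do:

```shell
#!/bin/sh
# Sketch: detach a SCSI device via sysfs, then rescan its host adapter.
# Prints the commands unless APPLY=1 is set. SYSFS_ROOT is overridable
# so the logic can be tried against a scratch directory.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

reset_scsi_device() {   # $1 = H:C:T:L address, e.g. 1:0:4:0 (placeholder)
    host=${1%%:*}       # host number is the first field of H:C:T:L
    del="$SYSFS_ROOT/class/scsi_device/$1/device/delete"
    scan="$SYSFS_ROOT/class/scsi_host/host$host/scan"
    if [ "${APPLY:-0}" = 1 ]; then
        # Remove the device, then ask the host to rescan all
        # channels, targets and LUNs ("- - -" is the wildcard form).
        echo 1 > "$del" && echo "- - -" > "$scan"
    else
        echo "echo 1 > $del"
        echo "echo \"- - -\" > $scan"
    fi
}

# Usage (dry run):  reset_scsi_device 1:0:4:0
```

The address of the drive can usually be read from /proc/scsi/scsi; I
would try the dry run first and only set APPLY=1 on a box that can
tolerate a panic.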

If someone else has a better way out of a hung device driver on Linux,
I'd love to know too (in my experience it seems particularly bad for
tape devices when it happens).

Colin

On Mon, 2011-08-15 at 03:55 +0100, sunhux G wrote:
> Apologies if this is not the right list to post but getting desperate:
> 
> I have 2 processes (shown by ps -ef below) which have 'jammed' the tape
> drive below & I can't "kill -9" them.
> 
> Is there any way short of a reboot to stop them, say "service xxx restart"
> or anything else other than rebooting this Linux 4.x server?  Since a
> reboot involves doing "service stop xxx" for various services, surely one
> of the xxx must be able to stop the processes (just an educated guess).
> We face this issue with our Dataprotector quite often, so frequent reboots
> are not an option.
> 
> # ps -ef |grep -i bma |grep -v grep
> root     10197     1  0 Aug13 ?        00:00:08 /opt/omni/lbin/vbda
> -bmaname HP:Ultrium 4-SCSI_4 -type 2 -start 1313175661 -level 0
> -access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
> 1313175612 -volume / -profile -no_lock -hlink -no_touch -no_encode
> -no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
> -report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
> xxxdgjt1.ss.de:/ // / -no_aligned
> root     23303     1  0 Aug13 ?        00:00:03 /opt/omni/lbin/vbda
> -bmaname HP:Ultrium 4-SCSI_1 -type 2 -start 1313192083 -level 0
> -access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
> 1313192026 -volume / -profile -no_lock -hlink -no_touch -no_encode
> -no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
> -report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
> xxxdgjt1.ss.de:/ // / -no_aligned
> root     25618     1  0 Aug13 ?        00:00:03 /opt/omni/lbin/vbda
> -bmaname HP:Ultrium 4-SCSI_1 -type 2 -start 1313195066 -level 0
> -access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
> 1313195016 -volume / -profile -no_lock -hlink -no_touch -no_encode
> -no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
> -report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
> xxxdgjt1.ss.de:/ // / -no_aligned
> 
> 
> they're listening on the Tcp ports :
> 
> [root@xxxdgjt1 ~]# netstat -antp | grep 25618
> tcp       21      0 172.17.1.47:5555            172.17.12.12:2128           CLOSE_WAIT  25618/vbda
> [root@xxxdgjt1 ~]# netstat -antp | grep 23303
> tcp       21      0 172.17.1.47:5555            172.17.12.12:2073           CLOSE_WAIT  23303/vbda
> 
> 
> fuser on all other partitions shows no processes locking/opening files,
> only on the root (i.e. /) partition:
> 
> # fuser / |grep 25618    ==> will show 25618 & 25618r as amongst the processes
> # fuser / |grep 23303    ==> will show 23303 & 23303r as amongst the processes
> 
> 
> # cd /etc
> # ls */*omni*
> xinetd.d/omni
> 
> opt/omni:
> client  server
> 
