Hung processes resume if attached to strace

Problem description

I have a network program written in C using TCP sockets. Sometimes the client program hangs forever waiting for input from the server. Specifically, the client hangs in a select() call on an fd that it uses to read characters sent by the server.

I am using strace to find out where the process is stuck. However, sometimes when I attach strace to a hung client process, it immediately resumes execution and exits properly. Not all hung processes behave this way; some stay stuck in select() even after I attach strace. But most of them resume execution once strace is attached.

I am curious what causes the processes to resume when strace is attached. It might give me clues as to why the client processes hang.

Any ideas? What causes a hung process to resume execution when strace is attached?

Update:

Here's the output of strace on hung processes.

> sudo strace -p 25645
Process 25645 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[ Process PID=25645 runs in 32 bit mode. ]
select(6, [3 5], NULL, NULL, NULL)      = 2 (in [3 5])
read(5, "\0", 8192)                     = 1
write(2, "", 0)                         = 0
read(3, "====Setup set_oldtempbehaio"..., 8192) = 555
write(1, "====Setup set_oldtempbehaio"..., 555) = 555
select(6, [3 5], NULL, NULL, NULL)      = 2 (in [3 5])
read(5, "", 8192)                       = 0
read(3, "", 8192)                       = 0
close(5)                                = 0
kill(25652, SIGKILL)                    = 0
exit_group(0)                           = ?
Process 25645 detached

_

> sudo strace -p 14462
Process 14462 attached - interrupt to quit
[ Process PID=14462 runs in 32 bit mode. ]
read(0, 0xff85fdbc, 8192)               = -1 EIO (Input/output error)
shutdown(3, 1 /* send */)               = 0
exit_group(0)                           = ?

_

> sudo strace -p 7517
Process 7517 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[ Process PID=7517 runs in 32 bit mode. ]
connect(3, {sa_family=AF_INET, sin_port=htons(300), sin_addr=inet_addr("100.64.220.98")}, 16) = -1 ETIMEDOUT (Connection timed out)
close(3)                                = 0
dup(2)                                  = 3
fcntl64(3, F_GETFL)                     = 0x1 (flags O_WRONLY)
close(3)                                = 0
write(2, "dsd13: Connection timed out\n", 30) = 30
write(2, "Error code : 110\n", 17)      = 17
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(1)                           = ?
Process 7517 detached

It is not just select(): processes of the same program are stuck in various system calls before I attach strace to them. They suddenly resume once strace is attached. If I don't attach strace, they just hang there forever.

Update 2:

I learned that strace can restart a process that was previously stopped (a process in the T state). Now I am trying to understand why these processes went into the 'T' state in the first place. Here's the /proc//status information:

> cat /proc/12554/status
Name:   someone
State:  T (stopped)
SleepAVG:       88%
Tgid:   12554
Pid:    12554
PPid:   9754
TracerPid:      0
Uid:    5000    5000    5000    5000
Gid:    48986   48986   48986   48986
FDSize: 256
Groups: 9149 48986
VmPeak:     1992 kB
VmSize:     1964 kB
VmLck:         0 kB
VmHWM:       608 kB
VmRSS:       608 kB
VmData:      156 kB
VmStk:        20 kB
VmExe:        16 kB
VmLib:      1744 kB
VmPTE:        20 kB
Threads:        1
SigQ:   54/73728
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000006
SigCgt: 0000000000004000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed:   00000000,00000000,00000000,0000000f
Mems_allowed:   00000000,00000001
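
For reading a dump like the one above, here is a minimal C sketch that pulls out the State letter and decodes the SigIgn/SigCgt hex masks. In these masks, bit (n - 1) stands for signal number n, so in the sample 0x6 covers signals 2 and 3 (SIGINT and SIGQUIT on Linux) and 0x4000 covers signal 15 (SIGTERM). That would suggest the stop signals are neither ignored nor caught in that process, which is only an observation about the masks, not a diagnosis. The function names status_state, status_mask, mask_has_signal and summarize_status are made up for this illustration.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the one-letter state code from a line such as
 * "State:\tT (stopped)", or '\0' if this is not a State line. */
char status_state(const char *line)
{
    if (strncmp(line, "State:", 6) != 0)
        return '\0';
    line += 6;
    while (*line == ' ' || *line == '\t')
        line++;
    return *line;
}

/* If the line starts with `field` (for example "SigIgn:"), parse the hex
 * mask that follows into *out and return 1; otherwise return 0. */
int status_mask(const char *line, const char *field, unsigned long long *out)
{
    size_t n = strlen(field);
    if (strncmp(line, field, n) != 0)
        return 0;
    *out = strtoull(line + n, NULL, 16); /* strtoull skips the leading tab */
    return 1;
}

/* Bit (signo - 1) of the mask corresponds to signal number signo. */
int mask_has_signal(unsigned long long mask, int signo)
{
    return signo >= 1 && signo <= 64 && ((mask >> (signo - 1)) & 1ULL);
}

/* Hypothetical helper: print the state of a pid and whether SIGTSTP
 * appears in its ignored or caught masks. Returns -1 if the status file
 * cannot be opened. */
int summarize_status(long pid)
{
    char path[64], line[256];
    unsigned long long mask;

    snprintf(path, sizeof path, "/proc/%ld/status", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        char st = status_state(line);
        if (st != '\0')
            printf("state: %c\n", st);
        if (status_mask(line, "SigIgn:", &mask))
            printf("SIGTSTP ignored: %s\n",
                   mask_has_signal(mask, SIGTSTP) ? "yes" : "no");
        if (status_mask(line, "SigCgt:", &mask))
            printf("SIGTSTP caught: %s\n",
                   mask_has_signal(mask, SIGTSTP) ? "yes" : "no");
    }
    fclose(f);
    return 0;
}
```

Calling summarize_status(12554) while that process exists would print its state letter plus the two SIGTSTP lines. The same check can be repeated for SIGTTIN and SIGTTOU by swapping the signal constant.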

Solution

strace uses ptrace. The ptrace man page has this:

Since attaching sends SIGSTOP and the tracer usually suppresses it,
this may cause a stray EINTR return from the currently executing system
call in the tracee, as described in the "Signal injection and
suppression" section.

Are you seeing select return EINTR?
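
If select does return EINTR, a common way to make the client tolerant of it is to retry the call. Below is a minimal sketch of that pattern, assuming a blocking wait on a single fd with no timeout. The helper names wait_readable and read_retry are invented here and are not from the original program.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until fd is readable. Returns 1 when select() reports the fd as
 * ready, or -1 on an error other than EINTR (errno is left set). */
int wait_readable(int fd)
{
    for (;;) {
        fd_set rfds;

        /* select() modifies the set, so rebuild it on every attempt. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        int n = select(fd + 1, &rfds, NULL, NULL, NULL);
        if (n > 0)
            return 1;
        if (n < 0 && errno != EINTR)
            return -1;
        /* n < 0 with EINTR: a signal interrupted the call, so retry. */
    }
}

/* read() wrapper that retries when the call is interrupted by a signal. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

If the client treats EINTR as a fatal error, or never checks the return value of select, an interrupted call can send it down an unexpected path. Logging errno right after the call is a cheap way to find out whether that is happening.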
