Linux: fork & execv, wait for child process hangs

Problem description

I wrote a helper function to start a process using fork() and execv() inspired by this answer. It is used to start e.g. mysqldump to make a database backup. The code works totally fine in a couple of different locations with different programs.

Now I have hit one case where it fails: a call to systemctl to stop a unit. Running systemctl works and the unit is stopped. But in the intermediate process, wait() does not return when the worker exits; it hangs until the timeout process ends. If I probe the worker process with kill(), I can tell that it has already finished.

Important: The program does not otherwise misbehave or segfault; the only problem is that wait() does not report the end of the worker process. Is there anything in my code (see below) that is incorrect and could trigger that behavior? I've read "Threads and fork(): think twice before mixing them", but I cannot find anything in there that relates to my problem.

What's strange: Deep, deep, deep in the program JSON-RPC is used. If I deactivate the code using the JSON-RPC everything works fine!?

Environment: The program that uses the function is a multi-threaded application. Signals are blocked for all threads. The main thread handles signals via sigtimedwait().
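For readers unfamiliar with that setup, here is a minimal sketch of the pattern, not the production code. The function name blockAndFetch is hypothetical, and the demo sends the signal to itself so that there is something to fetch. It blocks a signal with pthread_sigmask() (threads created afterwards inherit the mask) and then collects it synchronously with sigtimedwait().

```cpp
#include <ctime>
#include <signal.h>
#include <unistd.h>

// Hypothetical helper for illustration only.
// Blocks `signalNumber` in the calling thread, sends that signal to the
// current process, and fetches it synchronously with sigtimedwait().
// Returns the received signal number, or -1 on error or timeout.
int blockAndFetch(const int signalNumber) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signalNumber);

    // Threads created after this call inherit the blocked mask.
    if (pthread_sigmask(SIG_BLOCK, &set, nullptr) != 0) {
        return -1;
    }

    // The signal is blocked, so it stays pending instead of being delivered.
    if (kill(getpid(), signalNumber) == -1) {
        return -1;
    }

    timespec timeout{};
    timeout.tv_sec = 1;  // wait at most one second
    return sigtimedwait(&set, nullptr, &timeout);
}
```

Note that blocking a signal is different from ignoring it: a blocked signal stays pending until it is fetched or unblocked, while an ignored signal is discarded.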

Code (production code in which logging got traded for output via std::cout) with sample main function:

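#include <cerrno>    // errno, ESRCH
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <iostream>

#include <signal.h>  // kill, SIGKILL
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>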

namespace {

bool checkStatus(const int status) {
    return( WIFEXITED(status) && ( WEXITSTATUS(status) == 0 ) );
}

}

bool startProcess(const char* const path, const char* const argv[], const unsigned int timeoutInSeconds, pid_t& processId, const int* const fileDescriptor) {
    auto result = true;

    const pid_t intermediatePid = fork();
    if(intermediatePid == 0) {
        // intermediate process
        std::cout << "Intermediate process: Started (" <<  getpid() << ")." << std::endl;
        const pid_t workerPid = fork();
        if(workerPid == 0) {
            // worker process
            if(fileDescriptor) {
                std::cout << "Worker process: Redirecting file descriptor to stdin." << std::endl;
                const auto dupResult = dup2(*fileDescriptor, STDIN_FILENO);
                if(-1 == dupResult) {
                    std::cout << "Worker process: Duplication of file descriptor failed." << std::endl;
                    _exit(EXIT_FAILURE);
                }
            }
            execv(path, const_cast<char**>(argv));

            std::cout << "Intermediate process: Worker failed!" << std::endl;
            _exit(EXIT_FAILURE);
        } else if(-1 == workerPid) {
            std::cout << "Intermediate process: Starting worker failed!" << std::endl;
            _exit(EXIT_FAILURE);
        }

        const pid_t timeoutPid = fork();
        if(timeoutPid == 0) {
            // timeout process
            std::cout << "Timeout process: Started (" << getpid() << ")." << std::endl;
            sleep(timeoutInSeconds);
            std::cout << "Timeout process: Finished." << std::endl;
            _exit(EXIT_SUCCESS);
        } else if(-1 == timeoutPid) {
            std::cout << "Intermediate process: Starting timeout process failed." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }

        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition never evaluated to true in my tests.
        const auto killResult = kill(workerPid, 0);
        if((-1 == killResult) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------

        std::cout << "Intermediate process: Waiting for child processes." << std::endl;
        int status = -1;
        const pid_t exitedPid = wait(&status);

        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition evaluates to true in the case of an error.
        const auto killResult2 = kill(workerPid, 0);
        if((-1 == killResult2) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------

        std::cout << "Intermediate process: Child process finished. Status: " <<  status << "." << std::endl;
        if(exitedPid == workerPid) {
            std::cout << "Intermediate process: Killing timeout process." << std::endl;
            kill(timeoutPid, SIGKILL);
        } else {
            std::cout << "Intermediate process: Killing worker process." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Waiting for worker process to terminate." << std::endl;
            wait(nullptr);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }
        std::cout << "Intermediate process: Waiting for timeout process to terminate." << std::endl;
        wait(nullptr);
        std::cout << "Intermediate process: Finished." << std::endl;
        _exit(checkStatus(status) ? EXIT_SUCCESS : EXIT_FAILURE);

    } else if(-1 == intermediatePid) {
        // error
        std::cout << "Parent process: Error starting intermediate process!" << std::endl;
        result = false;
    } else {
        // parent process
        std::cout << "Parent process: Intermediate process started. PID: " << intermediatePid << "." << std::endl;
        processId = intermediatePid;
    }

    return(result);
}

bool waitForProcess(const pid_t processId) {
    int status = 0;
    const auto waitResult = waitpid(processId, &status, 0);
    auto result = false;
    if(waitResult == processId) {
        result = checkStatus(status);
    }
    return(result);
}

int main() {
    pid_t pid = 0;
    const char* const path = "/bin/ls";
    const char* argv[] = { "/bin/ls", "--help", nullptr };
    const unsigned int timeoutInS = 5;
    const auto startResult = startProcess(path, argv, timeoutInS, pid, nullptr);
    if(startResult) {
        const auto waitResult = waitForProcess(pid);
        std::cout << "waitForProcess returned " << waitResult << "." << std::endl;
    } else {
        std::cout << "startProcess failed!" << std::endl;
    }
}

Edit

The expected output should contain

  • Intermediate process: Waiting for child processes.
  • Intermediate process: Child process finished. Status: 0.
  • Intermediate process: Killing timeout process.

In the case of error the output looks like this

  • Intermediate process: Waiting for child processes.
  • Intermediate process: Child process finished. Status: -1
  • Intermediate process: Killing worker process.

When you run the sample code you will most likely see the expected output. I cannot reproduce the incorrect result in a simple example.

Solution

I found the problem:

JSON-RPC uses mongoose, and in the mongoose sources, in the function mg_start, I found the following code:

#if !defined(_WIN32) && !defined(__SYMBIAN32__)
  // Ignore SIGPIPE signal, so if browser cancels the request, it
  // won't kill the whole process.
  (void) signal(SIGPIPE, SIG_IGN);
  // Also ignoring SIGCHLD to let the OS to reap zombies properly.
  (void) signal(SIGCHLD, SIG_IGN);
#endif // !_WIN32

The line

(void) signal(SIGCHLD, SIG_IGN);

has the effect that

"if the parent does a wait(), this call will return only when all children have exited, and then returns -1 with errno set to ECHILD."

as described in the section "5.5 Voodoo: wait and SIGCHLD" of the linked document.

This is also described in the man page for WAIT(2)

ERRORS [...]

ECHILD [...] (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)
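The effect can be reproduced in isolation. The following is a small sketch, not taken from mongoose or from my production code; the function name waitErrnoWithSigchld and its parameter are made up for this demo. It forks a child that exits immediately and reports what wait() does, once with SIGCHLD ignored and once with the default disposition.

```cpp
#include <cerrno>
#include <cstdlib>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo function.
// Forks a child that exits at once, then calls wait().
// Returns 0 if a child was reaped, otherwise the errno left by the failed call.
int waitErrnoWithSigchld(const bool ignoreSigchld) {
    struct sigaction action {};
    action.sa_handler = ignoreSigchld ? SIG_IGN : SIG_DFL;
    sigemptyset(&action.sa_mask);

    struct sigaction previous {};
    if (sigaction(SIGCHLD, &action, &previous) == -1) {
        return errno;
    }

    int result = 0;
    const pid_t child = fork();
    if (child == 0) {
        _exit(EXIT_SUCCESS);
    } else if (child == -1) {
        result = errno;
    } else {
        int status = 0;
        // With SIGCHLD ignored, this blocks until all children are gone
        // and then fails instead of returning the child's PID.
        const pid_t exited = wait(&status);
        if (exited == -1) {
            result = errno;
        }
    }

    // Put the previous disposition back so the demo has no side effects.
    sigaction(SIGCHLD, &previous, nullptr);
    return result;
}
```

On Linux, and assuming the calling process has no other children, I would expect waitErrnoWithSigchld(true) to return ECHILD and waitErrnoWithSigchld(false) to return 0. In my case the timeout process was a second child, which is why wait() only came back after the timeout had elapsed.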

It was stupid of me not to check the return value properly. Before testing

if(exitedPid == workerPid) {

I should have checked that exitedPid != -1.

When I do that, errno gives me ECHILD. Had I known that in the first place, I would have read the man page and probably found the problem faster...
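A sketch of what such a check could look like. The helper name waitChecked and its out-parameters are my own invention for this example, and retrying on EINTR is an extra precaution that the original code did not have.

```cpp
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper: wraps wait() and separates "a child exited"
// from "wait() itself failed".
// Returns the PID of the reaped child, or -1 with `errorCode` set to errno.
pid_t waitChecked(int& status, int& errorCode) {
    errorCode = 0;
    pid_t exitedPid = -1;
    do {
        exitedPid = wait(&status);
    } while (exitedPid == -1 && errno == EINTR);  // retry if interrupted by a signal

    if (exitedPid == -1) {
        errorCode = errno;
        std::cerr << "wait() failed: " << std::strerror(errorCode) << std::endl;
    }
    return exitedPid;
}
```

With a check like this, the failing run would have logged the ECHILD error right away instead of falling into the "Killing worker process" branch with a status of -1.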

Naughty of mongoose just to mess with signal handling no matter what an application wants to do about it. Additionally mongoose does not revert the altering of signal handling when being stopped with mg_stop.
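One possible defensive measure, which I am only sketching here and which is an assumption on my part rather than anything mongoose documents, is to set SIGCHLD back to its default disposition before relying on wait(). An ignored disposition is inherited across fork(), so in the code from the question the reset would presumably go at the start of the intermediate process, where it affects only that process. The helper names below are hypothetical.

```cpp
#include <cstdlib>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: restore default SIGCHLD handling in the calling process.
bool resetSigchldToDefault() {
    struct sigaction action {};
    action.sa_handler = SIG_DFL;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGCHLD, &action, nullptr) == 0;
}

// Demo: simulate a library that ignores SIGCHLD, undo that, and check
// that a child can be reaped normally afterwards.
bool reapAfterReset() {
    signal(SIGCHLD, SIG_IGN);  // what the library did behind my back
    if (!resetSigchldToDefault()) {
        return false;
    }

    const pid_t child = fork();
    if (child == 0) {
        _exit(EXIT_SUCCESS);
    }
    if (child == -1) {
        return false;
    }

    int status = 0;
    return waitpid(child, &status, 0) == child
        && WIFEXITED(status)
        && WEXITSTATUS(status) == 0;
}
```

Resetting the disposition in the multi-threaded parent instead would change behavior for the whole process, including the library, so doing it only in the forked intermediate process seems the less invasive choice.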

Additional info: The code that caused this problem was changed in mongoose in September 2013 with this commit.
