getline() repeatedly reads the file when fork() is used

Problem description

I am developing a simple shell program (a command-line interpreter) and want to read its input from a file line by line, so I used the getline() function. At first the program works correctly. However, when it reaches the end of the file, instead of terminating it starts reading the file from the beginning again and runs forever. Here is the code in main() that is related to getline():

int main(int argc,char *argv[]){
    int const IN_SIZE = 255;
    char *input = NULL;
    size_t len = IN_SIZE;
    // get file address
    fileAdr = argv[2];

    // open file
    srcFile = fopen(fileAdr, "r");

    if (srcFile == NULL) {
        printf("No such file!\n");
        exit(-1);
    }

    while (getline( &input, &len, srcFile) != -1) {
        strtok(input, "\n");
        printf("%s\n", input);
        // some code that parses input, firstArgs == input
        execSimpleCmd(firstArgs);            
    }
    fclose(srcFile);
}

I am using fork() in my program, and most probably that is what causes the problem.

void execSimpleCmd(char **cmdAndArgs) {

    pid_t pid = fork();
    if (pid < 0) {
        // error
        fprintf(stderr, "Fork Failed");
        exit(-1);
    } else if (pid == 0) {
        // child process
        if (execvp(cmdAndArgs[0], cmdAndArgs) < 0) {
            printf("There is no such command!\n");
        }
        exit(0);
    } else {
        // parent process
        wait(NULL);
        return;
    }
}

In addition, the program sometimes reads and prints a combination of multiple lines. For example, given an input file like this:

ping
ww    
ls
ls -l
pwd

it prints something like pwdg, pwdww, etc. How can I fix this?

Recommended answer

It appears that closing a FILE in some cases seeks the underlying file descriptor back to the position the application had actually read up to, effectively undoing the effect of the read buffering. This matters because the OS-level file descriptors of the parent and the child refer to the same open file description, and in particular share the same file offset.
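
The shared offset can be seen without stdio at all. The following is a minimal sketch, not part of the original answer: the function name offset_after_child_seek and the temporary file path are hypothetical. The parent moves the offset to 2, the child rewinds the descriptor it inherited, and the parent then observes the child's change.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the parent's file offset after a forked
 * child has moved it, or -1 on error. */
long offset_after_child_seek(void)
{
    char path[] = "/tmp/offset-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                       /* the file vanishes once fd is closed */

    /* Write three lines, then place the offset after the first one. */
    if (write(fd, "a\nb\nc\n", 6) != 6 ||
        lseek(fd, 2, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: rewind the inherited descriptor, then leave. */
        lseek(fd, 0, SEEK_SET);
        _exit(0);
    }

    waitpid(pid, NULL, 0);

    /* Parent: ask where its own descriptor now points. */
    long off = (long)lseek(fd, 0, SEEK_CUR);
    close(fd);
    return off;
}
```

The parent never rewinds the descriptor itself, yet the function returns 0 rather than 2, because both processes share one open file description.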

The POSIX description of fclose() contains this sentence:

[CX] If the file is not already at EOF, and the file is one capable of seeking, the file offset of the underlying open file description shall be set to the file position of the stream if the stream is the active handle to the underlying file description.

(Here CX marks an extension to the ISO C standard, and exit() of course runs fclose() on all open streams.)

I can reproduce the odd behavior with this program (on Debian 9.8):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[]){
    FILE *f;
    if ((f = fopen("testfile", "r")) == NULL) {
        perror("fopen");
        exit(1);
    }

    int right = 0;
    if (argc > 1)
        right = 1;

    char *line = NULL;
    size_t len = 0;
    // first line 
    getline(&line, &len, f);
    printf("%s", line);

    pid_t p = fork();
    if (p == -1) {
        perror("fork");
    } else if (p == 0) {
        if (right)
            _exit(0);  // exit the child 
        else
            exit(0);   // wrong way to exit
    } else {
        wait(NULL);  // parent
    }

    // rest of the lines
    while (getline(&line, &len, f) > 0) {
        printf("%s", line);
    }

    fclose(f);
}

Then:

$ printf 'a\nb\nc\n' > testfile
$ gcc -Wall -o getline getline.c
$ ./getline
a
b
c
b
c

Running it with strace -f ./getline clearly shows the child seeking the file descriptor back:

clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f63794e0710) = 25117
strace: Process 25117 attached
[pid 25116] wait4(-1,  <unfinished ...>
[pid 25117] lseek(3, -4, SEEK_CUR)      = 2
[pid 25117] exit_group(1)               = ?

(I didn't see the seek back with code that didn't involve forking, but I don't know why.)

So what happens is this: the C library in the main program reads a whole block of data from the file, and the application prints the first line. After the fork, the child exits and, in doing so, seeks the fd back to where the application-level file position is. The parent then continues, processes the rest of its read buffer, and when that is exhausted it goes on reading from the file. Because the file descriptor was moved back, the lines from the second one onward are available again.

In your case, the repeated fork() on every iteration seems to result in an infinite loop.

Using _exit() instead of exit() in the child fixes the problem in this case, because _exit() only terminates the process and does no housekeeping on the stdio buffers.
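
Applied to the shell in the question, the fix might look like the sketch below. It is an illustration under assumptions rather than the asker's final code: the name run_simple_cmd, the exit status 127 for a failed execvp(), and the returned status are choices made here.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical variant of execSimpleCmd(): runs argv[0] with its arguments
 * and returns the child's exit status, or -1 on error. */
int run_simple_cmd(char *const argv[])
{
    /* Flush pending output so the child does not inherit a copy of it. */
    fflush(stdout);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: execvp() returns only if it failed. */
        execvp(argv[0], argv);
        perror(argv[0]);    /* stderr is unbuffered, so the message appears */
        _exit(127);         /* _exit(), not exit(): leave stdio streams alone */
    }

    /* Parent: wait for this particular child. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child never calls exit(), it never closes the parent's input stream and therefore never moves the shared file offset.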

With _exit(), output buffers are not flushed either, so you need to call fflush() manually on stdout and on any other files you are writing to.
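
The effect on output can be checked with a small experiment. In this sketch (the function name bytes_written_by_child is hypothetical) a child writes one line to a fully buffered temporary stream and leaves with _exit(); the parent then reports how many bytes actually reached the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the number of bytes that reached the file
 * after the child wrote "hello\n" and called _exit(), or -1 on error. */
long bytes_written_by_child(int flush_before_exit)
{
    FILE *out = tmpfile();              /* regular file, fully buffered */
    if (out == NULL)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        fclose(out);
        return -1;
    }
    if (pid == 0) {
        fprintf(out, "hello\n");        /* sits in the child's stdio buffer */
        if (flush_before_exit)
            fflush(out);                /* push the buffer to the file */
        _exit(0);                       /* does not flush anything itself */
    }

    waitpid(pid, NULL, 0);

    /* Parent: look at the file size through the shared descriptor. */
    struct stat st;
    long size = (fstat(fileno(out), &st) == 0) ? (long)st.st_size : -1;
    fclose(out);
    return size;
}
```

Calling it with 1 reports the six bytes of the line; calling it with 0 reports an empty file, because the unflushed buffer is discarded together with the child.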

However, if you did this the other way around, with the child reading and buffering more than it processes, it would be useful for the child to seek the fd back so that the parent could continue from where the child actually left off.

Another solution would be not to mix stdio with fork().
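
One way to do that, sketched here under assumptions, is to read the input with read() directly, one byte at a time, so that the file offset always equals what the program has consumed and there is no read-ahead buffer for a child to undo. The helper name read_line_fd is hypothetical; it trades speed for simplicity, silently truncates lines longer than the buffer, and does not retry after EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads one line from fd into buf (at most cap - 1
 * bytes are stored), dropping the newline. Returns the stored length,
 * or -1 at end of file or on error. */
ssize_t read_line_fd(int fd, char *buf, size_t cap)
{
    size_t n = 0;
    char c;
    ssize_t r;

    if (cap == 0)
        return -1;

    /* One byte per read(): slow, but nothing is buffered ahead. */
    while ((r = read(fd, &c, 1)) == 1) {
        if (c == '\n') {
            buf[n] = '\0';
            return (ssize_t)n;
        }
        if (n + 1 < cap)
            buf[n++] = c;               /* excess characters are dropped */
    }

    buf[n] = '\0';
    /* Accept a final line that has no trailing newline. */
    return (r == 0 && n > 0) ? (ssize_t)n : -1;
}
```

In the shell loop from the question, the file would then be opened with open() instead of fopen(), and the loop condition would become something like read_line_fd(fd, input, sizeof input) != -1, with input declared as a char array.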
