How does `lseek` help determine whether a file is empty?

Problem description

I am looking at the source code of cat from the GNU coreutils, in particular its circular-copy detection. It compares device and inode numbers, and that works fine. However, there is an extra case where it allows the output to also be an input, namely when the input is empty. Looking at the code, this must be the lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size part. I read the man pages and a discussion that I found via git blame, but I still cannot quite understand why this call to lseek is needed.

This is the gist of how cat detects whether it would endlessly fill the disk (note that some error checks have been removed for brevity; the full source code is linked above):

struct stat stat_buf;
fstat(STDOUT_FILENO, &stat_buf);
out_dev = stat_buf.st_dev;
out_ino = stat_buf.st_ino;
out_isreg = S_ISREG (stat_buf.st_mode) != 0;

// ...
// for <infile> in inputs {
    input_desc = open (infile, file_open_mode); // or STDIN_FILENO
    fstat(input_desc, &stat_buf);
    /* Don't copy a nonempty regular file to itself, as that would
       merely exhaust the output device.  It's better to catch this
       error earlier rather than later.  */
    if (out_isreg 
        && stat_buf.st_dev == out_dev && stat_buf.st_ino == out_ino
        && lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size)         // <--- This is the important line
    {
      // ...
    }
// } (end of for)

I have two possible explanations, but both seem kind of weird.

  1. A file could be "empty" according to some standard (POSIX) even though it still contains some information (counted in st_size), and lseek or open respects that by offsetting by some default. I wouldn't know why this would be the case, because empty means empty, right?
  2. This comparison is really a "clever" composition of two conditions. This made sense to me at first: if input_desc were STDIN_FILENO and no file were redirected to stdin, lseek would fail with ESPIPE (according to the man page) and return -1. Then the whole condition would amount to lseek(...) == -1 || stat_buf.st_size > 0. But this cannot be right, because the check only happens if device and inode are the same. That can only happen if a) stdin and stdout point to the same pty, but then out_isreg would be false, or b) stdin and stdout point to the same file, but then lseek cannot return -1, right?

I have also put together a small program that prints out the return values and errno for the important parts, but there was nothing standing out to me:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv) {
  struct stat out_stat;
  struct stat in_stat;

  if (fstat(STDOUT_FILENO, &out_stat) < 0)
    exit(1);

  printf("this is written to stdout / into the file\n");

  int fd;
  if (argc > 1) {
    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
      perror("open");
      exit(1);
    }
  } else {
    fd = STDIN_FILENO;
  }

  if (fstat(fd, &in_stat) < 0)
    exit(1);

  /* Reset errno so that the value printed below really comes from lseek. */
  errno = 0;
  /* lseek returns off_t, which can be wider than int. */
  off_t res = lseek(fd, 0, SEEK_CUR);
  fprintf(stderr,
          "errno after lseek = %d, EBADF = %d, EINVAL = %d, EOVERFLOW = %d, "
          "ESPIPE = %d\n",
          errno, EBADF, EINVAL, EOVERFLOW, ESPIPE);

  /* off_t may be wider than long, so cast for portable printing. */
  fprintf(stderr, "input:\n\tlseek(...) = %lld\n\tst_size = %lld\n",
          (long long)res, (long long)in_stat.st_size);

  printf("outsize is %lld\n", (long long)out_stat.st_size);
  return 0;
}

$ touch empty
$ ./a.out < empty > empty
errno after lseek = 0, EBADF = 9, EINVAL = 22, EOVERFLOW = 75, ESPIPE = 29
input:
        lseek(...) = 0
        st_size = 0
$ echo x > empty
$ ./a.out < empty > empty
errno after lseek = 0, EBADF = 9, EINVAL = 22, EOVERFLOW = 75, ESPIPE = 29
input:
        lseek(...) = 0
        st_size = 0
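One likely reason both runs print st_size = 0: POSIX output redirection truncates an existing file to zero length. The shell performs the `> empty` redirection before ./a.out starts, so the "x" written by echo is already gone when the program calls fstat. The sketch below mimics the effect of that redirection with open(..., O_WRONLY | O_CREAT | O_TRUNC); the helper name and the exact flags are assumptions about what a shell does internally.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes `len` bytes to `path`, then reopens it the way
   `> path` would (write-only, truncating) and returns the size fstat() sees
   on the new descriptor, or -1 on error. */
static off_t size_after_redirect_open(const char *path, const char *data,
                                      size_t len) {
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  ssize_t w = write(fd, data, len);
  close(fd);
  if (w != (ssize_t)len)
    return -1;

  /* Effect of `> path`: the old contents are discarded before the command
     ever runs. */
  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  struct stat st;
  off_t size = (fstat(fd, &st) == 0) ? st.st_size : -1;
  close(fd);
  return size;
}
```

Writing "x\n" (2 bytes) and then reopening this way is expected to report a size of 0, matching the second run above.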

So my research has not answered my ultimate question: how does lseek help determine whether a file is empty in this example from the cat source code?

Recommended answer

This is my attempt at reverse-engineering this. I could not find any public discussion that explains why lseek() was put there (no other place in GNU coreutils does this).

The guiding question is: When is the condition lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size false?

Test case:

#!/bin/bash
# (edited based on comments)

set -x

# arrange for cat to start off past the end of a non-empty file

echo abcdefghi > /tmp/so/catseek/input
# get the shell to open the input file for reading & writing as file descriptor 7
exec 7<>/tmp/so/catseek/input
# read the whole file via that descriptor (but leave it open)
dd <&7
# ask linux what the current file position of file descriptor 7 is
# should be everything dd read, namely 10 bytes, the size of the file
grep ^pos: /proc/self/fdinfo/7
# run cat, with pre and post content so that we know how to locate the interesting part
# "-" will cause cat to reuse its file descriptor 0 rather than creating a new file descriptor
# the redirections tell the shell to redirect file descriptors 1 and 0 to/from our open file descriptor 7
# which, as you'll remember, already has a file position of 10 bytes
strace -e lseek ./src/cat /tmp/so/catseek/pre - /tmp/so/catseek/post <&7 >&7
# now let's see what's in the file
cat /tmp/so/catseek/input
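The script's sequence of steps can also be replayed directly in C, which makes the shared file offset visible without strace. This is a simplified sketch under stated assumptions. The helper name replay_catseek is hypothetical. The contents of pre are written directly instead of being copied from a file, and cat itself is not run; only the offset/size comparison and the read() that follows are reproduced.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper replaying the test script against `path`. On success
   it returns 0 and stores the offset and size that cat's check compares,
   plus the result of the read() that follows. */
static int replay_catseek(const char *path, off_t *pos, off_t *size,
                          ssize_t *nread) {
  char buf[64];

  /* echo abcdefghi > input : 10 bytes including the newline */
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  ssize_t w = write(fd, "abcdefghi\n", 10);
  close(fd);
  if (w != 10)
    return -1;

  /* exec 7<>input : one open file description, readable and writable */
  int fd7 = open(path, O_RDWR);
  if (fd7 < 0)
    return -1;

  /* dd <&7 : read to end of file, leaving the offset at 10 */
  while (read(fd7, buf, sizeof buf) > 0)
    ;

  /* <&7 >&7 : stdin and stdout are duplicates of fd 7 and share its offset */
  int in_fd = dup(fd7);
  int out_fd = dup(fd7);
  int rc = -1;
  struct stat st;
  if (in_fd < 0 || out_fd < 0)
    goto out;

  /* cat copies "pre" first; the write advances the shared offset to 14 */
  if (write(out_fd, "pre\n", 4) != 4)
    goto out;

  /* cat's check for "-": current input offset versus file size */
  if (fstat(in_fd, &st) != 0)
    goto out;
  *pos = lseek(in_fd, 0, SEEK_CUR);
  *size = st.st_size;

  /* reading at end of file yields 0 bytes, so nothing is copied */
  *nread = read(in_fd, buf, sizeof buf);
  rc = 0;

out:
  if (in_fd >= 0)
    close(in_fd);
  if (out_fd >= 0)
    close(out_fd);
  close(fd7);
  return rc;
}
```

On a scratch file, the helper is expected to report an offset of 14 and a size of 14: the 10 bytes from echo plus the 4 bytes of "pre\n". The lseek-based condition is therefore false, and the read returns 0. This agrees with the lseek(0, 0, SEEK_CUR) = 14 line in the strace output.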

With these helper files:

$ cat /tmp/so/catseek/pre
pre
$ cat /tmp/so/catseek/post
post

Output with cat's original condition, lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size:

+ test.sh:8:echo abcdefghi
+ test.sh:10:exec
+ test.sh:12:dd
abcdefghi
0+1 records in
0+1 records out
10 bytes copied, 2.0641e-05 s, 484 kB/s
+ test.sh:15:grep '^pos:' /proc/self/fdinfo/7
pos:    10
+ test.sh:20:strace -e lseek ./src/cat /tmp/so/catseek/pre - /tmp/so/catseek/post
lseek(0, 0, SEEK_CUR)                   = 14
+++ exited with 0 +++
+ test.sh:22:cat /tmp/so/catseek/input
abcdefghi
pre
post

Output with cat modified to use 0 < stat_buf.st_size instead:

+ test.sh:8:echo abcdefghi
+ test.sh:10:exec
+ test.sh:12:dd
abcdefghi
0+1 records in
0+1 records out
10 bytes copied, 3.6415e-05 s, 275 kB/s
+ test.sh:15:grep '^pos:' /proc/self/fdinfo/7
pos:    10
+ test.sh:20:strace -e lseek ./src/cat /tmp/so/catseek/pre - /tmp/so/catseek/post
./src/cat: -: input file is output file
+++ exited with 1 +++
+ test.sh:22:cat /tmp/so/catseek/input
abcdefghi
pre
post

As you can see, when cat starts, the file position may already be at or past the end of the file. Checking only the file size makes cat skip the file, but it also reports a failure (note the exit status of 1 in the second run), because the code inside the if statement is:

error (0, 0, _("%s: input file is output file"), infile);
ok = false;
goto contin;
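Side by side, the two conditions differ only when the input offset is already at or past the end of a non-empty file. When the offset is 0, as it usually is for a freshly opened input, lseek(...) < st_size reduces to st_size > 0, i.e. "the file is non-empty". That is presumably why the source comment speaks of not copying a nonempty regular file to itself. A sketch with hypothetical function names:

```c
#include <stdbool.h>
#include <sys/types.h>

/* cat's condition: reject only if unread bytes remain ahead of the current
   input offset. */
static bool rejects_with_lseek(off_t cur_offset, off_t st_size) {
  return cur_offset < st_size;
}

/* The modified condition from the second run: reject any non-empty file. */
static bool rejects_with_size_only(off_t st_size) {
  return 0 < st_size;
}
```

With the numbers from the strace output (offset 14, size 14), only the size-only version rejects the input. With offset 0, both versions agree, rejecting non-empty files and allowing empty ones.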

Using lseek() allows cat to say "Oh, the file is the same, and is not empty, BUT our reads will still turn up empty, because that's how reading past EOF works, so we can allow this case".
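The rule behind that last sentence is that read() at or beyond the end of a regular file returns 0 bytes instead of failing. lseek() on a regular file may even move the offset past the end. A small sketch to check this; read_at is a hypothetical helper:

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: moves fd to `offset` bytes from the start (possibly
   beyond end of file) and returns what read() reports there, or -1 if the
   seek fails. */
static ssize_t read_at(int fd, off_t offset) {
  char buf[16];
  if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
    return -1;
  return read(fd, buf, sizeof buf);
}
```

On a 10-byte file, read_at(fd, 0) returns 10, while read_at(fd, 10) and read_at(fd, 100) both return 0. In cat's terms, an offset of 10 or 100 makes lseek(...) < st_size false, and either way there is nothing left to copy.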
