Shell script read missing last line

Problem description

I have an ... odd issue with a bash shell script that I was hoping to get some insight on.

My team is working on a script that iterates through lines in a file and checks for content in each one. We had a bug where, when run via the automated process that sequences different scripts together, the last line wasn't being seen.

The code used to iterate over the lines in the file (name stored in DATAFILE) was:

cat "$DATAFILE" | while read line 

We could run the script from the command line and it would see every line in the file, including the last one, just fine. However, when run by the automated process (which runs the script that generates the DATAFILE just prior to the script in question), the last line is never seen.

We updated the code to use the following to iterate over the lines, and the problem cleared up:

for line in `cat "$DATAFILE"` 

Note: DATAFILE never has a newline written at the end of the file.

My question has two parts: why would the last line not be seen by the original code, and why would this change make a difference?

The only explanation I could come up with for why the last line would not be seen was:

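  • The previous process, which writes the file, relies on the process ending to close the file descriptor.
  • The problem script starts up and opens the file fast enough that, although the previous process has "ended", it has not "shut down/cleaned up" enough for the system to close the file descriptor automatically.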

That being said, it seems like, if you have 2 commands in a shell script, the first one should be completely shut down by the time the script runs the second one.

Any insight into the questions, especially the first one, would be very much appreciated.

Recommended answer

The C standard says that text files must end with a newline or the data after the last newline may not be read properly.

ISO/IEC 9899:2011 §7.21.2 Streams

A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.

I would not have expected a missing newline at the end of file to cause trouble in bash (or any Unix shell), but that does seem to be the problem reproducibly ($ is the prompt in this output):

$ echo xxx\\c
xxx$ { echo abc; echo def; echo ghi; echo xxx\\c; } > y
$ cat y
abc
def
ghi
xxx$
$ while read line; do echo $line; done < y
abc
def
ghi
$ bash -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ ksh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ zsh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ for line in $(<y); do echo $line; done      # Preferred notation in bash
abc
def
ghi
xxx
$ for line in $(cat y); do echo $line; done   # UUOC Award pending
abc
def
ghi
xxx
$

It is also not limited to bash — Korn shell (ksh) and zsh behave like that too. I live, I learn; thanks for raising the issue.
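The behaviour can be reproduced without relying on echo, whose treatment of \c differs between shells. The following is a minimal sketch, assuming a POSIX-style shell with mktemp available; the variable names sample and seen are made up for the illustration.

```shell
# Build a four-line sample whose last line has no trailing newline.
sample=$(mktemp)
printf 'abc\ndef\nghi\nxxx' > "$sample"

# Plain loop: read returns non-zero on the unterminated final segment,
# so the loop body is never entered for it.
seen=""
while read -r line; do
    seen="$seen$line,"
done < "$sample"

printf 'seen by the loop body: %s\n' "$seen"
printf 'left in $line afterwards: %s\n' "$line"
rm -f "$sample"
```

The loop body runs only for the newline-terminated lines, while the final segment is still left in line after the loop ends.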

As demonstrated in the code above, the cat command reads the whole file. The for line in `cat $DATAFILE` technique collects all the output and replaces arbitrary sequences of white space with a single blank (I conclude that each line in the file contains no blanks).
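The cost of that word splitting shows up as soon as a line contains a blank. This is a small sketch, assuming bash or another POSIX-style shell that splits unquoted command substitutions; the sample data is invented for the illustration.

```shell
# Two lines, each containing one blank; the last line has no trailing newline.
sample=$(mktemp)
printf 'first line\nsecond line' > "$sample"

# The unquoted command substitution is split on whitespace, not on newlines,
# so the loop iterates over words rather than lines.
count=0
for word in $(cat "$sample"); do
    count=$((count + 1))
done

printf 'loop iterations: %s\n' "$count"
rm -f "$sample"
```

The file has two lines but the loop runs four times, which is why the for loop only happens to work when no line contains white space.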

Tested on Mac OS X 10.7.5.

The POSIX read command specification says:

The read utility shall read a single line from standard input.

By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.
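The effect of -r on backslashes can be seen with a one-line file. This is an illustrative sketch, assuming a POSIX-style shell; the variable names plain and raw are arbitrary.

```shell
# The file holds the four characters a, backslash, t, b, then a newline.
sample=$(mktemp)
printf 'a\\tb\n' > "$sample"

read plain < "$sample"     # without -r the backslash is treated as an escape and removed
read -r raw < "$sample"    # with -r the backslash is kept as data

printf 'without -r: %s\n' "$plain"
printf 'with -r:    %s\n' "$raw"
rm -f "$sample"
```

Unless backslash processing is actually wanted, using read -r avoids this silent change to the data.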

If standard input is a terminal device and the invoking shell is interactive, read shall prompt for a continuation line when it reads an input line ending with a <backslash> <newline>, unless the -r option is specified.

The terminating <newline> (if any) shall be removed from the input and the results shall be split into fields as in the shell for the results of parameter expansion (see Field Splitting); [...]

Note that '(if any)' (emphasis added in quote)! It seems to me that if there is no newline, it should still read the result. On the other hand, it also says:

STDIN

The standard input shall be a text file.

and then you get back to the debate about whether a file that does not end with a newline is a text file or not.

However, the rationale on the same page documents:

Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the -r option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that "if any" is used in "The terminating <newline> (if any) shall be removed from the input" in the description. It is not a relaxation of the requirement for standard input to be a text file.

That rationale must mean that the text file is supposed to end with a newline.

The POSIX definition of a text file is:

3.395 Text File

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

This does not stipulate 'ends with a <newline>' directly, but does defer to the C standard and it does say "A file that contains characters organized into zero or more lines" and when we look at the POSIX definition of a "Line" it says:

3.206 Line

A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

So, per the POSIX definition, a text file must end with a terminating newline, because it is made up of lines and each line must end with a terminating newline.

Note Gordon Davisson's answer. A simple test shows that his observation is accurate:

$ while read line; do echo $line; done < y; echo $line
abc
def
ghi
xxx
$

Hence, his technique:

while read line || [ -n "$line" ]; do echo $line; done < y

or:

cat y | while read line || [ -n "$line" ]; do echo $line; done

will work for files without a newline at the end (at least on my machine).
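The same technique can be combined with IFS= and read -r so that leading blanks and backslashes also survive. This is a sketch rather than a definitive recipe, assuming a POSIX-style shell; the sample contents and the variable out are invented for the illustration.

```shell
# Sample with leading blanks, a literal backslash, and no trailing newline.
sample=$(mktemp)
printf '  indented\nback\\slash\nlast' > "$sample"

out=""
# IFS= keeps leading and trailing blanks, -r keeps backslashes, and the
# [ -n "$line" ] test lets the body run for an unterminated final segment.
while IFS= read -r line || [ -n "$line" ]; do
    out="$out[$line]"
done < "$sample"

printf '%s\n' "$out"
rm -f "$sample"
```

Quoting "$line" wherever it is used matters for the same reason: an unquoted expansion would be word-split again.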

I'm still surprised to find that the shells drop the last segment (it can't be called a line because it doesn't end with a newline) of the input, but there might be sufficient justification in POSIX to do so. And clearly it is best to ensure that your text files really are text files ending with a newline.
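One way to make sure a generated file ends with a newline is to check its last byte and append a newline only when it is missing. The function name ensure_trailing_newline below is hypothetical, and the sketch assumes a POSIX tail that supports -c.

```shell
# Hypothetical helper: append a newline only if the file's last byte is not one.
ensure_trailing_newline() {
    file=$1
    # Command substitution strips a trailing newline, so a non-empty result
    # means the last byte is some other character. Empty files are left alone.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}

sample=$(mktemp)
printf 'abc\nxxx' > "$sample"
ensure_trailing_newline "$sample"
ensure_trailing_newline "$sample"    # second call finds the newline and does nothing

lines=$(wc -l < "$sample" | tr -d ' ')
printf 'newline-terminated lines: %s\n' "$lines"
rm -f "$sample"
```

Running such a step in the script that produces the data file would let the original while read loop see every line.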
