Shell script read missing last line

Question

I have an ... odd issue with a bash shell script that I was hoping to get some insight on.

My team is working on a script that iterates through lines in a file and checks for content in each one. We had a bug where, when run via the automated process that sequences different scripts together, the last line wasn't being seen.

The code used to iterate over the lines in the file (name stored in DATAFILE) was:

cat "$DATAFILE" | while read line 

We could run the script from the command line and it would see every line in the file, including the last one, just fine. However, when run by the automated process (which runs the script that generates the DATAFILE just prior to the script in question), the last line is never seen.

We updated the code to use the following to iterate over the lines, and the problem cleared up:

for line in `cat "$DATAFILE"` 

Note: DATAFILE never has a newline written at the end of the file.

My question has two parts: why would the last line not be seen by the original code, and why would this change make a difference?

The only explanations I could come up with for why the last line would not be seen were:


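  • The previous process, which writes the file, was relying on the process ending to close the file descriptor.

  • The problem script was starting up and opening the file so quickly that, although the previous process had ended, it had not yet been shut down and cleaned up enough for the system to close the file descriptor automatically for it.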

That being said, it seems like, if you have 2 commands in a shell script, the first one should be completely shut down by the time the script runs the second one.

Any insight into the questions, especially the first one, would be very much appreciated.

Recommended answer

The C standard says that text files must end with a newline or the data after the last newline may not be read properly.

ISO/IEC 9899:2011 §7.21.2 Streams

A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.

I would not have expected a missing newline at the end of the file to cause trouble in bash (or any Unix shell), but that does seem to be the problem, reproducibly ($ is the prompt in this output):

$ echo xxx\\c
xxx$ { echo abc; echo def; echo ghi; echo xxx\\c; } > y
$ cat y
abc
def
ghi
xxx$
$ while read line; do echo $line; done < y
abc
def
ghi
$ bash -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ ksh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ zsh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ for line in $(<y); do echo $line; done      # Preferred notation in bash
abc
def
ghi
xxx
$ for line in $(cat y); do echo $line; done   # UUOC Award pending
abc
def
ghi
xxx
$

It is also not limited to bash — Korn shell (ksh) and zsh behave like that too. I live, I learn; thanks for raising the issue.

As demonstrated in the code above, the cat command reads the whole file. The for line in `cat $DATAFILE` technique collects all the output and splits it into words, treating any sequence of white space (blanks, tabs, newlines) as a single separator, so the unterminated last segment is still delivered. Since that worked for you, I conclude that no line in your file contains blanks.
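
The word-splitting behaviour matters as soon as a line contains a blank. The following is only a sketch to illustrate that point; the helper name count_for_words and the temporary demo file are made up for this example and are not part of the original scripts.

```shell
# Hypothetical helper (not from the original answer): count loop iterations
# when iterating over $(cat file) with the default IFS.
count_for_words() {
    n=0
    # Left unquoted on purpose: the shell splits the output on blanks, tabs and newlines.
    for word in $(cat "$1"); do
        n=$((n + 1))
    done
    echo "$n"
}

demo=$(mktemp)
printf 'first line\nsecond\nlast' > "$demo"   # three lines, no trailing newline
count_for_words "$demo"                       # prints 4, not 3
rm -f "$demo"
```

The loop sees the unterminated "last", but "first line" is delivered as two separate items, so this form is only a line iterator when no line contains white space.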

Tested on Mac OS X 10.7.5.

The POSIX read command specification says:

The read utility shall read a single line from standard input.

By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.

If standard input is a terminal device and the invoking shell is interactive, read shall prompt for a continuation line when it reads an input line ending with a <backslash> <newline>, unless the -r option is specified.

The terminating <newline> (if any) shall be removed from the input and the results shall be split into fields as in the shell for the results of parameter expansion (see Field Splitting); [...]

Note the '(if any)' (emphasis added in the quote)! It seems to me that if there is no newline, read should still return the result. On the other hand, the specification also says:

STDIN

The standard input shall be a text file.

and then you get back to the debate about whether a file that does not end with a newline is a text file or not.

However, the rationale on the same page documents:

Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the -r option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that "if any" is used in "The terminating <newline> (if any) shall be removed from the input" in the description. It is not a relaxation of the requirement for standard input to be a text file.

That rationale must mean that the text file is supposed to end with a newline.
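
The continuation-line processing that the rationale refers to can be seen with a small sketch. The helper names read_plain and read_raw are invented for this illustration, and the comments describe what I would expect from bash; other shells may differ in detail.

```shell
# Hypothetical helpers: read one logical line from stdin, without and with -r.
read_plain() { read line; printf '%s\n' "$line"; }
read_raw()   { read -r line; printf '%s\n' "$line"; }

# The input is the two physical lines 'one\' and 'two'.
# Without -r, backslash-newline joins them; this prints: onetwo
printf 'one\\\ntwo\n' | read_plain
# With -r, the backslash is ordinary data; this prints one followed by a backslash.
printf 'one\\\ntwo\n' | read_raw
```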

The POSIX definition of a text file is:

3.395 Text File

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

This does not stipulate 'ends with a <newline>' directly, but does defer to the C standard.

Note Gordon Davisson's answer (http://stackoverflow.com/questions/12916352/shell-script-read-missing-last-line/12919766#12919766). A simple test shows that his observation is accurate: read does store the unterminated last segment in the variable, but it also reports failure at end of file, so the loop body never runs for it:

$ while read line; do echo $line; done < y; echo $line
abc
def
ghi
xxx
$
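
The same effect can be checked directly by looking at the exit status of a single read. This is a minimal sketch; the function name read_status is hypothetical, and the comments show what bash prints.

```shell
# Hypothetical helper: read one line from stdin and report both the exit
# status of read and whatever data ended up in the variable.
read_status() {
    if IFS= read -r segment; then
        echo "status=0 data=$segment"
    else
        echo "status=nonzero data=$segment"
    fi
}

printf 'xxx\n' | read_status   # prints: status=0 data=xxx
printf 'xxx'   | read_status   # prints: status=nonzero data=xxx
```

In both cases the data arrives in the variable; only the exit status differs, and that status is what the while loop tests.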

Therefore, his technique of:

while read line || [ -n "$line" ]; do echo $line; done < y

cat y | while read line || [ -n "$line" ]; do echo $line; done

will work for files without a newline at the end (at least on my machine).
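
For a script rather than a one-liner, the same technique could be wrapped up roughly as follows. This is a sketch, not the original poster's code: the function name print_lines is made up, and the IFS= and -r additions are my assumption about what is usually wanted (preserving leading blanks and backslashes), not something the technique itself requires.

```shell
# Hypothetical helper: print every line of a file, including an
# unterminated last segment.
print_lines() {
    # IFS= keeps leading and trailing blanks; -r keeps backslashes literal.
    # The || test runs the body once more when read fails but still got data.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}

demo=$(mktemp)
printf 'abc\ndef\nxxx' > "$demo"   # no newline after xxx
print_lines "$demo"                # prints abc, def and xxx on separate lines
rm -f "$demo"
```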

I'm still surprised to find that the shells drop the last segment (it can't be called a line because it doesn't end with a newline) of the input, but there might be sufficient justification in POSIX to do so. And clearly it is best to ensure that your text files really are text files ending with a newline.
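
One way to make sure a generated file really ends with a newline is to check its last byte before handing it to the reading script. The sketch below is an assumption about how that could be done, using a made-up helper name, ensure_trailing_newline; it relies on tail -c and on command substitution stripping a trailing newline.

```shell
# Hypothetical helper: append a newline to a file only when the file is
# non-empty and its last byte is not already a newline.
ensure_trailing_newline() {
    # $(tail -c 1 file) expands to nothing when the last byte is a newline,
    # because command substitution strips trailing newlines.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}

demo=$(mktemp)
printf 'abc\nxxx' > "$demo"       # no newline after xxx
ensure_trailing_newline "$demo"   # appends the missing newline
ensure_trailing_newline "$demo"   # second call changes nothing
rm -f "$demo"
```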
