Why does my tool output overwrite itself and how do I fix it?

Problem description

The intent of this question is to provide an answer to the daily questions whose answer is "you have DOS line endings" so we can simply close them as duplicates of this one without repeating the same answers ad nauseam.

NOTE: This is NOT a duplicate of any existing question. The intent of this Q&A is not just to provide a "run this tool" answer but also to explain the issue, so that we can point anyone with a related question here and they will find a clear explanation of why they were pointed here as well as the tool to run to solve their problem. I spent hours reading all of the existing Q&A and they are all lacking in the explanation of the issue, alternative tools that can be used to solve it, and/or the pros/cons/caveats of the possible solutions. Also, some of them have accepted answers that are just plain dangerous and should never be used.

Now back to the typical question that would result in a referral here:

I have a file containing 1 line:

what isgoingon

and when I print it using this awk script to reverse the order of the fields:

awk '{print $2, $1}' file

instead of seeing the output I expect:

isgoingon what

I get the field that should be at the end of the line appear at the start of the line, overwriting some text at the start of the line:

 whatngon

or I get the output split onto 2 lines:

isgoingon
 what

What could the problem be and how do I fix it?

Recommended answer

The problem is that your input file uses DOS line endings of CRLF instead of UNIX line endings of just LF and you are running a UNIX tool on it so the CR remains part of the data being operated on by the UNIX tool. CR is commonly denoted by \r and can be seen as a control-M (^M) when you run cat -vE on the file while LF is \n and appears as $ with cat -vE.

So your input file wasn't really just:

what isgoingon

it was really:

what isgoingon\r\n

as you can see with cat -vE:

$ cat -vE file
what isgoingon^M$

and od -c:

$ od -c file
0000000   w   h   a   t       i   s   g   o   i   n   g   o   n  \r  \n
0000020
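
To reproduce the situation and check a file for DOS line endings without reading the dump by eye, something like the following might help. It is a minimal sketch, assuming a POSIX shell; the file name crlf_demo.txt and the variable names are made up for illustration.

```shell
# Create a one-line file with a DOS (CRLF) line ending.
# The file name is hypothetical, chosen only for this demo.
demo=crlf_demo.txt
printf 'what isgoingon\r\n' > "$demo"

# Hold a literal carriage return in a variable so the grep pattern is portable.
cr=$(printf '\r')

# Count the lines that end in CR; a non-zero count suggests DOS line endings.
crlf_lines=$(grep -c "$cr\$" "$demo")
echo "lines ending in CR: $crlf_lines"

# Remove the scratch file.
rm -f "$demo"
```

For the one-line sample file this reports one line ending in CR.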

so when you run a UNIX tool like awk (which treats \n as the line ending) on the file, the \n is consumed by the act of reading the line, but that leaves the 2 fields as:

<what> <isgoingon\r>

Note the \r at the end of the second field. \r means Carriage Return which is literally an instruction to return the cursor to the start of the line so when you do:

print $2, $1

awk will print isgoingon and then will return the cursor to the start of the line before printing what which is why the what appears to overwrite the start of isgoingon.
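
The stray \r can be made visible instead of letting the terminal act on it. The sketch below assumes a POSIX shell and awk, plus a cat that supports -v; the variable names are illustrative only.

```shell
# Length of the second field with a DOS line ending: the CR is counted as data.
with_cr=$(printf 'what isgoingon\r\n' | awk '{print length($2)}')

# Length of the same field with a UNIX line ending.
without_cr=$(printf 'what isgoingon\n' | awk '{print length($2)}')

echo "with CR: $with_cr, without CR: $without_cr"

# Show the reordered output with control characters made visible by cat -v.
visible=$(printf 'what isgoingon\r\n' | awk '{print $2, $1}' | cat -v)
echo "$visible"
```

The first echo prints "with CR: 10, without CR: 9" and the second prints "isgoingon^M what", which shows that the CR sits between the two fields in the output.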

To fix the problem, do any one of these:

dos2unix file
sed 's/\r$//' file
awk '{sub(/\r$/,"")}1' file
perl -pe 's/\r$//' file
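
Note that dos2unix rewrites the file in place by default, while the sed, awk and perl commands as written print the converted text to standard output and leave the file untouched. A minimal sketch of using the awk variant to write a converted copy and then re-running the original script on it, assuming a POSIX shell and hypothetical file names:

```shell
# Hypothetical scratch files for the demo.
src=crlf_input.txt
dst=lf_output.txt

# Sample input with a DOS line ending.
printf 'what isgoingon\r\n' > "$src"

# Strip the trailing CR from every line and write the result to a new file.
awk '{sub(/\r$/,"")}1' "$src" > "$dst"

# The field-reversing script now behaves as expected on the converted copy.
fixed=$(awk '{print $2, $1}' "$dst")
echo "$fixed"

# Clean up the scratch files.
rm -f "$src" "$dst"
```

This prints "isgoingon what". Writing to a separate file and only then replacing the original avoids truncating the input by redirecting a command's output onto the file it is reading.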

Apparently an equivalent of dos2unix is available as fromdos (from the tofrodos package) in some UNIX variants (e.g. Ubuntu).

Be careful if you decide to use tr -d '\r' as is often suggested as that will delete all \rs in your file, not just those at the end of each line.
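
The difference can be seen by counting bytes on a sample that has one CR inside the line and one at the end. This is a small sketch, assuming a POSIX shell; the sample data and variable names are invented for the demo.

```shell
# Input is 5 bytes: a, CR, b, CR, LF.

# tr deletes every CR, including the one in the middle of the data.
tr_bytes=$(printf 'a\rb\r\n' | tr -d '\r' | wc -c | tr -d ' ')

# The anchored awk substitution deletes only the CR at the end of the line.
awk_bytes=$(printf 'a\rb\r\n' | awk '{sub(/\r$/,"")}1' | wc -c | tr -d ' ')

echo "tr -d: $tr_bytes bytes, awk sub: $awk_bytes bytes"
```

This prints "tr -d: 3 bytes, awk sub: 4 bytes": the tr version has also removed the CR that was part of the data.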

Note that GNU awk will let you parse files that have DOS line endings by simply setting RS appropriately:

gawk -v RS='\r\n' '...' file

but other awks will not allow that as POSIX only requires awks to support a single character RS and most other awks will quietly truncate RS='\r\n' to RS='\r'. You may need to add -v BINMODE=3 for gawk to even see the \rs though as the underlying C primitives will strip them on some platforms, e.g. cygwin.

One thing to watch out for is that CSVs created by Windows tools like Excel will use CRLF as the line endings but can have LFs embedded inside a specific field of the CSV, e.g.:

"field1","field2.1
field2.2","field3"

is really:

"field1","field2.1\nfield2.2","field3"\r\n

so if you just convert \r\ns to \ns, you can no longer tell linefeeds within fields from linefeeds that end lines. If you want to do that, I recommend first converting all of the intra-field linefeeds to something else. For example, this converts all intra-field LFs to tabs and all line-ending CRLFs to LFs:

gawk -v RS='\r\n' '{gsub(/\n/,"\t")}1' file

Doing the same without GNU awk is left as an exercise, but with other awks it involves combining lines that do not end in CR as they are read.
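
One way that exercise might be approached is sketched below. It is not taken from the original answer: join_crlf_records is a hypothetical helper name, and the sketch assumes a POSIX awk and that every real record ends in CRLF while LFs embedded in fields are not preceded by CR.

```shell
# join_crlf_records: hypothetical helper for illustration.
# Reads CRLF-terminated records on stdin, replaces LFs embedded in a record
# with tabs, and writes LF-terminated records to stdout.
join_crlf_records() {
    awk '
        {
            # awk split the input on LF; when continuing a record, put a tab
            # where the embedded LF was.
            if (have) rec = rec "\t" $0
            else      rec = $0
            have = 1
        }
        /\r$/ {
            # A line ending in CR closes the record: drop the CR and print.
            sub(/\r$/, "", rec)
            print rec
            rec = ""
            have = 0
        }
        END {
            # Emit any trailing data that lacked a final CRLF.
            if (have) print rec
        }
    '
}

# Usage: the CSV example from above, with an LF embedded in the second field.
joined=$(printf '"field1","field2.1\nfield2.2","field3"\r\n' | join_crlf_records)
echo "$joined"
```

The design mirrors the gawk one-liner: records are delimited by CRLF, and any bare LF found inside a record is turned into a tab.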
