Deleting lines of a file with sed - unexpected behaviour

Problem description

I noticed something a bit odd while fooling around with sed. If you try to remove multiple line intervals (by number) from a file, but any interval specified later in the list is fully contained within an interval earlier in the list, then an additional single line is removed after the specified (larger) interval.

seq 10 > foo.txt

sed '2,7d;3,6d' foo.txt
1
9
10

This behaviour was behind an annoying bug for me, since in my script I generated the interval endpoints on the fly, and in some cases the intervals produced were redundant. I can clean this up, but I can't think of a good reason why sed would behave this way on purpose.
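One way to do the clean-up mentioned above is to merge the generated intervals before they reach sed, so that no range is ever nested inside an earlier one. The following is only a minimal sketch: `merge_ranges` is a hypothetical helper name, and it assumes the endpoints arrive as numeric "start end" pairs, one per line.

```shell
#!/bin/sh
# merge_ranges (hypothetical helper): reads "start end" pairs on stdin,
# one pair per line, and prints one sed delete command per merged range.
merge_ranges() {
    sort -k1,1n -k2,2n |
    awk '
        NR == 1     { s = $1; e = $2; next }
        # Next range overlaps or touches the current one: extend it if needed.
        $1 <= e + 1 { if ($2 > e) e = $2; next }
        # Otherwise emit the finished range and start a new one.
                    { printf "%d,%dd\n", s, e; s = $1; e = $2 }
        END         { if (NR > 0) printf "%d,%dd\n", s, e }
    '
}

# The nested pair 3-6 is absorbed into 2-7; 9-9 stays separate.
script=$(printf '%s\n' '2 7' '3 6' '9 9' | merge_ranges)
printf '%s\n' "$script"
seq 10 | sed "$script"
```

Because the merged ranges are disjoint and sorted, the nested-range case from the question cannot arise, whichever sed is installed.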

Solution

Since this question was highlighted as needing an answer in the Stack Overflow Weekly Newsletter email for 2015-02-24, I'm converting the comments above (which provide the answer) into a formal answer. Unattributed comments here were made by me in essentially equivalent form.

Thank you for a concise, complete question. The result is interesting. I can reproduce it with your script. Intriguingly, sed '3,6d;2,7d' foo.txt (with the delete operations in the reverse order) produces the expected answer with 8 included in the output. That makes it look like it might be a reportable bug in (GNU) sed, especially as BSD sed (on Mac OS X 10.10.2 Yosemite) works correctly with the operations in either order. I tested using 'sed (GNU sed) 4.2.2' from an Ubuntu 14.04 derivative.

More data points for you/them. Both of these include 8 in the output:

sed -e '/2/,/7/d' -e '/3/,/6/d' foo.txt
sed -e '2,7d' -e '/3/,/6/d' foo.txt

By contrast, this does not:

sed -e '/2/,/7/d' -e '3,6d' foo.txt

The latter surprised me (even accepting the basic bug).

Ed Morton commented: "Beats me. I thought, given some of sed's arcane constructs, that you might be missing the batman symbol or something from the middle of your command, but sed -e '2,7d' -e '3,6d' foo.txt behaves the same way, and swapping the order produces the expected results (GNU sed 4.2.2 on Cygwin). /bin/sed on Solaris always produces the expected result and, interestingly, so does GNU sed 3.02."

glenn jackman commented: "More data: it only seems to happen with sed 4.2.2 if the 2nd range is a subset of the first: sed '2,5d;2,5d' shows the bug, sed '2,5d;1,5d' and sed '2,5d;2,6d' do not."
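The variants reported in this thread can be collected into one small test matrix, which may be convenient to attach to a bug report. This is a sketch only: `check` is a hypothetical helper, the two `-e` scripts are joined with `;` for brevity, and the expected outputs are simply the union of the deleted ranges applied to the numbers 1 to 10.

```shell
#!/bin/sh
# check SCRIPT EXPECTED (hypothetical helper): run SCRIPT over 1..10 and
# compare the space-joined output with EXPECTED.
check() {
    actual=$(seq 10 | sed "$1" | tr '\n' ' ')
    actual=${actual% }    # drop the trailing space left by tr
    if [ "$actual" = "$2" ]; then
        printf 'ok    %-20s -> %s\n' "$1" "$actual"
    else
        printf 'DIFF  %-20s -> %s (expected %s)\n' "$1" "$actual" "$2"
        return 1
    fi
}

failures=0
# Each line below is "sed script|expected output".
while IFS='|' read -r script expected; do
    check "$script" "$expected" || failures=$((failures + 1))
done <<'EOF'
2,7d;3,6d|1 8 9 10
3,6d;2,7d|1 8 9 10
/2/,/7/d;/3/,/6/d|1 8 9 10
2,7d;/3/,/6/d|1 8 9 10
/2/,/7/d;3,6d|1 8 9 10
2,5d;2,5d|1 6 7 8 9 10
2,5d;1,5d|6 7 8 9 10
2,5d;2,6d|1 7 8 9 10
EOF
echo "$failures case(s) differ from the expected output"
```

On a sed without the problem every line should read ok; on an affected version the DIFF lines show which combinations misbehave.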

The GNU sed home page says "Please send bug reports to bug-sed at gnu.org" (except it has an @ in place of " at "). You've got a good reproduction; be explicit about the output you expect versus the output you get (they'll get the point, but it's best to make sure they can't misunderstand). Point out that the reverse ordering of the commands works as expected, and give the various other commands as examples of working or not working. (You could even give this Q&A URL as a cross-reference, but make sure that the bug report is self-contained so that it can be understood even if no one follows the URL.)

You can also point to BSD sed (and the Solaris version, and the older GNU sed 3.02) as behaving as expected. Since an older version of GNU sed works, this is arguably a regression. […After a little experimentation…] The breakage occurred in the 4.1 release; the 4.0.9 release is OK. (I also checked 4.1.5 and 4.2.1; both are broken.) That will help the maintainers if they want to find the trouble by looking at what changed.
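Rather than relying on version numbers alone, a script can probe the installed sed for the behaviour directly. The sketch below is one possible approach; `sed_has_nested_range_bug` is a hypothetical name, and the `--version` call is assumed to work only on GNU sed, which is why its failure is tolerated.

```shell
#!/bin/sh
# sed_has_nested_range_bug (hypothetical helper): exits 0 if this sed deletes
# an extra line when a numeric range is nested inside an earlier one.
sed_has_nested_range_bug() {
    [ "$(seq 10 | sed '2,7d;3,6d' | tr '\n' ' ')" != "1 8 9 10 " ]
}

# GNU sed answers --version; other implementations may not, so fall back.
version=$(sed --version 2>/dev/null | head -n 1)
echo "sed version: ${version:-unknown (no --version support)}"

if sed_has_nested_range_bug; then
    echo "nested range handling: affected"
else
    echo "nested range handling: OK"
fi
```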

The OP noted:

"Thanks everyone for comments and additional tests. I'll submit a bug report to GNU sed and post their response." (santayana)