Grep regular expression not working as expected

Question

I have a simple grep command trying to get only the first column of a CSV file including the comma. It goes like this...

grep -Eo '^[^,]+,' some.csv

So in my head, that reads like "get me only the matching part of the line where each line starts with at least one character that is not a comma, followed by a single comma."

So on a file, some.csv, that looks like this:

column1,column2,column3,column4
column1,column2,column3,column4
column1,column2,column3,column4

I'm expecting this output:

column1,
column1,
column1,

But I get this output:

column1,
column2,
column3,
column1,
column2,
column3,
column1,
column2,
column3,

Why is that? What am I missing from my grep/regex? Is my expected output incorrect?

If I remove the requirement of the trailing comma in the regex, the command works as I expect.

grep -Eo '^[^,]+' some.csv

gives me:

column1
column1
column1

NOTE: I'm on macOS High Sierra with grep version: grep (BSD grep) 2.5.1-FreeBSD

Recommended answer

BSD grep is buggy in general. See the following related posts:

That last link above mentions your case: when -o option is used, grep ignores the ^ anchor for some reason. This issue is also described in a FreeBSD bug:

I've noticed some more issues with the same version of grep. I don't know whether they're related, but I'll append them here for now.

$ printf abc | grep -o '^[a-c]'

should just print 'a', but instead gives three hits, against each letter of the incoming text.
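
To check whether a given grep is affected, one rough approach is to count the matches reported for the command above. This is only a sketch: the variable name hits is illustrative, and a count of 1 is what a grep that honors the ^ anchor would report.

```shell
# Count the matches grep -o reports for an anchored single-character pattern.
# tr strips the padding that some wc implementations put around the number.
hits=$(printf 'abc\n' | grep -o '^[a-c]' | wc -l | tr -d ' ')

if [ "$hits" -eq 1 ]; then
  echo "anchor honored: 1 match"
else
  echo "anchor ignored: $hits matches"
fi
```

A result of 3 here would correspond to the behavior described above, where every letter is reported as a separate hit.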

As a workaround, it may be better to install GNU grep, which works as expected.

Or, use sed with a BRE POSIX pattern:

sed -i '' 's/^\([^,]*,\).*/\1/' file

The pattern matches:

  • ^ - start of a line
  • \([^,]*,\) - Group 1 (later referred to with the \1 backreference from the RHS):
    • [^,]* - zero or more chars other than ,
    • , - a , char
  • .* - the rest of the line, which the substitution discards

Note that -i changes the file contents in place. Use -i.bak to create a backup file if needed (in that case, the empty '' that follows -i is no longer needed).
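
If the goal is only to print the first column, as in the question, rather than to rewrite the file, the same substitution can be run without -i so that the result goes to standard output. A minimal sketch, assuming a throwaway sample file created with mktemp (the variable names csv and first_col are illustrative, not from the original post):

```shell
# Build a sample CSV like the one in the question (temporary, hypothetical file).
csv=$(mktemp)
cat > "$csv" <<'EOF'
column1,column2,column3,column4
column1,column2,column3,column4
column1,column2,column3,column4
EOF

# No -i: the file is left untouched and the result is captured from stdout.
# \( \) is a BRE capture group and \1 refers back to it.
first_col=$(sed 's/^\([^,]*,\).*/\1/' "$csv")
printf '%s\n' "$first_col"

# Clean up the temporary file.
rm -f "$csv"
```

Unlike grep -o, this prints lines that contain no comma unchanged; running sed with -n and adding the p flag to the substitution would print only the lines where the substitution happened.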
