Grep regular expression not working as expected


Question

I have a simple grep command that tries to get only the first column of a CSV file, including the comma. It goes like this:

grep -Eo '^[^,]+,' some.csv

In my head, that reads like "get me only the matching part of the line, where each line starts with at least one character that is not a comma, followed by a single comma."

So on a file, some.csv, that looks like this:

column1,column2,column3,column4
column1,column2,column3,column4
column1,column2,column3,column4

I'm expecting this output:

column1,
column1,
column1,

But I get this output:

column1,
column2,
column3,
column1,
column2,
column3,
column1,
column2,
column3,

Why is that? What am I missing from my grep/regex? Is my expected output incorrect?

If I remove the requirement of the trailing comma in the regex, the command works as I expect:

grep -Eo '^[^,]+' some.csv

gives me:

column1
column1
column1

NOTE: I'm on macOS High Sierra with grep version: grep (BSD grep) 2.5.1-FreeBSD

Recommended answer

BSD grep is buggy in general. See the following related posts:

  • Why does this BSD grep result differ from GNU grep?
  • grep strange behaviour with single letter words
  • How to make BSD grep respect start-of-line anchor

The last link above mentions your case: when the -o option is used, grep ignores the ^ anchor for some reason. This issue is also described in a FreeBSD bug:

I've noticed some more issues with the same version of grep. I don't know whether they're related, but I'll append them here for now.

$ printf abc | grep -o '^[a-c]'

should just print 'a', but instead gives three hits, one for each letter of the incoming text.
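One way to check whether a particular grep shows this behaviour is to count the matches it reports for the same anchored pattern. The following is a minimal sketch; the variable name `hits` is only illustrative, and the interpretation assumes the symptom is exactly the one described in the bug report above.

```shell
#!/bin/sh
# Run the anchored pattern from the bug report and count the matches.
# Per the report, a correct grep yields one match ("a"), while the
# affected BSD grep yields one match per letter.
hits=$(printf 'abc\n' | grep -o '^[a-c]' | wc -l | tr -d ' ')

if [ "$hits" -eq 1 ]; then
    echo "grep -o respects the ^ anchor"
else
    echo "grep -o ignores the ^ anchor ($hits matches)"
fi
```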

As a workaround, it might be a better idea to just install GNU grep, which works as expected.
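A possible way to do that on macOS is sketched below. It assumes Homebrew is available (the answer does not say how to install GNU grep); Homebrew's grep formula installs the GNU binary under the name ggrep so that it does not shadow the system grep. The `GREP`, `csv` and `first_cols` variables are just names chosen for this sketch.

```shell
#!/bin/sh
# Assumption: Homebrew is installed. Uncomment to install GNU grep as "ggrep".
# brew install grep

# Prefer ggrep when it is on PATH, otherwise fall back to the system grep.
if command -v ggrep >/dev/null 2>&1; then
    GREP=ggrep
else
    GREP=grep
fi

# Recreate the sample file from the question in a temporary location.
csv=$(mktemp "${TMPDIR:-/tmp}/some.XXXXXX")
printf '%s\n' \
    'column1,column2,column3,column4' \
    'column1,column2,column3,column4' \
    'column1,column2,column3,column4' > "$csv"

# The original command from the question, run with the selected grep.
first_cols=$("$GREP" -Eo '^[^,]+,' "$csv")
printf '%s\n' "$first_cols"

rm -f "$csv"
```

The install line is left commented out so that the script can also be run on a machine where GNU grep is already the default.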

Or, use sed with a POSIX BRE pattern:

sed -i '' 's/^\([^,]*,\).*/\1/' file

Here, the pattern matches:

  • ^ - start of a line
  • \([^,]*,\) - Group 1 (later referred to with the \1 backreference from the RHS):
    • [^,]* - zero or more chars other than ,
    • , - a , char
  • .* - the rest of the line, which the replacement drops

Note that -i will change the file contents in place. Use -i.bak to create a backup file if needed (in that case, you don't need the empty '' that follows -i).
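To preview the result before touching the file, the same substitution can be run without -i, in which case sed writes to standard output and leaves the file unchanged. This is a small sketch against a temporary copy of the sample data; the `csv`, `preview` and `after` variable names are only illustrative.

```shell
#!/bin/sh
# Recreate the sample file from the question in a temporary location.
csv=$(mktemp "${TMPDIR:-/tmp}/some.XXXXXX")
printf '%s\n' \
    'column1,column2,column3,column4' \
    'column1,column2,column3,column4' \
    'column1,column2,column3,column4' > "$csv"

# Without -i, sed prints the transformed lines and does not modify the file.
preview=$(sed 's/^\([^,]*,\).*/\1/' "$csv")
printf '%s\n' "$preview"

# The file on disk still holds the full original rows.
after=$(cat "$csv")

rm -f "$csv"
```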
