Grep regular expression not working as expected
Question
I have a simple grep command that tries to get only the first column of a CSV file, including the comma. It goes like this:
grep -Eo '^[^,]+,' some.csv
In my head, that reads as "get me only the matching part of the line, where each line starts with at least one character that is not a comma, followed by a single comma."
So on a file, some.csv, that looks like this:
column1,column2,column3,column4
column1,column2,column3,column4
column1,column2,column3,column4
I'm expecting this output:
column1,
column1,
column1,
But I get the following output:
column1,
column2,
column3,
column1,
column2,
column3,
column1,
column2,
column3,
Why is that? What am I missing from my grep/regex? Is my expected output incorrect?
If I remove the requirement of the trailing comma from the regex, the command works as I expect:
grep -Eo '^[^,]+' some.csv
gives me:
column1
column1
column1
NOTE: I'm on macOS High Sierra with grep version: grep (BSD grep) 2.5.1-FreeBSD
Recommended answer
BSD grep is buggy in general. See the following related posts:
- Why does this BSD grep result differ from GNU grep?
- grep strange behaviour with single letter words
- How to make BSD grep respect start-of-line anchor
The last link above mentions your case: when the -o option is used, grep ignores the ^ anchor for some reason. This issue is also described in a FreeBSD bug:
I've noticed some more issues with the same version of grep. I don't know whether they're related, but I'll append them here for now.
$ printf abc | grep -o '^[a-c]'
should just print 'a', but instead gives three hits, against each letter of the incoming text.
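A quick way to check whether a given grep shows this behavior is to count the matches produced by the command from the bug report. This is only a sketch; the function name check_grep_anchor is hypothetical and not part of any tool:

```shell
# Reports whether the grep found on PATH honors ^ when combined with -o.
# A correct grep prints exactly one match ("a") for the input "abc".
check_grep_anchor() {
  # tr strips the padding that some wc implementations add around the count.
  matches=$(printf 'abc\n' | grep -o '^[a-c]' | wc -l | tr -d ' ')
  if [ "$matches" -eq 1 ]; then
    echo "ok"
  else
    echo "buggy: $matches matches"
  fi
}
```

Running check_grep_anchor prints ok when the anchor is respected, and a "buggy" line with the match count otherwise.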
As a workaround, it might be a better idea to just install GNU grep, which works as expected.
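As a rough sketch of that workaround, the function below prefers GNU grep when it is available under the name ggrep. That name is an assumption: it is what a Homebrew install (brew install grep) provides, and other package managers may use a different name. The function name first_column_grep is hypothetical.

```shell
# Assumption: GNU grep was installed with Homebrew (`brew install grep`),
# which exposes it as ggrep so that it does not shadow the system grep.
first_column_grep() {
  if command -v ggrep >/dev/null 2>&1; then
    ggrep -Eo '^[^,]+,' "$1"
  else
    # Falls back to whatever grep is on PATH; with BSD grep 2.5.1
    # this still shows the anchor problem described above.
    grep -Eo '^[^,]+,' "$1"
  fi
}
```

Called as first_column_grep some.csv, this runs the pattern from the question unchanged; only the grep binary differs.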
Or, use sed with a POSIX BRE pattern:
sed -i '' 's/^\([^,]*,\).*/\1/' file
where the pattern matches:
- ^ - start of a line
- \([^,]*,\) - Group 1 (later referred to with the \1 backreference from the RHS):
  - [^,]* - zero or more chars other than ,
  - , - a , char
- .* - the rest of the line, which is dropped because only Group 1 is kept in the replacement
Note that -i will change the file contents in place. Use -i.bak to create a backup file if needed (in that case, you don't need the empty '' that follows -i).
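If the goal is only to print the first column, as in the original grep attempt, the same substitution can be run without -i so the file is left untouched. The following is a minimal sketch under that assumption. The function name first_column is hypothetical, and the pattern uses [^,][^,]* instead of [^,]* to mirror the "at least one character" requirement of the + in the question's regex.

```shell
# Print the first comma-terminated field of each line without modifying the file.
# -n together with the p flag prints only lines where the substitution matched,
# which mirrors grep -o skipping lines that do not match.
first_column() {
  sed -n 's/^\([^,][^,]*,\).*/\1/p' "$1"
}
```

For the sample file from the question, first_column some.csv prints column1, once per line.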