Grep regular expression not working as expected
Question
I have a simple grep command trying to get only the first column of a CSV file, including the comma. It goes like this:
grep -Eo '^[^,]+,' some.csv
So in my head, that reads like "get me only the matching part of the line where each line starts with at least one character that is not a comma, followed by a single comma."
So on a file, some.csv, that looks like this:
column1,column2,column3,column4
column1,column2,column3,column4
column1,column2,column3,column4
I'm expecting this output:
column1,
column1,
column1,
But I'm getting this output:
column1,
column2,
column3,
column1,
column2,
column3,
column1,
column2,
column3,
Why is that? What am I missing from my grep/regex? Is my expected output incorrect?
If I remove the requirement of the trailing comma in the regex, the command works as I expect.
grep -Eo '^[^,]+' some.csv
Gives me:
column1
column1
column1
NOTE: I'm on macOS High Sierra with grep version: grep (BSD grep) 2.5.1-FreeBSD
Recommended answer
BSD grep is buggy in general. See the following related posts:
That last link above mentions your case: when the -o option is used, grep ignores the ^ anchor for some reason. This issue is also described in a FreeBSD bug:
I've noticed some more issues with the same version of grep. I don't know whether they're related, but I'll append them here for now.
$ printf abc | grep -o '^[a-c]'
should just print 'a', but instead gives three hits, one for each letter of the incoming text.
As a workaround, it may be better to simply install GNU grep, which works as expected.
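For the GNU grep route, a minimal sketch follows. It assumes GNU grep is available under the name ggrep (the name Homebrew's grep formula uses on macOS by default). That name, the fallback to plain grep, and the temporary sample file are assumptions for illustration and are not part of the original question.

```shell
# Sample data mirroring some.csv from the question (written to a temp file).
csv=$(mktemp)
printf '%s\n' \
  'column1,column2,column3,column4' \
  'column1,column2,column3,column4' \
  'column1,column2,column3,column4' > "$csv"

# Prefer ggrep (assumed name of a separately installed GNU grep);
# otherwise fall back to whatever grep is first on PATH.
if command -v ggrep >/dev/null 2>&1; then
  GREP=ggrep
else
  GREP=grep
fi

# Same pattern as in the question. GNU grep honors the ^ anchor with -o,
# so each input line yields at most one match.
first_col=$("$GREP" -Eo '^[^,]+,' "$csv")
printf '%s\n' "$first_col"

rm -f "$csv"
```

With GNU grep this should print column1, once per input line. If the fallback grep is the affected BSD version, the extra matches from the question will still appear.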
Or, use sed with a POSIX BRE pattern:
sed -i '' 's/^\([^,]*,\).*/\1/' file
where the pattern matches:
^ - start of a line
\([^,]*,\) - Group 1 (later referred to with the \1 backreference from the RHS):
    [^,]* - zero or more chars other than ,
    , - a , char
.* - the rest of the line
Note that -i changes the file contents in place. Use -i.bak to create a backup file if needed (in that case, you do not need the empty '' that follows -i).