How can I use a lookaround to match either a single or a double quote?
Question
I have a series of strings I want to extract:
hello.this_is("bla bla bla")
some random text
hello.this_is('hello hello')
other stuff
What I need to get (from many files, but this is not important here) is the content between hello.this_is( and ), without the surrounding quotes, so my desired output is:
bla bla bla
hello hello
As you can see, the text within the parentheses can be enclosed in either double or single quotes.
If it were only single quotes, I would use a lookbehind and a lookahead like this:
grep -Po "(?<=hello.this_is\(').*(?=')" file
# ^ ^
# returns ---> hello hello
Similarly, to get strings from double quotes I would say:
grep -Po '(?<=hello.this_is\(").*(?=")' file
# ^ ^
# returns ---> bla bla bla
However, I want to match both cases, so that it gets both single-quoted and double-quoted strings. I tried using $'' to escape, but could not make it work:
grep -Po '(?<=hello.this_is\($'["\']').*(?=$'["\']')' file
# ^^^^^^^^ ^^^^^^^^
I can of course use the octal ASCII codes and say:
grep -Po '(?<=hello.this_is\([\047\042]).*' file
but I would like to use the literal single and double quote characters, since 047 and 042 are not as recognizable to me as the quotes themselves.
Recommended answer
Note: The sed command at the bottom of this answer works only as long as your strings are well-behaved, like
"foo"
or
'bar'
As soon as your strings start to misbehave :) like:
"hello \"world\""
it will no longer work.
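To make that failure concrete, here is a small sketch that feeds such a string to the same sed expression used at the end of this answer. The wrapper name extract_naive is hypothetical and exists only for this illustration.

```shell
# Hypothetical wrapper around the sed expression from this answer.
extract_naive() {
    sed -n 's/hello\.this_is(\(["'\'']\)\([^"]*\)\(["'\'']\).*/\2/gp' "$@"
}

# A string containing backslash-escaped double quotes.
printf '%s\n' 'hello.this_is("hello \"world\"")' | extract_naive
# The capture stops at the first double quote, escaped or not,
# so only the text up to and including the backslash is printed.
```

The expression has no notion of escape sequences, which is why the result is cut short at the first quote it sees.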
Your input looks like source code. For a robust solution, I recommend using a parser for that language to extract the strings.
For the common use case:
You can use sed. Unlike grep -oP, which only works with GNU grep, this solution is supposed to work on any POSIX platform:
sed -n 's/hello\.this_is(\(["'\'']\)\([^"]*\)\(["'\'']\).*/\2/gp' file
# ^^^^^^^^ ^^
# capture group 2 ^
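As a possible variation (a sketch, not part of the original answer), the closing quote can be tied to the opening one with a backreference, so that a string such as "it's fine" is not cut at the apostrophe. The function name extract_quoted is hypothetical, and the pattern assumes at most one hello.this_is(...) call per line with the closing quote directly followed by ).

```shell
# Hypothetical helper: print the text between matching quotes
# inside hello.this_is(...). Reads the given files, or stdin if none.
extract_quoted() {
    # Group 1 captures the opening quote (single or double);
    # the backreference \1 requires the same quote right before ")".
    sed -n 's/.*hello\.this_is(\(["'\'']\)\(.*\)\1).*/\2/p' "$@"
}

printf '%s\n' \
    'hello.this_is("bla bla bla")' \
    'some random text' \
    "hello.this_is('hello hello')" \
    'other stuff' \
    "hello.this_is(\"it's fine\")" | extract_quoted
```

The '\'' sequence closes the single-quoted shell string, adds an escaped single quote, and reopens the string, which is how a literal ' ends up inside the bracket expression. The same quoting trick should also work in the GNU grep pattern from the question, for example grep -Po '(?<=hello\.this_is\(["'\'']).*(?=["'\'']\))' file. Like the original command, this sketch still does not handle backslash-escaped quotes.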