How to extract a string following a pattern with grep, regex or perl
Question
I have a file that looks something like this:
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
I need to extract anything within the quotes that follow name=, i.e., content_analyzer, content_analyzer2 and content_analyzer_items.
I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.
Recommended answer
Since you need to match content without including it in the result (name=" must be matched, but it is not part of the desired result), some form of zero-width matching or group capturing is required. This can be done easily with the following tools:
Perl
With Perl you could use the -n option to loop line by line and print the content of the capturing group if the line matches:
perl -ne 'print "$1\n" if /name="(.*?)"/' filename
GNU grep
If you have an improved version of grep, such as GNU grep, you may have the -P option available. This option enables Perl-like regular expressions, allowing you to use \K, which works like a shorthand lookbehind: it resets the start of the reported match, so everything matched before it is left out of the output.
grep -Po 'name="\K.*?(?=")' filename
The -o option makes grep print only the matched text, instead of the whole line.
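As a sketch of the same idea, the lookahead can be dropped by matching "anything that is not a quote" after \K. This assumes a grep built with -P support (GNU grep usually is, other implementations may not be), and the file name sample.xml is hypothetical.

```shell
# Hypothetical sample file with the three <table> lines from the question.
cat > sample.xml <<'EOF'
<table name="content_analyzer" primary-key="id">
<table name="content_analyzer2" primary-key="id">
<table name="content_analyzer_items" primary-key="id">
EOF

# \K drops name=" from the reported match; [^"]* stops before the closing
# quote, so no (?=") lookahead is needed.
names=$(grep -Po 'name="\K[^"]*' sample.xml)
printf '%s\n' "$names"
```

Both forms print the same text for this input; the character class version simply avoids the lazy quantifier and the lookahead.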
Vim
Another way is to use a text editor directly. With Vim, one of the various ways of accomplishing this would be to delete the lines without name= and then extract the content from the remaining lines:
:v/.*name="\v([^"]+).*/d|%s//\1
Standard grep
If you don't have access to these tools for some reason, something similar can be achieved with standard grep. However, without lookaround support the output still contains name=" and the closing quote, so it needs some cleanup afterwards:
grep -o 'name="[^"]*"' filename
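The cleanup step is not spelled out above; one possible sketch is to pipe the output through cut and keep the second quote-delimited field. The sed command after it is an alternative that is not part of the original answer, added because the question lists sed as acceptable. The file name sample.xml and the variable names are hypothetical.

```shell
# Hypothetical sample file with the three <table> lines from the question.
cat > sample.xml <<'EOF'
<table name="content_analyzer" primary-key="id">
<table name="content_analyzer2" primary-key="id">
<table name="content_analyzer_items" primary-key="id">
EOF

# grep -o prints name="..." for each match; cut keeps the text between the quotes.
names=$(grep -o 'name="[^"]*"' sample.xml | cut -d '"' -f 2)
printf '%s\n' "$names"

# Alternative with sed alone: -n suppresses default output, and the p flag
# prints only the lines where the substitution succeeded.
names_sed=$(sed -n 's/.*name="\([^"]*\)".*/\1/p' sample.xml)
printf '%s\n' "$names_sed"
```

Note that the sed version keeps only one value per line (the last name= on the line, because the leading .* is greedy), which is enough for input shaped like the sample.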
A note about saving results
In all of the commands above the results are sent to stdout. Remember that you can always save them by redirecting the output to a file, which is done by appending:
> result
to the end of the command.