How to extract string following a pattern with grep, regex or perl

Question

I have a file that looks something like this:

    <table name="content_analyzer" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer2" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer_items" primary-key="id">
      <type="global" />
    </table>

I need to extract anything within the quotes that follow name=, i.e., content_analyzer, content_analyzer2 and content_analyzer_items.

I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.

Recommended answer

Since you need to match content without including it in the result (name=" must match, but it is not part of the desired output), some form of zero-width matching or group capturing is required. This can be done easily with the following tools:

Perl

With Perl, you can use the -n option to loop over the input line by line and print the contents of the capturing group whenever a line matches:

perl -ne 'print "$1\n" if /name="(.*?)"/' filename

GNU grep

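If you have an improved version of grep, such as GNU grep, you may have the -P option available. This option enables Perl-compatible regular expressions, which lets you use \K. \K works like a shorthand lookbehind: it resets the start of the reported match, so anything matched before it is left out of the output. The (?=") at the end is a lookahead that requires the closing quote without including it in the match.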

grep -Po 'name="\K.*?(?=")' filename

The -o option makes grep print only the matched text, instead of the whole line.

Vim

Another way is to use a text editor directly. With Vim, one of the various ways of accomplishing this would be to delete lines without name= and then extract the content from the resulting lines:

:v/.*name="\v([^"]+).*/d|%s//\1


Standard grep

If you don't have access to these tools for some reason, something similar can be achieved with standard grep. However, without lookaround the output still contains name=" and the closing quote, so it needs some cleanup afterwards:

grep -o 'name="[^"]*"' filename
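The cleanup step is not spelled out above. One way to sketch it, assuming cut and sed are available, is to pipe the grep output through cut and keep the field between the quotes, or to let sed do the matching and the extraction in one pass. The temporary file and the variable names below are hypothetical; the sample content is copied from the question.

```shell
# Hypothetical sample file holding the three tables from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>
EOF

# Option 1: grep -o prints name="...", then cut keeps the second
# double-quote-delimited field, which is the text inside the quotes.
names_cut=$(grep -o 'name="[^"]*"' "$sample" | cut -d '"' -f 2)

# Option 2: sed prints only lines where the substitution succeeded,
# replacing the whole line with the captured group.
names_sed=$(sed -n 's/.*name="\([^"]*\)".*/\1/p' "$sample")

printf '%s\n' "$names_cut"
rm -f "$sample"
```

Note that the sed variant keeps only one match per line, which is enough for the file in the question.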


A note about saving results

All of the commands above send their results to stdout. Remember that you can always save them to a file by appending:

> result

to the end of the command.
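As a small illustration of the redirection, the sketch below writes the extracted name into a file and reads it back. The file paths come from mktemp and the variable names are hypothetical; in practice you would use your own input file and a name such as result.

```shell
# Hypothetical input file with a single matching line.
input=$(mktemp)
result=$(mktemp)
printf '<table name="content_analyzer" primary-key="id">\n' > "$input"

# ">" creates the target file or overwrites it; ">>" would append instead.
grep -o 'name="[^"]*"' "$input" | cut -d '"' -f 2 > "$result"

# Read the saved output back to confirm what was written.
saved=$(cat "$result")
printf '%s\n' "$saved"
rm -f "$input" "$result"
```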
