How to extract string following a pattern with grep, regex or perl

Problem description

I have a file that looks something like this:

    <table name="content_analyzer" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer2" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer_items" primary-key="id">
      <type="global" />
    </table>

I need to extract anything within the quotes that follow name=, i.e., content_analyzer, content_analyzer2 and content_analyzer_items.

I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.

Recommended answer

Since you need to match content without including it in the result (you must match name=" but it is not part of the desired result), some form of zero-width matching or group capturing is required. This is easy to do with the following tools:

Perl

With Perl, you can use the -n option to loop over the input line by line and print the content of the capturing group whenever a line matches:

perl -ne 'print "$1\n" if /name="(.*?)"/' filename

GNU grep

If you have an improved version of grep, such as GNU grep, you may have the -P option available. This option enables Perl-compatible regular expressions, which let you use \K, a shorthand for a lookbehind: it resets the start of the reported match, so anything matched before it is left out of the output (it is effectively zero-width).

grep -Po 'name="\K.*?(?=")' filename

The -o option makes grep print only the matched text instead of the whole line.
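
The \K explanation can be checked with a short sketch, assuming GNU grep built with PCRE support and the same hypothetical tables.xml. It compares the command above with an explicit lookbehind, (?<=name="), which requires name=" right before the match without including it. Both should print the same three names; a practical difference is that PCRE has traditionally required lookbehinds to have a fixed length, while the part of the pattern before \K can be anything.

```shell
#!/bin/sh
# Hypothetical sample file (tables.xml) holding the input from the question.
cat > tables.xml <<'EOF'
<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>
EOF

# Command from the answer: \K drops the already matched name=" from the output,
# the lazy .*? grows one character at a time, and the lookahead (?=") stops it
# right before the closing quote without consuming that quote.
grep -Po 'name="\K.*?(?=")' tables.xml

# Same result with an explicit fixed-length lookbehind; [^"]* cannot run past
# the closing quote, so no lookahead is needed.
grep -Po '(?<=name=")[^"]*' tables.xml
```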

Vim

Another way is to use a text editor directly. With Vim, one of several ways to do this is to delete every line that does not contain name= and then extract the content from the remaining lines:

:v/.*name="\v([^"]+).*/d|%s//\1

Standard grep

If for some reason you don't have access to these tools, something similar can be achieved with standard grep. However, without lookaround, the output will need some cleanup afterwards, since each match still includes the name=" prefix and the closing quote:

grep -o 'name="[^"]*"' filename
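
One way to do that cleanup, sketched under the assumption that the values never contain a double quote and again using the hypothetical tables.xml: pipe the matches through cut, splitting on " and keeping the second field. The sed command is an alternative that the original answer does not include (the question does allow sed): a basic-regex group \(...\) captures the value, and -n together with the p flag prints only the lines where the substitution succeeded. Because the leading .* is greedy, it keeps only the last name="..." on a line, which is enough for this input.

```shell
#!/bin/sh
# Hypothetical sample file (tables.xml) holding the input from the question.
cat > tables.xml <<'EOF'
<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>
EOF

# grep -o leaves the whole attribute (name="content_analyzer"); cut splits on
# the double quote and keeps field 2, the text between the quotes.
grep -o 'name="[^"]*"' tables.xml | cut -d'"' -f2

# sed alone: replace the whole line with capture group 1, print only changed lines.
sed -n 's/.*name="\([^"]*\)".*/\1/p' tables.xml
```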

A note about saving results

In all of the commands above, the results are sent to stdout. You can always save them to a file by redirecting the output, that is, by appending:

> result

to the end of the command.
