Converting regex to sed or grep regex


Question

I am not sure why this doesn't work. Here is the regex 'text\' => '.*?' and I want to capture cine and estrenos from the following messy text using grep or sed. Here is what I tried with grep:

echo "sadsa d{                             'text' => 'cine',                             'indices' => [                                            111,                                            116                                          ]                           },                           {                             'text' => 'estrenos',                             'indices' => [ sSADW" | grep -Eo "'text\' => '.*?',"

Recommended answer

Just use awk:

$ awk -v RS='}' -F\' '{print $4}' file
cine
estrenos

That will work with any awk in any shell on any UNIX box. It also works regardless of white space: your input can be on one line or spread across multiple lines, and any number of blanks or tabs can occur anywhere on each line.
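As a quick check of the white-space claim, here is a minimal sketch that feeds the same command a multi-line version of the data. The variable names input and result, and the exact line breaks and indentation in the sample, are made up for illustration; only the awk command comes from the answer.

```shell
# Hypothetical sample: the question's data spread over several lines
# with uneven indentation and spacing.
input="sadsa d{
    'text'   =>   'cine',
  'indices' => [ 111, 116 ]
},
{
  'text' => 'estrenos',
  'indices' => [ sSADW"

# Same command as in the answer, reading from a pipe instead of a file.
result=$(printf '%s\n' "$input" | awk -v RS='}' -F\' '{print $4}')
printf '%s\n' "$result"
```

Because the fields are delimited by single quotes rather than by white space, the newlines and extra blanks end up in fields that are never printed.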

Here is how it works:

awk treats all input as records separated into fields. Your input (with spaces compressed for readability):

sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW

clearly contains {...} records:

Record 1:

{ 'text' => 'cine', 'indices' => [ 111, 116 ] }

Record 2:

{ 'text' => 'estrenos', 'indices' => [ sSADW

So we can set the Record Separator to } (with -v RS='}'). I assume your last record will really end in a } too, but if it doesn't that's fine, because awk treats end of file as the end of a record. We can ignore the text before each { (i.e. "sadsa d" before the first record and "," between the 2 records). That text is really treated as part of the first field of each record, but we're not using that field for anything, so it's irrelevant.
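To see the record splitting on its own, before any field splitting, the following sketch prints each record between angle brackets. The variable names input and records are assumptions for illustration, and the sub() call is only there to trim the trailing newline that printf adds to the last record.

```shell
# The question's input with spaces compressed, as shown above.
input="sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW"

# RS='}' ends a record at every closing brace; end of input ends the last one.
records=$(printf '%s\n' "$input" |
  awk -v RS='}' '{ sub(/\n$/, ""); printf "Record %d: <%s>\n", NR, $0 }')
printf '%s\n' "$records"
```

The leading "sadsa d" and the "," between the braces show up at the start of each record, which is why they land in the first field once -F\' is added.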

So given the above 2 records, if we split them into fields at every ' (with -F\') then we get:

$ awk -v RS='}' -F\' '{for (i=1; i<=NF;i++) print "Record Nr", NR, "Field Nr", i, "Field Contents: <" $i ">"; print "----"
}' file
Record Nr 1 Field Nr 1 Field Contents: <sadsa d{ >
Record Nr 1 Field Nr 2 Field Contents: <text>
Record Nr 1 Field Nr 3 Field Contents: < => >
Record Nr 1 Field Nr 4 Field Contents: <cine>
Record Nr 1 Field Nr 5 Field Contents: <, >
Record Nr 1 Field Nr 6 Field Contents: <indices>
Record Nr 1 Field Nr 7 Field Contents: < => [ 111, 116 ] >
----
Record Nr 2 Field Nr 1 Field Contents: <, { >
Record Nr 2 Field Nr 2 Field Contents: <text>
Record Nr 2 Field Nr 3 Field Contents: < => >
Record Nr 2 Field Nr 4 Field Contents: <estrenos>
Record Nr 2 Field Nr 5 Field Contents: <, >
Record Nr 2 Field Nr 6 Field Contents: <indices>
Record Nr 2 Field Nr 7 Field Contents: < => [ sSADW
>
----

So as you can see, the value you want is always simply the 4th field of each record.
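One case may need a small guard. If the real input does end in a } followed by a newline, that newline forms one more record with no 4th field, and the plain command prints an empty line for it. The sketch below is a variant, not part of the original answer, that prints only when the 2nd field is text. The closing " ] }" appended to the sample and the variable names input and guarded are assumptions for illustration.

```shell
# Hypothetical completed input: the question's data with the last record closed.
input="sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW ] }"

# Print the 4th field only for records whose 2nd field is the key "text";
# the trailing newline-only record has no 2nd field and is skipped.
guarded=$(printf '%s\n' "$input" | awk -v RS='}' -F\' '$2 == "text" { print $4 }')
printf '%s\n' "$guarded"
```

The same condition also skips any other record that does not start with a 'text' key.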
