Grep across multiple lines: find all words between two patterns


Question

I need help scanning text files to find all the words between two patterns. For example, given a .sql file, I need to find all the words between 'from' and 'where'. grep can only scan one line at a time. What is the best Unix tool or script to use for this? Do sed or awk have this capability? Pointers to any examples are greatly appreciated.

Solution

Sed has this:

sed -n -e '/from/,/where/ p' file.sql

This prints every line starting at a line that contains from and ending at the next line that contains where, both inclusive.
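As a quick check of what the range address selects, here is a minimal sketch run against a made-up query. The file contents and the helper name print_range are assumptions for illustration, not part of the question.

```shell
# Hypothetical helper: print every line from a "from" line
# through the next "where" line (whole lines, both inclusive).
print_range() {
    sed -n -e '/from/,/where/ p' "$1"
}

# Made-up sample file, for illustration only.
sample=$(mktemp)
printf '%s\n' 'select a, b' 'from t1,' 't2' 'where a = 1' 'order by a' > "$sample"

# Prints the three middle lines: "from t1,", "t2", "where a = 1".
print_range "$sample"
rm -f "$sample"
```

Note that whole lines are printed, so anything before from or after where on those lines comes along too.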

For something that can include lines that have both from and where:

#!/bin/sed -nf

/from.*where/ {
    s/.*\(from.*where\).*/\1/p
    d
}
/from/ {
    : next
    N
    /where/ {
        s/^[^\n]*\(from.*where\)[^\n]*/\1/p
        d
    }
    $! b next
}

This (written as a sed script) is slightly more complex, and I'll try to explain the details.

The first block runs on any line that contains both a from and a where. If a line matches that pattern, two commands are executed. The s (substitute) command keeps only the text that starts at from and ends at where (both included), and its p flag prints the result. The d (delete) command then clears the pattern space (the working buffer), loads the next line and restarts the script.
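The single-line case can be tried on its own. This is a small sketch with a made-up input line; the helper name span_on_line is hypothetical.

```shell
# Hypothetical helper: for lines containing both keywords,
# keep only the text from "from" through "where" and print it.
span_on_line() {
    sed -n -e '/from.*where/ s/.*\(from.*where\).*/\1/p'
}

# Made-up input; prints: from t where
printf '%s\n' 'select a from t where a = 1' | span_on_line
```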

The second block runs its commands (grouped by the braces) when a line containing from is found. Basically, the commands form a loop that keeps appending input lines to the pattern space until a line with a where is found or until we reach the last line.

The : "command" creates a label, a marker in the script that allows us to "jump" back when we want to. The N command reads a line from the input, and appends it to the pattern space (separating the lines with a newline character).

When a where is found, we can print out the contents of the pattern space, but first we have to clean it with the substitute command. It is analogous to the one used previously, but we now replace the leading and trailing .* with [^\n]*, which tells sed to match only non-newline characters, effectively matching a from in the first line and a where in the last line. The d command then clears the pattern space and restarts the script on the next line.

The b command jumps to a label, in our case the label next. However, the $! address says it should not be executed on the last line, which lets us leave the loop. When we leave the loop this way, no matching where was found, so you probably do not want to print the pattern space (and since the script runs with -n, it does not).
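To try the script without creating an executable file, it can be wrapped in a shell function. This is only a sketch: the wrapper name extract_between and the sample input are made up, and it assumes GNU sed, which treats \n inside a bracket expression as a newline (other sed implementations may not).

```shell
# Hypothetical wrapper: write the sed script to a temp file and run it with -n -f.
# Assumes GNU sed (for \n inside [...]).
extract_between() {
    script=$(mktemp)
    cat > "$script" <<'EOF'
/from.*where/ {
    s/.*\(from.*where\).*/\1/p
    d
}
/from/ {
    : next
    N
    /where/ {
        s/^[^\n]*\(from.*where\)[^\n]*/\1/p
        d
    }
    $! b next
}
EOF
    sed -n -f "$script" "$1"
    rm -f "$script"
}

# Made-up sample: the from and the where sit on different lines.
sample=$(mktemp)
printf '%s\n' 'select a' 'from t1,' 't2' 'where a = 1' > "$sample"
extract_between "$sample"
rm -f "$sample"
```

With GNU sed, the text after where on the last line is trimmed, while the lines in between are kept whole.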

Note, however, that this has some drawbacks. The following cases won't be handled as expected:

from ... where ... from

from ... from
where

from
where ... where

from
from
where
where

Handling these cases requires more code.
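One possible direction, sketched here with awk rather than the sed script above, is to ignore line boundaries and walk the input word by word. This is a different technique, not an extension of that script. It assumes the keywords appear as lowercase, whitespace-separated words; it does not understand SQL quoting or comments, and punctuation stays attached to words. The helper name words_between is hypothetical.

```shell
# Hypothetical helper: print each word that appears between "from" and "where",
# treating the whole input as one stream of whitespace-separated words.
words_between() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "where")     { inside = 0 }  # where closes the span (ignored if none is open)
                else if ($i == "from") { inside = 1 }  # from opens a span, or keeps an open one going
                else if (inside)       { print $i }    # any other word inside a span is printed
            }
        }
    ' "$1"
}

# Made-up sample with two spans, one of them crossing a line break.
sample=$(mktemp)
printf '%s\n' 'select x from a where y from b' 'c where z' > "$sample"
words_between "$sample"   # prints a, b and c, one per line
rm -f "$sample"
```

The keywords themselves are not printed, and a repeated from or a stray where simply resets the state, which is one arbitrary way of resolving the ambiguous cases listed above.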

Hope this helps =)
