How to extract content between two patterns in Unix

Problem description

I have a file, test.txt, containing the code below.

select * from emp where empid=1;  

select *   
from dep  
where jkdsfj  

select *   
from sal   
where jkdsfj  

I need to extract the content between "from" and "where".

Note: If "where" is on a new line it must still select the material between "from" and "where".

The output should look like this:

emp  
dep  
sal  

How can I do this?

Recommended answer

Given the need to handle multiple lines, you can choose sed, or awk, or one of the more complex scripting languages like Perl or Python.

With a bit of care, sed is adequate. I created a file, script.4, like this (after creating script and script2, losing most of what little hair was left on my head**, and then starting over with script.1, script.2 and script.3, which were deliberately incomplete):

/from.*where/  { s/.*from *//; s/ *where.*//;          p; n; }
/from/,/where/ { s/.*from *//; s/ *where.*//; /^ *$/d; p;    }

And I created a test file, data, like this:

select * from emp where empid=1;  

select *   
from dep  
where jkdsfj  

select *   
from sal   
where jkdsfj  

select elephants
from abject poverty
join flying tigers
where abelone = shellfish;

select mouse
from toolset
join animals where tail = cord
and buttons = legs

I then ran the command like this, to get the output shown:

$ sed -n -f script.4 data
emp
dep  
sal   
abject poverty
join flying tigers
toolset
join animals
$

The script is 'simple'. For lines which contain both from and where, delete everything up to the from (plus any spaces after it), delete everything from the where onward (plus any spaces before it), print what's left, and go to the next line of input.

Otherwise, between a line that contains from and a line that contains where, delete everything up to the from (plus any spaces after it) and everything from the where onward (plus any spaces before it); then, if the line is empty, delete it, and otherwise print it. Note that adding an n command to the second line makes the script misbehave (I need to spend time working out why), but the delete operation can be added to the first command line without doing any harm (if a line contains just from where, with nothing between them, nothing is printed).

Note that there are many SELECT statements that would be mishandled by this code.

For example:

SELECT *
  FROM Table1 AS T1
  JOIN (SELECT T2.A, T3.B
          FROM Table2 AS T2
          JOIN Table3 AS T3 ON T2.PK = T3.FK
         WHERE T2.ColumnN > T3.ColumnM
       ) AS T4
    ON T1.A = T4.B
 WHERE T1.DateOfBirth > DATE(2000-01-01)

Quite apart from the upper-case keywords, the WHERE in the sub-query would be where the matching between FROM and WHERE stopped.
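The early stop can be reproduced with a reduced query. This is a sketch under stated assumptions: the query below is a made-up, lower-case simplification of the example above (so that the case-sensitive patterns match at all), and the file and variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch directory; assumes mktemp -d is available.
workdir=$(mktemp -d)

# Same two commands as script.4.
cat > "$workdir/script.4" <<'EOF'
/from.*where/  { s/.*from *//; s/ *where.*//;          p; n; }
/from/,/where/ { s/.*from *//; s/ *where.*//; /^ *$/d; p;    }
EOF

# Made-up nested query, in lower case so the patterns can match.
cat > "$workdir/nested" <<'EOF'
select *
from table1 t1
join (select a
from table2
where a > 1) t4
where t1.x = 1
EOF

nested_out=$(sed -n -f "$workdir/script.4" "$workdir/nested")
printf '%s\n' "$nested_out"

rm -rf "$workdir"
```

The range that opens at from table1 closes at the sub-query's where, so the output is table1 t1, join (select a and table2: the tail of the join, ) t4, is lost, and the inner from clause is mixed into the outer one.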

** In case you're curious about the cause of the hair loss, see the question "Why does an n instead of a b or d or nothing change the behaviour of sed in this script?"
