Matching text in quotes (newbie)
Question
I'm getting totally lost in shell programming, mainly because every site I consult offers a different tool for pattern matching. So my question is: which tool should I use to do simple pattern matching in a piped stream?
Context: I have a named.conf file, and I need all the zone names in a simple file for further processing. So I run ~$ cat named.local | grep zone and get totally lost from there. My output is a hundred or so lines of the form 'zone "domain.tld" {', and I need the text in double quotes.
Thanks for showing me a way.
J
Recommended answer
I think what you're looking for is sed. It's a stream editor that lets you do replacements on a line-by-line basis.
As you explain it, the command `cat named.local | grep zone` gives you output a little like this:
zone "domain1.tld" {
zone "domain2.tld" {
zone "domain3.tld" {
zone "domain4.tld" {
I'm guessing you want the output to look something like this, since you said you need the text in double quotes:
"domain1.tld"
"domain2.tld"
"domain3.tld"
"domain4.tld"
So from each line we just want the text between the double quotes (including the double quotes themselves).
I'm not sure whether you're familiar with regular expressions, but they are an invaluable tool for anyone writing shell scripts. For example, the regular expression /.o.e/ matches any line containing a four-character sequence whose 2nd character is a lower-case o and whose 4th is an e. That covers lines containing words like "zone" or "tone", or even "I am tone-deaf."
The trick there is the . (dot) character, which means "any single character". There are a few other special characters, such as *, which means "repeat the previous character 0 or more times". Thus a regular expression like a* matches "a", "aaaaaaa", or an empty string: ""
So you can match a quoted string, quotes included, using: /".*"/ (Note that .* is greedy: on a line with several quoted strings it runs from the first double quote to the last.)
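The three building blocks so far can be tried out with grep before bringing sed into it. This is only a sketch on made-up sample lines; the variable names are mine, and grep's -o option is a widely available extension rather than something every grep is guaranteed to have.

```shell
# The slashes in /.o.e/ are just delimiters, so grep takes the bare pattern.

# . matches any single character: "zone" and "tone" both fit .o.e
dot_matches=$(printf '%s\n' 'zone "domain1.tld" {' 'I am tone-deaf.' 'options {' | grep '.o.e')
printf '%s\n' "$dot_matches"

# * repeats the previous character 0 or more times. -x requires the whole
# line to match and -c counts the matching lines ("a", "" and "aaaaaaa").
star_count=$(printf '%s\n' 'a' '' 'aaaaaaa' 'abc' | grep -c -x 'a*')
printf '%s\n' "$star_count"

# ".*" matches from one double quote to a later one. -o prints only the
# matched part of each line instead of the whole line.
quoted=$(printf '%s\n' 'zone "domain1.tld" {' 'zone "domain2.tld" {' | grep -o '".*"')
printf '%s\n' "$quoted"
```

If your grep supports -o, the last command already produces the quoted zone names on its own; the sed approach that follows is more flexible when you want to reshape the line.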
There's another thing you should know about sed (and judging by the comments, you already do!): it supports back-references. Once you've told it how to recognize a piece of text, you can have it reuse that text as part of the replacement. For example, let's say you wanted to turn this list:
Billy "The Kid" Smith
Jimmy "The Fish" Stuart
Chuck "The Man" Norris
into this list:
The Kid
The Fish
The Man
First, you'd look for the quoted string. We already saw that: it's /".*"/.
Next, we want to use what's inside the quotes. We can group it using parentheses: /"(.*)"/
If we wanted to replace the quoted text, quotes included, with an underscore, we'd do a substitution: s/"(.*)"/_/, and that would leave us with:
Billy _ Smith
Jimmy _ Stuart
Chuck _ Norris
But we have back-references! They let us recall what was inside the parentheses, using the symbol \1. So if we now do s/"(.*)"/\1/ we'll get:
Billy The Kid Smith
Jimmy The Fish Stuart
Chuck The Man Norris
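Here is a small sketch of the two substitutions above, run on the same three names. It assumes a sed that accepts -E (extended regular expressions, available in GNU and BSD sed), which lets the parentheses be written unescaped exactly as in the text; the variable names are only for illustration.

```shell
names=$(printf '%s\n' 'Billy "The Kid" Smith' 'Jimmy "The Fish" Stuart' 'Chuck "The Man" Norris')

# Replace the quoted text, quotes included, with an underscore.
underscored=$(printf '%s\n' "$names" | sed -E 's/"(.*)"/_/')
printf '%s\n' "$underscored"

# Same match, but put the captured group back: only the quotes disappear.
unquoted=$(printf '%s\n' "$names" | sed -E 's/"(.*)"/\1/')
printf '%s\n' "$unquoted"
```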
Because the quotes weren't inside the parentheses, they aren't part of the contents of \1!
To leave only the stuff inside the double quotes, we need to match the entire line. For that we have ^ (which means "beginning of line") and $ (which means "end of line").
So now if we use s/^.*"(.*)".*$/\1/, we'll get:
The Kid
The Fish
The Man
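A quick way to check the anchored version is sketched below, again assuming a sed with -E. I've added one extra line with no quotes in it (my own addition, not part of the list above) to show that sed passes lines through unchanged when the pattern doesn't match, which is why filtering first still matters.

```shell
names=$(printf '%s\n' 'Billy "The Kid" Smith' 'Jimmy "The Fish" Stuart' 'Chuck "The Man" Norris' 'Plain Jane')

# ^ and $ make the pattern cover the whole line, so the replacement \1
# is all that remains of any line containing a quoted string.
inner=$(printf '%s\n' "$names" | sed -E 's/^.*"(.*)".*$/\1/')
printf '%s\n' "$inner"
```

The line without quotes comes out as it went in, because a substitution that doesn't match leaves the line alone.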
Why? Let's read the regular expression s/^.*"(.*)".*$/\1/ from left to right:
s/ - Start a substitution.
^ - Look for the beginning of the line. Start from there.
.* - Keep going, reading every character, until...
" - ... you reach a double quote.
( - Start a group of characters we might want to recall later with a back-reference.
.* - Keep going, reading every character, until...
) - (Pssst! Close the group!)
" - ... you reach a double quote.
.* - Keep going, reading every character, until...
$ - The end of the line!
/ - Use what comes after this to replace what you matched.
In plain English: "Read the entire line, setting aside the text between the double quotes. Then replace the entire line with the text between the double quotes."
You can even put double quotes around the replacement text, s/^.*"(.*)".*$/"\1"/, so we'll get:
"The Kid"
"The Fish"
"The Man"
And sed can use that to replace each line with the content from within the quotes:
sed -e "s/^.*\"\(.*\)\".*$/\"\1\"/"
(This is just shell-escaped to deal with the double quotes, and the parentheses are backslash-escaped because sed's default regular-expression syntax uses \( and \) for grouping.)
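If the backslashes are hard to read, here is a sketch comparing three spellings of the same substitution on one sample line. The single-quoted forms are my own variations rather than part of the answer above, and the -E form assumes a sed that supports extended regular expressions.

```shell
line='Billy "The Kid" Smith'

# Double-quoted for the shell: every " inside must be written as \"
escaped=$(printf '%s\n' "$line" | sed -e "s/^.*\"\(.*\)\".*$/\"\1\"/")

# Single-quoted for the shell: the double quotes need no escaping
single=$(printf '%s\n' "$line" | sed -e 's/^.*"\(.*\)".*$/"\1"/')

# Single-quoted plus -E: the grouping parentheses need no backslashes either
extended=$(printf '%s\n' "$line" | sed -E 's/^.*"(.*)".*$/"\1"/')

printf '%s\n' "$escaped" "$single" "$extended"
```

All three should print the same thing; which one you use is mostly a matter of readability.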
So the whole command would be something like:
cat named.local | grep zone | sed -e "s/^.*\"\(.*\)\".*$/\"\1\"/"
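As a possible variation, sed can do the filtering itself, so cat and grep aren't strictly needed. The sketch below is my own take, not the command above: it assumes zone lines begin with the word zone (optionally indented), uses [^"]* so the match stops at the first closing quote, and runs against a made-up sample file standing in for named.local.

```shell
# Hypothetical sample standing in for named.local.
conf=$(mktemp)
cat > "$conf" <<'EOF'
options {
    directory "/var/named";
};
zone "domain1.tld" {
    type master;
    file "zone.domain1.tld.db";
};
zone "domain2.tld" {
    type master;
};
EOF

# -n suppresses sed's default output; the trailing p prints only the
# lines where the substitution actually happened.
zones=$(sed -n 's/^[[:space:]]*zone[[:space:]]*"\([^"]*\)".*$/"\1"/p' "$conf")
printf '%s\n' "$zones"
rm -f "$conf"
```

Anchoring on the leading zone keyword also skips lines that merely contain the word, such as the file line in the sample, which a plain grep zone would let through.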