Need to extract a block of text between two blank lines


Question

I have been slowly and steadily developing a bash script that quickly fetches some basic DNS information about a domain. (Think LeafDNS or IntoDNS, but something I can run quickly from the CLI.) Today a coworker gave me the final piece I needed to complete the script: how to fetch the nameservers (and their IPs) that a domain points to, as reported by the domain's registrar, via dig +trace +additional.

The problem, however, is that dig +trace +additional returns a lot of extra information that I neither want nor need. Of the four blocks of text (separated by blank lines) that it returns, I only need the third one. The first two are the root nameservers and the TLD's parent nameservers, and the fourth is the nameservers as reported in the DNS zone. Ideally, I would also like to omit the comment that dig appends to the end of the third block, leaving only the nameservers and their IPs.

I did find this (http://www.unix.com/shell-programming-scripting/172410-sed-show-lines-text-between-2-blank-lines.html) as a solution, which pipes the output of dig through sed, but I'm only vaguely familiar with sed. When I copy and paste that sed command directly, I get the first and third blocks. Here's an example of the output:

calyodelphi@dragonpad:~ $ dig +trace +additional dragon-architect.com | sed '/^$/,/^$/!d'

; <<>> DiG 9.7.3-P3 <<>> +trace +additional dragon-architect.com
;; global options: +cmd
.           214851  IN  NS  m.root-servers.net.
.           214851  IN  NS  a.root-servers.net.
.           214851  IN  NS  b.root-servers.net.
.           214851  IN  NS  g.root-servers.net.
.           214851  IN  NS  j.root-servers.net.
.           214851  IN  NS  d.root-servers.net.
.           214851  IN  NS  e.root-servers.net.
.           214851  IN  NS  f.root-servers.net.
.           214851  IN  NS  l.root-servers.net.
.           214851  IN  NS  c.root-servers.net.
.           214851  IN  NS  k.root-servers.net.
.           214851  IN  NS  h.root-servers.net.
.           214851  IN  NS  i.root-servers.net.
;; Received 228 bytes from 192.168.16.1#53(192.168.16.1) in 18 ms


dragon-architect.com.   172800  IN  NS  ns1.dragon-architect.com.
dragon-architect.com.   172800  IN  NS  ns2.dragon-architect.com.
ns1.dragon-architect.com. 172800 IN A   70.84.243.130
ns2.dragon-architect.com. 172800 IN A   70.84.243.131
;; Received 106 bytes from 192.33.14.30#53(b.gtld-servers.net) in 165 ms


calyodelphi@dragonpad:~ $ 
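
A note on why that command keeps the first and third blocks, offered as a sketch rather than an authoritative reading of sed: /^$/,/^$/ is a range that opens at a blank line and closes at the next blank line, and !d deletes every line outside such a range. Because dig's output begins with a blank line, the ranges end up covering blocks 1 and 3, while blocks 2 and 4 fall between ranges and are deleted. The sample text (blocks labelled A to D) and the function names below are made up for illustration; only the sed expression comes from the question.

```shell
#!/bin/sh
# Hypothetical stand-in for dig output: a leading blank line, four blocks
# (A, B, C, D) separated by single blank lines, and a trailing blank line.
sample() {
    printf '\nA1\nA2\n\nB1\n\nC1\nC2\n\nD1\n\n'
}

# The sed expression from the question.
# /^$/,/^$/  range: opens at a blank line, closes at the NEXT blank line
# !d         delete every line that is not inside such a range
keep_blank_ranges() {
    sed '/^$/,/^$/!d'
}

# Range 1 runs from the leading blank line to the blank line after block A.
# Block B is outside any range, so it is deleted.
# Range 2 runs from the blank line after B to the blank line after block C.
# Block D is outside any range; the trailing blank line opens a last range.
sample | keep_blank_ranges
```

Running this prints the A and C lines plus the blank lines that bound each range, which matches the pattern in the output above. It also suggests why sed ranges are an awkward fit for "give me the Nth block": which blocks survive depends on whether the input happens to start with a blank line.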

I'm pretty much lost at this point and would very much appreciate help. Gratuitous bonus points if the solution is simple, elegant, highly portable, easy to read, and comes with an explanation of how the sed command works so I can learn from it. I'm open to using grep or awk as well; whichever yields the most portable and maintainable result.

EDIT: I do know about several dig arguments (notably +nocomments and +nostats). Unfortunately, they don't work with +trace, so I have to remove the stats/comments manually with sed or awk.

EDIT 2: Also, it didn't occur to me until today that the solutions need to handle TLDs like .co.uk or .com.au. I ran dig +trace +additional on a couple of domains, bbc.co.uk and melbourneit.com.au, to see whether this changed the output, and it did not. Four blocks of output are still returned, so both provided solutions still work exactly as intended.

Recommended answer

You can try awk. Set RS to the empty string so that records are split at blank lines, and set FS to a newline so that each line of a record becomes one field. Then select the third record (FNR == 3), clear its last field ($NF), strip the trailing whitespace that leaves behind, and print:

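dig +trace +additional dragon-architect.com | awk '
   # Paragraph mode: blank lines separate records; each line is one field.
   BEGIN { RS = ""; FS = OFS = "\n" }
   # Third record: blank the last field (the ";; Received" line), trim, print.
   FNR == 3 { $NF = ""; sub( /[[:space:]]+$/, "" ); print }
'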

It yields:

dragon-architect.com.   172800  IN  NS  ns1.dragon-architect.com.
dragon-architect.com.   172800  IN  NS  ns2.dragon-architect.com.
ns1.dragon-architect.com. 172800 IN A   70.84.243.130
ns2.dragon-architect.com. 172800 IN A   70.84.243.131
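
To try the paragraph-mode idea without a network connection, here is a minimal sketch. It is a variant of the answer, not the answer's exact method: instead of blanking the last field, it skips every line that starts with ';', on the assumption that all of dig's comment and statistics lines begin that way. The helper names nth_block and sample_trace are hypothetical, and the sample is abridged from the question's output with placeholder text standing in for the second and fourth blocks.

```shell
#!/bin/sh
# Hypothetical helper: print block number $1 of stdin, where blocks are
# separated by blank lines, omitting lines that start with ';'.
nth_block() {
    awk -v n="$1" '
        BEGIN { RS = ""; FS = "\n" }       # paragraph mode; one field per line
        NR == n {
            for (i = 1; i <= NF; i++)
                if ($i !~ /^;/) print $i   # skip comment/statistics lines
        }
    '
}

# Abridged sample shaped like dig +trace +additional output.
# Blocks 2 and 4 are placeholders, not real dig output.
sample_trace() {
    cat <<'EOF'

; <<>> DiG 9.7.3-P3 <<>> +trace +additional dragon-architect.com
;; global options: +cmd
.           214851  IN  NS  a.root-servers.net.
;; Received 228 bytes from 192.168.16.1#53(192.168.16.1) in 18 ms

placeholder-for-tld-nameserver-block
;; placeholder comment line

dragon-architect.com.   172800  IN  NS  ns1.dragon-architect.com.
ns1.dragon-architect.com. 172800 IN A   70.84.243.130
;; Received 106 bytes from 192.33.14.30#53(b.gtld-servers.net) in 165 ms

placeholder-for-zone-nameserver-block
;; placeholder comment line

EOF
}

# Offline demo; with real data the call would be:
#   dig +trace +additional dragon-architect.com | nth_block 3
sample_trace | nth_block 3
```

Filtering on the leading ';' rather than dropping the last line means the block still prints correctly if the comment line is missing or if there is more than one, at the cost of assuming no wanted line ever starts with ';'.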
