RegExp: Remove last period in string that can contain other periods (dig output)

Views: 188 | Posted: 2017/11/9 21:10:24 | python regex find


Problem description


I am trying to parse the output of the Linux dig command and do several things in one shot with regular expressions.

Let's say I dig the host mail.yahoo.com:

    /usr/bin/dig +nocomments +noquestion \
        +noauthority +noadditional +nostats +nocmd \
        mail.yahoo.com A

This command outputs:

    mail.yahoo.com. 0 IN CNAME login.yahoo.com.
    login.yahoo.com. 0 IN CNAME ats.login.lgg1.b.yahoo.com.
    ats.login.lgg1.b.yahoo.com. 0 IN CNAME ats.member.g02.yahoodns.net.
    ats.member.g02.yahoodns.net. 0 IN CNAME any-ats.member.a02.yahoodns.net.
    any-ats.member.a02.yahoodns.net. 12 IN A 98.139.21.169

What I'd like to do is find all the <host>, <record_type> and <resolved_name> parts without the final period, using only one regular expression.

For this particular example with mail.yahoo.com, it'd be:

    [
        ('mail.yahoo.com', 'CNAME', 'login.yahoo.com'),
        ('login.yahoo.com', 'CNAME', 'ats.login.lgg1.b.yahoo.com'),
        ('ats.login.lgg1.b.yahoo.com', 'CNAME', 'ats.member.g02.yahoodns.net'),
        ('ats.member.g02.yahoodns.net', 'CNAME', 'any-ats.member.a02.yahoodns.net'),
        ('any-ats.member.a02.yahoodns.net', 'A', '98.139.21.169'),
    ]

But it turns out that the dig command might be showing a period at the end of the name:

        mail.yahoo.com.
            ^     ^   ^
            |     |   |
     Good dot     |   |
                  |   |
           Good dot   |
                      |
         (!) Baaaad dot

Writing a regular expression that splits dig's output and returns the name with the final period is fairly straightforward:

    regex = re.compile(r"^(\S+).+IN\s+([A-Z]+)\s+(\S+)\.*\s*$", re.MULTILINE)

But calling .findall with that regex does return the final period in the host, because \S+ will match the last period as well:

    [
        ('mail.yahoo.com.', 'CNAME', 'login.yahoo.com.'),
        ('login.yahoo.com.', 'CNAME', 'ats.login.lgg1.b.yahoo.com.'),
        ('ats.login.lgg1.b.yahoo.com.', 'CNAME', 'ats.member.g02.yahoodns.net.'),
        ('ats.member.g02.yahoodns.net.', 'CNAME', 'any-ats.member.a02.yahoodns.net.'),
        ('any-ats.member.a02.yahoodns.net.', 'A', '98.139.21.169'),
    ]

So I'd need something that matches all non-spaces \S except if it's a period followed by a whitespace.

I've done endless tries, and I haven't been able to come up with a decent solution.

Thank you in advance!

PS: I know I can always use the "easy" regular expression and (on a second pass) remove the last dot of the found string, but I'm curious about whether this can be done with a regular expression in one shot.
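
For reference, the two-pass fallback mentioned in the PS could look roughly like the sketch below. It is only an illustration: the names dig_output, raw and records are made up here, and the sample text is the dig output quoted above, assumed to be space-separated with one record per line and no trailing whitespace.

```python
import re

# Sample dig output copied from the question (assumption: one record per line).
dig_output = "\n".join([
    "mail.yahoo.com. 0 IN CNAME login.yahoo.com.",
    "login.yahoo.com. 0 IN CNAME ats.login.lgg1.b.yahoo.com.",
    "ats.login.lgg1.b.yahoo.com. 0 IN CNAME ats.member.g02.yahoodns.net.",
    "ats.member.g02.yahoodns.net. 0 IN CNAME any-ats.member.a02.yahoodns.net.",
    "any-ats.member.a02.yahoodns.net. 12 IN A 98.139.21.169",
])

# First pass: the "easy" pattern from the question, which keeps trailing dots.
regex = re.compile(r"^(\S+).+IN\s+([A-Z]+)\s+(\S+)\.*\s*$", re.MULTILINE)
raw = regex.findall(dig_output)

# Second pass: strip trailing dots from every captured field.
# Note that rstrip(".") removes all trailing dots, not just one.
records = [tuple(field.rstrip(".") for field in row) for row in raw]

for host, record_type, resolved_name in records:
    print(host, record_type, resolved_name)
```

This is the baseline that a single-regex solution would have to match.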
Solution
You can use this pattern with the multiline modifier:

    ^([^ ]+)(?<!\.)\.?[ ]+[0-9]+[ ]+IN[ ]+([^ ]+)[ ]+(.+(?<!\.))\.?$

The groups are stored in $1, $2 and $3.


Edit: Try this version, which also accepts tabs as separators:

    ^([^ \t]+)(?<!\.)\.?[ \t]+[0-9]+[ \t]+IN[ \t]+([^ \t]+)[ \t]+(.+(?<!\.))\.?$

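
As a rough check, the edited pattern can be applied from Python with re.MULTILINE and .findall, as in the question. The sketch below assumes the sample output quoted in the question (space-separated, no trailing whitespace); the names dig_output, pattern and records are illustrative only.

```python
import re

# Sample dig output copied from the question (assumption: one record per line).
dig_output = "\n".join([
    "mail.yahoo.com. 0 IN CNAME login.yahoo.com.",
    "login.yahoo.com. 0 IN CNAME ats.login.lgg1.b.yahoo.com.",
    "ats.login.lgg1.b.yahoo.com. 0 IN CNAME ats.member.g02.yahoodns.net.",
    "ats.member.g02.yahoodns.net. 0 IN CNAME any-ats.member.a02.yahoodns.net.",
    "any-ats.member.a02.yahoodns.net. 12 IN A 98.139.21.169",
])

# The pattern from the edited answer, compiled with the multiline flag so
# that ^ and $ anchor at every line of the dig output.
pattern = re.compile(
    r"^([^ \t]+)(?<!\.)\.?[ \t]+[0-9]+[ \t]+IN[ \t]+([^ \t]+)[ \t]+(.+(?<!\.))\.?$",
    re.MULTILINE,
)

# findall returns one (host, record_type, resolved_name) tuple per line.
records = pattern.findall(dig_output)

for host, record_type, resolved_name in records:
    print(host, record_type, resolved_name)
```

The negative lookbehind (?<!\.) makes each greedy group give back a trailing dot, and the optional \.? that follows consumes that dot outside the group, so the captured names come out without the final period.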
