Python Repeated Capture Groups


Question

I'm attempting to parse a series of SHOW CDP NEIGHBORS DETAIL outputs so that I can capture each device and its IP addresses.

The issue I'm running into is that some devices may have multiple IP addresses assigned to them. Here is an example output:

Device ID: RTPER1.MFN21Mb.domain.local
Entry address(es): 
  IP address: 200.152.51.3
  IP address: 82.159.177.233
  IP address: 201.152.51.140
  IP address: 84.252.100.3
Platform: Cisco 2821,  Capabilities: Router Switch IGMP 

I wrote this regex to capture the input. According to gskinner it matches all 4 IP addresses, but the capture group holds only the last one (as expected from regex):

Device ID: ([0-9A-Za-z-.&]+)\s+Entry address\(es\):\s+(?:IP address: ([0-9.]+)\s+)+

So I went online to figure out how to do this. I tried the regex suggested in "Capturing repeating subpatterns in Python regex", but using the regex module did not change the output. I still get only the last IP address in the list, and none of the others.

Following the example I found:

temp = regex.match(r'Device ID: ([0-9A-Za-z-.&]+)\s+Entry address\(es\):\s+(?:IP address: ([0-9.]+)\s+)+', file)
print temp

temp is None.

If I use findall instead, I get back only the last IP address, 84.252.100.3.

If I add multiple capture groups, such as

temp = regex.findall(r'Device ID: ([0-9A-Za-z-.&]+)\s+Entry address\(es\):\s+(?:IP address: ([0-9.]+)\s+)?\s+(?:IP address: ([0-9.]+)\s+)?\s+(?:IP address: ([0-9.]+)\s+)?\s+(?:IP address: ([0-9.]+)\s+)?\s+(?:IP address: ([0-9.]+)\s+)?', file)
print temp

it only matches the devices that have multiple IP addresses, and not the others.

Hopefully someone can help.

Recommended Answer

As far as I'm aware, only .NET allows you to iterate through quantified (repeated) capturing groups. Consider this (finite) alternative:

Device ID: ([0-9A-Za-z-.&]+)\s+Entry address\(es\):\s+(?:IP address: ([0-9.]+)\s+)(?:IP address: ([0-9.]+)\s+)?(?:IP address: ([0-9.]+)\s+)?(?:IP address: ([0-9.]+)\s+)?
                                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This will capture one IP address in $2 and up to three more in $3, $4, and $5. (I'm using the $ notation idiomatically, of course.) You can add as many optional groups as you want. If you need all of the IP addresses to be present in a single group, i.e. $2, then your only choice is to include the text with them:

Device ID: ([0-9A-Za-z-.&]+)\s+Entry address\(es\):\s+((?:IP address: (?:[0-9.]+)\s+)+)
                                                      ^                ^^             ^
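
Once the whole run of "IP address: ..." lines sits in one group, a second pass can pull the individual addresses out of it. The following is only a minimal sketch using the standard-library re module; the names SAMPLE, DEVICE_RE, IP_RE and parse_neighbors are made up for illustration, and the hyphen in the device-ID character class has been moved to the end, which should not change what it matches.

```python
import re

# Sample text copied from the question's example output.
SAMPLE = """Device ID: RTPER1.MFN21Mb.domain.local
Entry address(es): 
  IP address: 200.152.51.3
  IP address: 82.159.177.233
  IP address: 201.152.51.140
  IP address: 84.252.100.3
Platform: Cisco 2821,  Capabilities: Router Switch IGMP 
"""

# Group 1: the device ID. Group 2: the whole block of "IP address: ..." lines.
DEVICE_RE = re.compile(
    r'Device ID: ([0-9A-Za-z.&-]+)\s+'
    r'Entry address\(es\):\s+'
    r'((?:IP address: [0-9.]+\s+)+)'
)

# Second pass: extract each address from the captured block.
IP_RE = re.compile(r'IP address: ([0-9.]+)')


def parse_neighbors(text):
    """Return a dict mapping each device ID to a list of its IP addresses."""
    neighbors = {}
    for match in DEVICE_RE.finditer(text):
        neighbors[match.group(1)] = IP_RE.findall(match.group(2))
    return neighbors


result = parse_neighbors(SAMPLE)
print(result)
```

Because finditer scans the whole input, this should also cope with a file containing many device entries, whether each one lists one address or several.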
