How to extract an IP address from an HTML string?
Problem description
I want to extract an IP address from a string (actually a single line of HTML) using Python.
>>> s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"
-- '165.91.15.131' is what I want!
I tried using regular expressions, but so far I can only get the first number.
>>> import re
>>> ip = re.findall( r'([0-9]+)(?:\.[0-9]+){3}', s )
>>> ip
['165']
But I don't have a firm grasp of regular expressions; the above code was found elsewhere on the web and modified.
Solution
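Remove your capturing group. When a pattern contains a capturing group, re.findall returns the text captured by the group rather than the whole match: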
ip = re.findall( r'[0-9]+(?:\.[0-9]+){3}', s )
Result:
['165.91.15.131']
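The two patterns can be compared side by side. This is a minimal sketch that reuses the string from the question; the variable names are illustrative, and the re.finditer variant is an extra option not mentioned in the answer, for cases where the group has to stay in the pattern.

```python
import re

s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"

# One capturing group: findall returns only what the group captured.
with_group = re.findall(r'([0-9]+)(?:\.[0-9]+){3}', s)

# No capturing group: findall returns the whole match.
without_group = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)

# Keeping the group but reading the whole match from each match object.
via_finditer = [m.group(0) for m in re.finditer(r'([0-9]+)(?:\.[0-9]+){3}', s)]

print(with_group)     # ['165']
print(without_group)  # ['165.91.15.131']
print(via_finditer)   # ['165.91.15.131']
```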
Notes:
- If you are parsing HTML it might be a good idea to look at BeautifulSoup.
- Your regular expression matches some invalid IP addresses, such as 0.00.999.9999. This isn't necessarily a problem, but you should be aware of it and possibly handle this situation. You could change the + to {1,3} for a partial fix without making the regular expression overly complex.
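One way to go beyond the partial fix is to let the regular expression find candidates and leave the range check to the standard library. The sketch below is not part of the original answer: the helper name extract_ipv4, the digit lookarounds, and the use of the ipaddress module are assumptions chosen for illustration. Depending on the Python version, ipaddress may also reject octets with leading zeros.

```python
import ipaddress
import re

# {1,3} limits each part to three digits; the lookarounds (an addition, not in
# the answer) stop the pattern from matching inside a longer run of digits.
CANDIDATE = re.compile(r'(?<![0-9])[0-9]{1,3}(?:\.[0-9]{1,3}){3}(?![0-9])')

def extract_ipv4(text):
    """Return the candidates in text that ipaddress accepts as IPv4 (hypothetical helper)."""
    found = []
    for candidate in CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # raises ValueError if any part exceeds 255
        except ValueError:
            continue
        found.append(candidate)
    return found

s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"
print(extract_ipv4(s))  # ['165.91.15.131']
print(extract_ipv4("ok 10.0.0.1, bad 999.1.1.1, page 0.00.999.9999"))  # ['10.0.0.1']
```

The regular expression stays short, and the decision about what counts as a valid address is delegated to ipaddress instead of being encoded in the pattern.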