How to extract an IP address from an HTML string?

Question

I want to extract an IP address from a string (actually a single line of HTML) using Python.

>>> s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"

-- '165.91.15.131' is what I want!

I tried using regular expressions, but so far I only get the first number.

>>> import re
>>> ip = re.findall( r'([0-9]+)(?:\.[0-9]+){3}', s )
>>> ip
['165']

I don't have a firm grasp of regular expressions; the code above was adapted from something I found elsewhere on the web.

Solution

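Remove your capturing group. When a pattern contains a capturing group, re.findall returns the text captured by the group rather than the whole match, which is why you only got '165':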

ip = re.findall( r'[0-9]+(?:\.[0-9]+){3}', s )

Result:

['165.91.15.131']
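
To see the difference side by side, here is a minimal sketch that runs both patterns against the string from the question. The variable names with_group and without_group are only for illustration.

```python
import re

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")

# With a capturing group, findall returns only what the group captured.
with_group = re.findall(r'([0-9]+)(?:\.[0-9]+){3}', s)

# With only a non-capturing group (?:...), findall returns the whole match.
without_group = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)

print(with_group)
print(without_group)
```

The (?:...) group is still needed so that {3} can repeat the "dot plus digits" part; it just does not change what findall returns.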

Notes:

  • If you are parsing HTML it might be a good idea to look at BeautifulSoup.
  • Your regular expression matches some invalid IP addresses such as 0.00.999.9999. This isn't necessarily a problem, but you should be aware of it and possibly handle this situation. You could change the + to {1,3} for a partial fix without making the regular expression overly complex.
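
Both notes can be combined in a rough sketch, under a few assumptions. It uses the standard library's html.parser in place of BeautifulSoup (a swap made here only to keep the example dependency-free), the {1,3} variant of the pattern, and the ipaddress module to reject candidates with out-of-range octets. The names TextExtractor and extract_ips are hypothetical helpers, not part of any library.

```python
import ipaddress
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags, ignoring tag names and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_ips(html):
    """Return the valid IPv4 addresses found in the text content of html."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join(parser.chunks)

    # {1,3} limits each octet to three digits (the partial fix from the note).
    candidates = re.findall(r'[0-9]{1,3}(?:\.[0-9]{1,3}){3}', text)

    valid = []
    for candidate in candidates:
        try:
            # Raises a ValueError subclass for octets above 255.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        valid.append(candidate)
    return valid


s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")
print(extract_ips(s))
```

The validation step matters because {1,3} alone is only a partial fix: in 0.00.999.9999 the pattern still matches the substring 0.00.999.999, and it is the ipaddress check that discards it.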
