Regex to extract URLs from href attribute in HTML with Python

Views: 52 Published: 2021/6/25 19:16:49 python regex url


Problem description


Possible Duplicate:
What is the best regular expression to check if a string is a valid URL?

Considering a string as follows:

string = '<p>Hello World</p><a href="http://example.com">More Examples</a><a href="http://example2.com">Even More Examples</a>'

How could I, with Python, extract the urls, inside the anchor tag's href? Something like:

>>> url = getURLs(string)
>>> url
['http://example.com', 'http://example2.com']

Thanks!

Solution

import re

url = '<p>Hello World</p><a href="http://example.com">More Examples</a><a href="http://example2.com">Even More Examples</a>'

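# Raw string so that \w and \d reach the regex engine unchanged.
# The character class has no "/", so a URL with a path is cut off after the host.
urls = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', url)

>>> print(urls)
['http://example.com', 'http://example2.com']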
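
The pattern above matches any http(s) URL in the text, not only those inside an href. If the goal is specifically the href value of anchor tags, a variant along the following lines may be closer to what the question asks. This is only a sketch: the name getURLs is taken from the question, while HREF_RE and the sample html string are assumptions made for illustration. It handles quoted href values only and will miss unquoted ones or unusual markup.

```python
import re

# Hypothetical pattern: an <a ...> tag whose href attribute is quoted.
# Group 1 is the quote character, group 2 is the attribute value.
HREF_RE = re.compile(
    r'<a\s+(?:[^>]*?\s)?href\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)

def getURLs(html):
    """Return the quoted href values of <a> tags, in document order."""
    return [match.group(2) for match in HREF_RE.finditer(html)]

# Sample input (an assumption, adapted from the string in the question).
html = ('<p>Hello World</p>'
        '<a href="http://example.com/docs?page=2">More Examples</a>'
        '<a class="ext" href="http://example2.com">Even More Examples</a>')

print(getURLs(html))
# ['http://example.com/docs?page=2', 'http://example2.com']
```

Because the value is captured up to the closing quote, paths and query strings are kept intact.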

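Regular expressions tend to be fragile on real-world HTML, so a parser-based approach is often suggested instead. The sketch below uses html.parser from the Python 3 standard library in place of the regex; the class name HrefCollector is hypothetical, and getURLs is simply the name used in the question.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects href values from <a> start tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.urls.append(value)

def getURLs(html):
    """Return every href value found on an <a> tag, in document order."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.urls

string = ('<p>Hello World</p>'
          '<a href="http://example.com">More Examples</a>'
          '<a href="http://example2.com">Even More Examples</a>')

print(getURLs(string))
# ['http://example.com', 'http://example2.com']
```

This returns whatever the href contains, so relative links and non-http schemes come back as well; filter the list afterwards if only http(s) URLs are wanted.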