Need to find text with RegEx and BeautifulSoup

Views: 31 | Published: 2021/12/23 20:52:09 | Tags: python, regex, python-2.7, web-scraping, beautifulsoup


Question


I'm trying to parse a website to pull out some data that is stored directly in the body, like this:

```html
<body>
    <b>INFORMATION</b>
    Hookups: None
    Group Sites: No
    Station: No

    <b>Details</b>
    Ramp: Yes
</body>
```


I would like to use BeautifulSoup4 and regex to pull out the values of Hookups, Group Sites, and so on, but I'm new to both bs4 and regex. I tried the following to get the Hookups value:

```python
import re
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('doc.html'), 'html.parser')
hookups = soup.find_all(re.compile("Hookups:(.*)Group"))
```

But the search comes back empty.

Answer


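By default, BeautifulSoup's find_all matches tags: a regex passed as its first argument is compared against tag names, not against the text inside them, which is why the search above comes back empty. Assuming the HTML really is this simple, you can get what you need with a plain regex. Otherwise, use find_all and then read the .text of the matched tags.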

```python
import re
re.findall("Hookups: (.*)", open('doc.html').read())  # (.*) stops at the end of the line
```
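The same plain-regex idea can collect every field at once. This is a minimal stdlib-only sketch, assuming each field sits on its own line in the form "Key: Value" as in the sample markup; the inline `html` string, the pattern and the `fields` name are illustrative, not part of the original answer.

```python
import re

# Sample markup copied from the question (illustrative stand-in for doc.html)
html = """<body>
    <b>INFORMATION</b>
    Hookups: None
    Group Sites: No
    Station: No

    <b>Details</b>
    Ramp: Yes
</body>"""

# One "Key: Value" pair per line. re.MULTILINE makes ^ and $ match at every
# line boundary; the key may not contain ':', '<', '>' or a newline, so lines
# holding tags such as <b>INFORMATION</b> are skipped.
pattern = re.compile(r"^[ \t]*([^:<>\n]+?):[ \t]*(.*?)[ \t]*$", re.MULTILINE)

# findall returns (key, value) tuples, which dict() turns into a mapping
fields = dict(pattern.findall(html))
print(fields)
```

Because the pattern is anchored per line and only allows spaces or tabs around the colon, an empty value cannot spill over into the next line. For messier markup, the BeautifulSoup route is the safer choice.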


As of BeautifulSoup 4.2, you can also search the text content itself with the text argument. It matches against text nodes and returns the matching strings rather than tags:

```python
# re.DOTALL lets (.*) cross the newline between "Hookups" and "Group"
soup.find_all(text=re.compile("Hookups:(.*)Group", re.DOTALL))
```


Since BeautifulSoup 4.4, the text argument has been named string; the old name text is still accepted.
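A sketch putting the two behaviours side by side, assuming BeautifulSoup 4.4 or later, Python 3, and the built-in html.parser. The inline `html` string, the variable names and the follow-up regex used to isolate the value are illustrative choices, not part of the original answer.

```python
import re

from bs4 import BeautifulSoup

# Sample markup copied from the question
html = """<body>
    <b>INFORMATION</b>
    Hookups: None
    Group Sites: No
    Station: No

    <b>Details</b>
    Ramp: Yes
</body>"""

soup = BeautifulSoup(html, "html.parser")

# A regex as the first argument is matched against tag names ("body", "b"),
# so nothing is found: this is why the question's search came back empty
by_name = soup.find_all(re.compile("Hookups"))

# string= (text= before 4.4) searches the text nodes instead; the results are
# NavigableString objects (str subclasses), not tags
nodes = soup.find_all(string=re.compile("Hookups:"))

# The matched node holds several lines, so a second regex isolates the value
hookups = re.search(r"Hookups:[ \t]*(.*)", nodes[0]).group(1).strip()
print(by_name, hookups)
```

The string search only locates the right text node; since several fields share that node here, an ordinary regex is still needed to pull out the individual value.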
