Extracting Text Between HTML Comments with BeautifulSoup

Problem description

Using Python 3 and BeautifulSoup 4, I would like to extract text from an HTML page where the text is delineated only by a comment above it. An example:

<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text

I have found various ways to extract a page's text or its comments, but no way to do what I'm looking for. Any help would be greatly appreciated.

Recommended answer

You just need to iterate over all of the comments in the document, check whether each one is among the entries you need, and then display the text of the element that follows it:

from bs4 import BeautifulSoup, Comment

html = """
<html>
<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
</html>
"""
# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(html, 'lxml')

# Select only the comment nodes, then print the node right after each wanted one.
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if comment in ['UNIQUE COMMENT', 'SECOND UNIQUE COMMENT']:
        print(comment.next_element.strip())

This displays the following:

I would like to get this text
I would also like to find this text
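
The answer reads only `comment.next_element`, the single node right after each comment. If the text after a comment might contain inline tags such as `<b>`, one possible variation is to walk the comment's following siblings until the next comment is reached. The sketch below is an illustration, not part of the original answer: the helper name `text_after_comments` and its `wanted` parameter are hypothetical, and it uses the built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def text_after_comments(html, wanted):
    """Map each wanted comment to the text between it and the next comment.

    Hypothetical helper for illustration; only siblings inside the same
    parent element as the comment are considered.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = {}
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        label = comment.strip()
        if label not in wanted:
            continue
        parts = []
        for sibling in comment.next_siblings:
            if isinstance(sibling, Comment):
                break  # the next comment ends this section
            if isinstance(sibling, NavigableString):
                parts.append(str(sibling))
            else:
                parts.append(sibling.get_text())  # flatten inline tags
        # Collapse runs of whitespace and newlines into single spaces.
        results[label] = " ".join("".join(parts).split())
    return results


if __name__ == "__main__":
    page = """
    <body>
    <!--UNIQUE COMMENT-->
    I would like to get <b>this</b> text
    <!--SECOND UNIQUE COMMENT-->
    I would also like to find this text
    </body>
    """
    found = text_after_comments(page, {"UNIQUE COMMENT", "SECOND UNIQUE COMMENT"})
    for label, text in found.items():
        print(label, "->", text)
```

Returning a dictionary keyed by comment text keeps each extracted passage tied to the marker that introduced it, which may be convenient when a page has many such markers.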
