Finding next occurring tag and its enclosed text with Beautiful Soup
Problem description
I'm trying to parse the text enclosed in <blockquote> tags. When I call soup.blockquote.get_text(), I get the result I want for the first blockquote in the HTML file. How do I find the next <blockquote> tag in the file, and the ones after it in sequence? Maybe I'm just tired and can't find it in the documentation.
Example HTML file:
<html>
<head>header
</head>
<blockquote>I can get this text
</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next
</blockquote>
<p></p><strong>do not capture this</strong>
<blockquote>
capture this too but separately after "capture this next"
</blockquote>
</html>
The simple Python code:
from bs4 import BeautifulSoup

with open("example.html") as html_doc:
    soup = BeautifulSoup(html_doc, "html.parser")

print(soup.blockquote.get_text())
# how to get the next blockquote???
Solution
Use find_next_sibling (if the next blockquote is not a sibling of the first one, use find_next instead):
>>> html = '''
... <html>
... <head>header
... </head>
... <blockquote>blah blah
... </blockquote>
... <p>eiaoiefj</p>
... <blockquote>capture this next
... </blockquote>
... <p></p><strong>don't capture this</strong>
... <blockquote>
... capture this too but separately after "capture this next"
... </blockquote>
... </html>
... '''
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html, "html.parser")
>>> quote1 = soup.blockquote
>>> quote1.text
'blah blah\n'
>>> quote2 = quote1.find_next_sibling('blockquote')
>>> quote2.text
'capture this next\n'
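To make the difference between the two methods concrete, here is a minimal sketch. The markup is invented for illustration (it is not the asker's file), and it assumes the built-in "html.parser" backend. The third blockquote is deliberately nested in a <div>, so find_next_sibling does not reach it but find_next does. If the goal is simply to walk every blockquote in order, find_all is probably the more direct route.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not the file from the question.
html = """
<html>
<body>
<blockquote>first</blockquote>
<p>eiaoiefj</p>
<blockquote>second</blockquote>
<div><blockquote>nested third</blockquote></div>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

first = soup.blockquote

# find_next_sibling only looks at later elements that share the same parent.
second = first.find_next_sibling("blockquote")

# The third blockquote sits inside a <div>, so it is not a sibling of the second.
no_sibling = second.find_next_sibling("blockquote")  # None

# find_next searches everything that comes later in the document.
third = second.find_next("blockquote")

# find_all collects every blockquote in document order.
all_texts = [bq.get_text(strip=True) for bq in soup.find_all("blockquote")]

print(second.get_text(), no_sibling, third.get_text())
print(all_texts)
```

Since find_next_sibling returns None when nothing matches, it is worth checking the result before reading .text from it.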