Finding next occurring tag and its enclosed text with Beautiful Soup
Question
I'm trying to parse the text inside <blockquote> tags. When I call soup.blockquote.get_text(), I get the result I want for the first blockquote in the HTML file. How do I find the next <blockquote> tag, and each one after it in sequence? Maybe I'm just tired and can't find it in the documentation.
Example HTML file:
<html>
<head>header
</head>
<blockquote>I can get this text
</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next
</blockquote>
<p></p><strong>do not capture this</strong>
<blockquote>
capture this too but separately after "capture this next"
</blockquote>
</html>
The simple Python code:
from bs4 import BeautifulSoup
with open("example.html") as html_doc:
    soup = BeautifulSoup(html_doc, "html.parser")
print(soup.blockquote.get_text())
# how to get the next blockquote???
Recommended answer
Use find_next_sibling. (If the next tag is not a sibling, use find_next instead.)
>>> html = '''
... <html>
... <head>header
... </head>
... <blockquote>blah blah
... </blockquote>
... <p>eiaoiefj</p>
... <blockquote>capture this next
... </blockquote>
... <p></p><strong>don'tcapturethis</strong>
... <blockquote>
... capture this too but separately after "capture this next"
... </blockquote>
... </html>
... '''
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html, 'html.parser')
>>> quote1 = soup.blockquote
>>> quote1.text
u'blah blah\n'
>>> quote2 = quote1.find_next_sibling('blockquote')
>>> quote2.text
u'capture this next\n'
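To make the difference between the two methods concrete, here is a minimal sketch. It assumes a slightly modified version of the sample HTML in which the third blockquote is wrapped in a <div> (that wrapper is my addition, not part of the question), so it is no longer a sibling of the first two. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question; the <div> around the last
# blockquote is an assumption added to show the non-sibling case.
html = """
<html>
<head>header
</head>
<blockquote>I can get this text
</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next
</blockquote>
<p></p><strong>do not capture this</strong>
<div><blockquote>
capture this too but separately after "capture this next"
</blockquote></div>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

first = soup.blockquote
# find_next_sibling only looks at later tags that share the same parent.
second = first.find_next_sibling("blockquote")
# The third blockquote is nested in a <div>, so it is not a sibling.
third_as_sibling = second.find_next_sibling("blockquote")
# find_next walks forward through the whole document, whatever the nesting.
third = second.find_next("blockquote")

# If every blockquote is wanted in order, find_all returns them all at once.
all_texts = [bq.get_text(strip=True) for bq in soup.find_all("blockquote")]

print(second.get_text(strip=True))
print(third_as_sibling)
print(third.get_text(strip=True))
```

If the goal is simply to process every blockquote in document order, the find_all loop is probably the simplest choice; find_next_sibling and find_next are more useful when stepping forward from a tag you already hold.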