Python - BeautifulSoup - Unable to extract Span Value
Problem description
I have an XML document with multiple div classes/span classes and I'm struggling to extract a text value.
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I want</span>
So far I have written this:
html = driver.page_source
soup = BeautifulSoup(html, "lxml")
spans = soup.find_all('span', attrs={'class': 'html-tag'})[29]
print(spans.text)
Unfortunately, this only prints the "This is a Heading that I dont want" value, e.g.
This is the heading I dont want
The number [29] in the code is the position where the text I need will always appear.
I'm unsure how to retrieve the span value I need.
Please assist. Thanks.
Recommended answer
You can search by <div class="line"> and then select the second <span> inside it.
For example:
from bs4 import BeautifulSoup

txt = '''
# line 1
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I dont want</span>
</div>
# line 2
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I dont want</span>
</div>
# line 3
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I want</span> <!-- this is the one I want -->
</div>'''
soup = BeautifulSoup(txt, 'html.parser')
s = soup.select('div.line')[2].select('span')[1] # select 3rd line 2nd span
print(s.text)
Prints:
This is the text I want
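An alternative that stays closer to the code in the question is to keep locating the heading span by index and then step to the span that follows it. The sketch below assumes the wanted span is always a later sibling of the html-tag span within the same parent; the sample markup and the helper name value_after_heading are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the snippet in the question.
SAMPLE = """
<div class="line">
  <span class="html-tag">"Heading A"</span>
  <span>first value</span>
</div>
<div class="line">
  <span class="html-tag">"Heading B"</span>
  <span>second value</span>
</div>
"""


def value_after_heading(html, index):
    """Return the text of the <span> following the index-th html-tag span."""
    soup = BeautifulSoup(html, "html.parser")
    headings = soup.find_all("span", class_="html-tag")
    if index >= len(headings):
        return None  # fewer heading spans than expected
    # find_next_sibling skips the whitespace text nodes between the tags
    sibling = headings[index].find_next_sibling("span")
    return sibling.get_text(strip=True) if sibling else None


print(value_after_heading(SAMPLE, 1))
```

With the page from the question, the same call would presumably be value_after_heading(driver.page_source, 29), provided the heading really is always the 30th html-tag span.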