Python - BeautifulSoup - Unable to extract Span Value

Question

I have an XML with multiple div/span classes and I'm struggling to extract a text value.

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I want</span>

So far I have written this:

    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")
    spans = soup.find_all('span', attrs={'class': 'html-tag'})[29]
    print(spans.text)

Unfortunately, this only prints the "This is a Heading that I dont want" value, e.g.

This is the heading I dont want

The number [29] in the code is the position where the text I need will always appear.

I'm unsure how to retrieve the span value I need.

Please assist. Thanks.

Recommended answer

You can search by <div class="line"> and then select the second <span>.

For example:

from bs4 import BeautifulSoup

txt = '''
   # line 1

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I dont want</span>
   </div>

   # line 2

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I dont want</span>
   </div>

   # line 3

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I want</span>   <--- this is the one I want
   </div>'''


soup = BeautifulSoup(txt, 'html.parser')
s = soup.select('div.line')[2].select('span')[1]    # select 3rd line 2nd span

print(s.text)

Prints:

This is the text I want
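
If you would rather keep the html-tag lookup from the question, one possible variation is to step from the heading span to the span that follows it. The sketch below is only an illustration: the sample markup and the index 1 are made up for the example (the question uses index 29 on the real page), and it assumes the wanted span is always the next sibling of the heading span. It also shows a CSS selector that collects the second span of every div.line.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the snippet in the question.
html = """
<div class="line">
  <span class="html-tag">"Heading A"</span>
  <span>Text A</span>
</div>
<div class="line">
  <span class="html-tag">"Heading B"</span>
  <span>Text B</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Same lookup as in the question, but with index 1 instead of 29
# because this sample only contains two heading spans.
heading = soup.find_all("span", attrs={"class": "html-tag"})[1]

# Step from the heading span to the next <span> under the same parent.
value = heading.find_next_sibling("span")
print(value.get_text(strip=True))

# Alternative: collect the second <span> of every div.line in one pass.
values = [
    s.get_text(strip=True)
    for s in soup.select("div.line > span:nth-of-type(2)")
]
print(values)
```

The sibling lookup avoids depending on how many div.line blocks come before the one you need, although it still relies on the position of the heading span itself.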
