Using BeautifulSoup to extract text without tags

Question

My webpage looks like this:

<p>
  <strong class="offender">YOB:</strong> 1987<br/>
  <strong class="offender">RACE:</strong> WHITE<br/>
  <strong class="offender">GENDER:</strong> FEMALE<br/>
  <strong class="offender">HEIGHT:</strong> 5'05''<br/>
  <strong class="offender">WEIGHT:</strong> 118<br/>
  <strong class="offender">EYE COLOR:</strong> GREEN<br/>
  <strong class="offender">HAIR COLOR:</strong> BROWN<br/>
</p>

I want to extract the info for each individual and get YOB:1987, RACE:WHITE, etc.

What I tried is:

subc = soup.find_all('p')
subc1 = subc[1]
subc2 = subc1.find_all('strong')

But this gives me only the labels (YOB:, RACE:, etc.), not the values that follow them.

Is there a way that I can get the data in YOB:1987, RACE:WHITE format?

Recommended answer

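Just loop through all the <strong> tags and use next_sibling to get what you want. The value you are after is not inside any tag; it is the bare text node that sits immediately after each <strong> element, which is exactly what next_sibling returns. Like this: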

for strong_tag in soup.find_all('strong'):
    print(strong_tag.text, strong_tag.next_sibling)

Demo:

from bs4 import BeautifulSoup

html = '''
<p>
  <strong class="offender">YOB:</strong> 1987<br />
  <strong class="offender">RACE:</strong> WHITE<br />
  <strong class="offender">GENDER:</strong> FEMALE<br />
  <strong class="offender">HEIGHT:</strong> 5'05''<br />
  <strong class="offender">WEIGHT:</strong> 118<br />
  <strong class="offender">EYE COLOR:</strong> GREEN<br />
  <strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>
'''

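# Name the parser explicitly so bs4 does not warn or pick a different one per machine
soup = BeautifulSoup(html, 'html.parser')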

for strong_tag in soup.find_all('strong'):
    print(strong_tag.text, strong_tag.next_sibling)

This gives you:

YOB:  1987
RACE:  WHITE
GENDER:  FEMALE
HEIGHT:  5'05''
WEIGHT:  118
EYE COLOR:  GREEN
HAIR COLOR:  BROWN
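
The output above still carries the leading space from each text node. If you want the compact YOB:1987, RACE:WHITE form from the question, one possible approach is to strip the sibling text and collect the pairs in a dict. This is only a sketch: the helper name extract_fields is made up for illustration, and it assumes each value is a plain text node directly after its <strong> tag, as in the sample markup.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''
<p>
  <strong class="offender">YOB:</strong> 1987<br />
  <strong class="offender">RACE:</strong> WHITE<br />
  <strong class="offender">GENDER:</strong> FEMALE<br />
  <strong class="offender">HEIGHT:</strong> 5'05''<br />
  <strong class="offender">WEIGHT:</strong> 118<br />
  <strong class="offender">EYE COLOR:</strong> GREEN<br />
  <strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>
'''


def extract_fields(paragraph):
    """Map each <strong> label to the stripped text node that follows it.

    Hypothetical helper, not part of BeautifulSoup.
    """
    fields = {}
    for strong_tag in paragraph.find_all('strong'):
        # Label text without surrounding whitespace or the trailing colon
        label = strong_tag.get_text(strip=True).rstrip(':')
        sibling = strong_tag.next_sibling
        # Only accept a bare text node; a tag or None yields an empty value
        if isinstance(sibling, NavigableString):
            value = sibling.strip()
        else:
            value = ''
        fields[label] = value
    return fields


soup = BeautifulSoup(html, 'html.parser')
fields = extract_fields(soup.find('p'))

# Join the pairs into the "LABEL:VALUE, LABEL:VALUE" form asked for
line = ', '.join(f'{label}:{value}' for label, value in fields.items())
print(line)
```

On a page with one <p> block per individual, the same helper could be called once for each element returned by soup.find_all('p').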
