Using BeautifulSoup to Extract Text without Tags


Question

My webpage looks something like this:

<p>
    <strong class="offender">YOB:</strong> 1987<br />
    <strong class="offender">RACE:</strong> WHITE<br />
    <strong class="offender">GENDER:</strong> FEMALE<br />
    <strong class="offender">HEIGHT:</strong> 5'05''<br />
    <strong class="offender">WEIGHT:</strong> 118<br />
    <strong class="offender">EYE COLOR:</strong> GREEN<br />
    <strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>

I want to extract the info for each individual and get YOB:1987, RACE:WHITE, etc.

My attempt was:

subc = soup.findAll('p')
subc1 = subc[1]
subc2 = subc1.findAll('strong')

But this gives me only the labels (YOB:, RACE:, etc.), not the values that follow them.

Is there a way to get the data in YOB:1987, RACE:WHITE format?

Thanks, Manish

Recommended answer

Just loop through all the <strong> tags and use next_sibling (http://www.crummy.com/software/BeautifulSoup/bs4/doc/#next-sibling-and-previous-sibling) to get the text that follows each one. Like this:

for strong_tag in soup.find_all('strong'):
    # next_sibling is the text node that follows each <strong> tag
    print(strong_tag.text, strong_tag.next_sibling)

Demo:

>>> from bs4 import BeautifulSoup
>>> html = '''
... <p>
...     <strong class="offender">YOB:</strong> 1987<br />
...     <strong class="offender">RACE:</strong> WHITE<br />
...     <strong class="offender">GENDER:</strong> FEMALE<br />
...     <strong class="offender">HEIGHT:</strong> 5'05''<br />
...     <strong class="offender">WEIGHT:</strong> 118<br />
...     <strong class="offender">EYE COLOR:</strong> GREEN<br />
...     <strong class="offender">HAIR COLOR:</strong> BROWN<br />
... </p>
... '''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> for strong_tag in soup.find_all('strong'):
...     print(strong_tag.text, strong_tag.next_sibling)
...

This gives you:

YOB:  1987
RACE:  WHITE
GENDER:  FEMALE
HEIGHT:  5'05''
WEIGHT:  118
EYE COLOR:  GREEN
HAIR COLOR:  BROWN
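
If you want the exact YOB:1987, RACE:WHITE format from the question, one option is to strip the whitespace and collect the pairs into a dict. The following is only a sketch: the helper name extract_fields and the shortened sample markup are my own, not part of the original page, and it assumes each value is a plain text node directly after its <strong> tag.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened sample markup, assumed to match the structure in the question.
html = """
<p>
    <strong class="offender">YOB:</strong> 1987<br />
    <strong class="offender">RACE:</strong> WHITE<br />
    <strong class="offender">GENDER:</strong> FEMALE<br />
</p>
"""


def extract_fields(markup):
    """Map each <strong> label to the text node that follows it (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    fields = {}
    for strong_tag in soup.find_all("strong"):
        # Drop surrounding whitespace and the trailing colon from the label.
        label = strong_tag.get_text(strip=True).rstrip(":")
        value = strong_tag.next_sibling
        # next_sibling may be None or another tag; keep only plain text nodes.
        if isinstance(value, NavigableString):
            fields[label] = value.strip()
    return fields


fields = extract_fields(html)
print(", ".join(f"{label}:{value}" for label, value in fields.items()))
```

The isinstance check is there because next_sibling is not guaranteed to be text; if a label is followed directly by another tag (or by nothing), that label is skipped instead of raising an error.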
