Using BeautifulSoup to Extract Text Without Tags
Question
My webpage looks something like this:
<p>
<strong class="offender">YOB:</strong> 1987<br />
<strong class="offender">RACE:</strong> WHITE<br />
<strong class="offender">GENDER:</strong> FEMALE<br />
<strong class="offender">HEIGHT:</strong> 5'05''<br />
<strong class="offender">WEIGHT:</strong> 118<br />
<strong class="offender">EYE COLOR:</strong> GREEN<br />
<strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>
I want to extract the info for each individual and get YOB: 1987, RACE: WHITE, etc.
My attempt is:
subc = soup.findAll('p')
subc1 = subc[1]
subc2 = subc1.findAll('strong')
But this gives me only the labels (YOB:, RACE:, etc.), not the values that follow them.
Is there a way to get the data in YOB: 1987, RACE: WHITE format?
Thanks, Manish
Recommended answer
Just loop through all the <strong> tags and use next_sibling (http://www.crummy.com/software/BeautifulSoup/bs4/doc/#next-sibling-and-previous-sibling) to get the text node that follows each one. Like this:
for strong_tag in soup.find_all('strong'):
    # next_sibling is the text node right after </strong>; strip() drops its leading space
    print(strong_tag.text, strong_tag.next_sibling.strip())
Demo:
>>> from bs4 import BeautifulSoup
>>> html = '''
... <p>
... <strong class="offender">YOB:</strong> 1987<br />
... <strong class="offender">RACE:</strong> WHITE<br />
... <strong class="offender">GENDER:</strong> FEMALE<br />
... <strong class="offender">HEIGHT:</strong> 5'05''<br />
... <strong class="offender">WEIGHT:</strong> 118<br />
... <strong class="offender">EYE COLOR:</strong> GREEN<br />
... <strong class="offender">HAIR COLOR:</strong> BROWN<br />
... </p>
... '''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> for strong_tag in soup.find_all('strong'):
...     print(strong_tag.text, strong_tag.next_sibling.strip())
...
This gives you:
YOB: 1987
RACE: WHITE
GENDER: FEMALE
HEIGHT: 5'05''
WEIGHT: 118
EYE COLOR: GREEN
HAIR COLOR: BROWN
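If the label/value pairs are needed as data rather than printed, the same next_sibling approach can collect them into a dictionary. The following is a minimal sketch, not part of the original answer: the helper name extract_fields is hypothetical, the markup is a shortened copy of the question's snippet, and it assumes each value is a plain text node directly after its <strong> tag.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the markup from the question (assumed structure).
html = """
<p>
<strong class="offender">YOB:</strong> 1987<br />
<strong class="offender">RACE:</strong> WHITE<br />
<strong class="offender">EYE COLOR:</strong> GREEN<br />
</p>
"""


def extract_fields(markup):
    """Hypothetical helper: map each <strong> label to the text that follows it."""
    soup = BeautifulSoup(markup, "html.parser")
    fields = {}
    for strong_tag in soup.find_all("strong", class_="offender"):
        # Label text without surrounding whitespace or the trailing colon.
        label = strong_tag.get_text(strip=True).rstrip(":")
        sibling = strong_tag.next_sibling
        # next_sibling can be None or another tag; only text nodes carry a value.
        value = sibling.strip() if isinstance(sibling, NavigableString) else ""
        fields[label] = value
    return fields


fields = extract_fields(html)
print(fields)
```

Checking that the sibling is a NavigableString guards against markup where a <strong> tag is followed directly by another tag or by nothing at all; in that case the sketch stores an empty string instead of raising an error.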