BeautifulSoup: just get inside of a tag, no matter how many enclosing tags there are
Problem description
I'm trying to scrape all the inner HTML from the `<p>` elements in a web page using BeautifulSoup. There are internal tags, but I don't care about them; I just want the internal text.
For example, given:

```html
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
```

How can I extract:

```
Red
Blue
Yellow
Light green
```
Neither `.string` nor `.contents[0]` does what I need. Nor does `.extract()`, because I don't want to have to specify the internal tags in advance; I want to deal with any that may occur.
Is there a 'just get the visible HTML' type of method in BeautifulSoup?
----UPDATE------
On advice, trying:
```python
# BeautifulSoup 3 on Python 2
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(open("test.html"))
p_tags = soup.findAll('p', text=True)
for i, p_tag in enumerate(p_tags):
    print str(i) + p_tag
```
But that doesn't help - it prints out:
```
0Red
1
2Blue
3
4Yellow
5
6Light
7green
8
```
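Why the attempt prints that: in BeautifulSoup 3, passing `text` makes `findAll` search for text nodes instead of tags (its documentation notes that `name` is then ignored), so `soup.findAll('p', text=True)` returns every string in the document. The minimal sketch below reproduces that flat list with the standard library's `html.parser` as a stand-in; it is not BeautifulSoup itself, and `TextNodeCollector` and `text_nodes` are hypothetical names.

```python
from html.parser import HTMLParser

# Same markup as in the question, including the newlines between paragraphs.
HTML = "<p>Red</p>\n<p><i>Blue</i></p>\n<p>Yellow</p>\n<p>Light <b>green</b></p>\n"


class TextNodeCollector(HTMLParser):
    """Hypothetical helper: records every text node in document order,
    whatever tag encloses it (a stdlib stand-in, not BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        # Called once per run of text between two tags.
        self.nodes.append(data)


def text_nodes(html):
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return parser.nodes


# repr() makes the whitespace-only nodes visible.
for i, node in enumerate(text_nodes(HTML)):
    print(str(i) + repr(node))
```

It yields nine nodes numbered 0 to 8, matching the output above: entries 1, 3, 5 and 8 are the newlines between and after the paragraphs, and "Light green" arrives as two separate nodes, `'Light '` and `'green'`.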
Answer

Short answer: `soup.findAll(text=True)`

This has already been answered, here on StackOverflow and in the BeautifulSoup documentation (http://www.crummy.com/software/BeautifulSoup/documentation.html#Advanced%20Topics).
UPDATE:
To clarify, a working piece of code:
```python
>>> txt = """\
... <p>Red</p>
... <p><i>Blue</i></p>
... <p>Yellow</p>
... <p>Light <b>green</b></p>
... """
>>> import BeautifulSoup
>>> BeautifulSoup.__version__
'3.0.7a'
>>> soup = BeautifulSoup.BeautifulSoup(txt)
>>> for node in soup.findAll('p'):
...     print ''.join(node.findAll(text=True))
...
Red
Blue
Yellow
Light green
```
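The code above is Python 2 with BeautifulSoup 3. As a minimal sketch of the same idea (join every text node inside each `<p>`, whatever tags it is nested in) using only Python 3's standard library `html.parser`, something like the following should work. This is a swapped-in technique rather than BeautifulSoup's API, and `ParagraphText` and `paragraph_texts` are hypothetical names.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Hypothetical helper: joins all text inside each <p>, however deeply
    it is nested in other tags (stdlib html.parser, not BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # one joined string per finished <p>
        self._buffer = None   # text pieces of the <p> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":  # html.parser reports tag names in lowercase
            self._buffer = []

    def handle_data(self, data):
        # Text inside <i>, <b>, etc. still lands here while a <p> is open;
        # text outside any <p> (e.g. the newlines between them) is skipped.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None


def paragraph_texts(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


txt = """\
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
"""

for text in paragraph_texts(txt):
    print(text)
```

This sketch assumes every `<p>` has an explicit `</p>`; a tree-building parser such as BeautifulSoup is more forgiving of omitted end tags. With the current `bs4` package, `Tag.get_text()` returns the joined text of a tag directly, and `find_all(string=True)` is the newer spelling of `findAll(text=True)`.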