BeautifulSoup: just get inside of a tag, no matter how many enclosing tags there are

Views: 294 Published: 2016/8/5 18:54:07 python beautifulsoup

Problem description

I'm trying to scrape all the inner HTML from the <p> elements in a web page using BeautifulSoup. There are internal tags, but I don't care, I just want to get the internal text.

For example, for:

<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>

How can I extract:

Red
Blue
Yellow
Light green

Neither .string nor .contents[0] does what I need. Nor does .extract(), because I don't want to have to specify the internal tags in advance - I want to deal with any that may occur.

Is there a 'just get the visible HTML' type of method in BeautifulSoup?

----UPDATE------

On advice, trying:

soup = BeautifulSoup(open("test.html"))
p_tags = soup.findAll('p',text=True)
for i, p_tag in enumerate(p_tags): 
    print str(i) + p_tag

But that doesn't help - it prints out:

0Red
1

2Blue
3

4Yellow
5

6Light 
7green
8

Solution

Short answer: soup.findAll(text=True)

This has already been answered here on Stack Overflow and in the BeautifulSoup documentation (http://www.crummy.com/software/BeautifulSoup/documentation.html#Advanced%20Topics).
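
For readers on the newer bs4 package rather than the BeautifulSoup 3 release used in this thread, here is a small sketch of what a bare text search returns. It assumes beautifulsoup4 is installed and uses find_all(string=True), the bs4 spelling of findAll(text=True); the helper name all_text_nodes is made up for illustration. Whitespace between tags counts as text too, which is probably why the attempt in the question printed nine fragments.

```python
from bs4 import BeautifulSoup

HTML = """\
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
"""


def all_text_nodes(html):
    """Return every text node in the document as a plain str (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # string=True matches every text node, including the newlines between tags.
    return [str(node) for node in soup.find_all(string=True)]


if __name__ == "__main__":
    for i, node in enumerate(all_text_nodes(HTML)):
        print(i, repr(node))
```

Since "Light " and "green" come back as separate nodes, the fragments still have to be joined per paragraph to get "Light green".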

UPDATE:

To clarify, a working piece of code:

>>> txt = """\
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
"""
>>> import BeautifulSoup
>>> BeautifulSoup.__version__
'3.0.7a'
>>> soup = BeautifulSoup.BeautifulSoup(txt)
>>> for node in soup.findAll('p'):
...     print ''.join(node.findAll(text=True))
...
Red
Blue
Yellow
Light green
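
The session above is Python 2 with BeautifulSoup 3.0.7a. As a rough equivalent for Python 3, the sketch below assumes the bs4 package (beautifulsoup4) is installed and uses its get_text() method, which concatenates all descendant text of a tag regardless of nesting. The function name paragraph_texts is hypothetical, chosen only for this example.

```python
from bs4 import BeautifulSoup

HTML = """\
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
"""


def paragraph_texts(html):
    """Return the text of every <p>, ignoring any nested tags (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() joins all text nodes below the tag, however deeply nested.
    return [p.get_text() for p in soup.find_all("p")]


if __name__ == "__main__":
    for text in paragraph_texts(HTML):
        print(text)
```

The same result can be had with ''.join(p.find_all(string=True)), which mirrors the original answer more closely; get_text() is simply the shorter spelling in bs4.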
