Using BeautifulSoup to extract text between line breaks (e.g. <br /> tags)

Posted: 2016/8/5 18:56:56  Tags: python, html, html-parsing, beautifulsoup

Problem description

I have the following HTML, which is part of a larger document:

<br />
Important Text 1
<br />
<br />
Not Important Text
<br />
Important Text 2
<br />
Important Text 3
<br />
<br />
Non Important Text
<br />
Important Text 4
<br />

I'm currently using BeautifulSoup to obtain other elements within the HTML, but I have not been able to find a way to get the important lines of text between <br /> tags. I can isolate and navigate to each of the <br /> elements, but can't find a way to get the text in between. Any help would be greatly appreciated. Thanks.
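Part of the difficulty is that a <br /> element never contains the text. It is a void element, so BeautifulSoup gives it no children, and each line of text ends up in a separate text node (a NavigableString) that is a sibling of the tags. A minimal sketch that shows this, assuming BeautifulSoup 4 (the bs4 package) with Python's built-in html.parser and a shortened copy of the sample HTML:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Shortened copy of the sample HTML from the question.
html = '''<br />
Important Text 1
<br />
<br />
Not Important Text
<br />'''

soup = BeautifulSoup(html, 'html.parser')

first_br = soup.find('br')
# A <br> has no children, so there is no text "inside" it.
print(repr(first_br.get_text()))             # ''
# The line of text is the tag's next sibling, a separate NavigableString.
sibling = first_br.next_sibling
print(isinstance(sibling, NavigableString))  # True
print(repr(str(sibling)))                    # '\nImportant Text 1\n'

# The top level alternates between <br> tags and text nodes.
for node in soup.contents:
    kind = 'tag ' if isinstance(node, Tag) else 'text'
    print(kind, repr(str(node)))
```

So rather than asking a <br /> for its text, you navigate from it to its siblings (next_sibling, previous_sibling, and so on).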

Recommended answer

If you just want any text which is between two <br /> tags, you could do something like the following:

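# Requires BeautifulSoup 4 (pip install beautifulsoup4) and Python 3.
from bs4 import BeautifulSoup, NavigableString, Tag

html = '''<br />
Important Text 1
<br />
<br />
Not Important Text
<br />
Important Text 2
<br />
Important Text 3
<br />
<br />
Non Important Text
<br />
Important Text 4
<br />'''

soup = BeautifulSoup(html, 'html.parser')

for br in soup.find_all('br'):
    # The node right after this <br> must be a text node...
    following = br.next_sibling
    if not isinstance(following, NavigableString):
        continue
    # ...and the node after that text must be another <br>.
    after_text = following.next_sibling
    if isinstance(after_text, Tag) and after_text.name == 'br':
        text = following.strip()
        # Skip whitespace-only nodes, e.g. the newline between two adjacent <br /> tags.
        if text:
            print("Found:", text)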
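The loop above only picks up a single text node sitting directly between two <br /> tags, so a line containing inline markup, for example Important <b>Text</b> 2 (hypothetical markup, not from the question), would be missed. A minimal sketch of a different technique, assuming BeautifulSoup 4 and that the <br /> tags are direct children of one container: walk the container's children in order, start a new segment at every <br>, and join the text of everything in between. The helper name split_on_br is made up for this example.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag


def split_on_br(container):
    """Hypothetical helper: return the text of each run of nodes that lies
    between <br> tags among the direct children of `container`.
    Empty runs (e.g. between two adjacent <br /> tags) are dropped."""
    segments = []
    current = []
    for node in container.children:
        if isinstance(node, Tag) and node.name == 'br':
            # A <br> closes the current segment.
            segments.append(''.join(current))
            current = []
        elif isinstance(node, Comment):
            continue  # HTML comments are not visible text
        elif isinstance(node, NavigableString):
            current.append(str(node))
        elif isinstance(node, Tag):
            # Inline markup such as <b> or <a>: keep only its text.
            current.append(node.get_text())
    segments.append(''.join(current))  # text after the last <br>, if any
    # Collapse internal whitespace and drop empty segments.
    cleaned = (' '.join(s.split()) for s in segments)
    return [s for s in cleaned if s]


html = '''<div><br />
Important Text 1
<br />
<br />
Not Important Text
<br />
Important <b>Text</b> 2
<br /></div>'''

soup = BeautifulSoup(html, 'html.parser')
print(split_on_br(soup.div))
# ['Important Text 1', 'Not Important Text', 'Important Text 2']
```

Unlike the sibling check, this also keeps any text before the first or after the last <br />. It does not split on <br> tags nested inside other inline elements.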

But perhaps I misunderstand your question? Your description of the problem doesn't seem to match up with the "important" / "non important" in your example data, so I've gone with the description ;)
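If the "important" / "not important" split is meant to come from the markup itself, one pattern that fits the example data is that each "not important" line follows two consecutive <br /> tags, while each important line follows exactly one. The question does not say this, so treat it as an assumption. A minimal sketch of filtering on that pattern, assuming BeautifulSoup 4; the helper name important_lines is hypothetical:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

html = '''<br />
Important Text 1
<br />
<br />
Not Important Text
<br />
Important Text 2
<br />
Important Text 3
<br />
<br />
Non Important Text
<br />
Important Text 4
<br />'''


def important_lines(soup):
    """Hypothetical heuristic: keep a text line only if exactly one <br>
    precedes it (ignoring whitespace-only text between the tags)."""
    results = []
    for node in soup.find_all(string=True):
        text = node.strip()
        if not text:
            continue
        # Count the consecutive <br> tags immediately before this text node.
        br_count = 0
        for prev in node.previous_siblings:
            if isinstance(prev, NavigableString) and not prev.strip():
                continue  # skip the newline between two <br /> tags
            if isinstance(prev, Tag) and prev.name == 'br':
                br_count += 1
                continue
            break  # any other node ends the run of <br> tags
        if br_count == 1:
            results.append(text)
    return results


soup = BeautifulSoup(html, 'html.parser')
print(important_lines(soup))
# ['Important Text 1', 'Important Text 2', 'Important Text 3', 'Important Text 4']
```

If the real document marks the distinction some other way, the br_count test is the part to replace.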
