Iterating through a DOM with BeautifulSoup/Python

Views: 509 | Published: 2016/8/5 19:13:51 | Tags: python, html, parsing, html-parsing, beautifulsoup

Question

I have a DOM like this:

<h2>Main Section</h2>
<p>Bla bla bla</p>
<h3>Subsection</h3>
<p>Some more info</p>

<h3>Subsection 2</h3>
<p>Even more info!</p>


<h2>Main Section 2</h2>
<p>bla</p>
<h3>Subsection</h3>
<p>Some more info</p>

<h3>Subsection 2</h3>
<p>Even more info!</p>

I'd like to generate an iterator that returns 'Main Section', 'Bla bla bla', 'Subsection', etc. Is there a way to do this with BeautifulSoup?

Recommended answer

Here's one way to do it. The idea is to iterate over the main sections (the h2 tags) and, for each h2 tag, iterate over its following siblings until the next h2 tag:

from bs4 import BeautifulSoup, Tag


data = """<h2>Main Section</h2>
<p>Bla bla bla</p>
<h3>Subsection</h3>
<p>Some more info</p>

<h3>Subsection 2</h3>
<p>Even more info!</p>


<h2>Main Section 2</h2>
<p>bla</p>
<h3>Subsection</h3>
<p>Some more info</p>

<h3>Subsection 2</h3>
<p>Even more info!</p>"""


# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(data, "html.parser")
for main_section in soup.find_all('h2'):
    # Walk the nodes that follow this h2 at the same level of the tree.
    for sibling in main_section.next_siblings:
        if not isinstance(sibling, Tag):
            continue  # skip the text nodes (newlines) between tags
        if sibling.name == 'h2':
            break  # reached the next main section
        print(sibling.text)
    print("-------")

Prints:

Bla bla bla
Subsection
Some more info
Subsection 2
Even more info!
-------
bla
Subsection
Some more info
Subsection 2
Even more info!
-------
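
The question asks for an iterator rather than printed output, so the same sibling walk can be wrapped in a generator. The following is only a sketch under a few assumptions: the function name iter_section_texts is made up for illustration, the parser is assumed to be Python's built-in "html.parser", and the markup is assumed to be well formed with all h2, h3 and p tags at the same level. Unlike the loop above, it also yields the text of each h2 heading itself.

```python
from bs4 import BeautifulSoup, Tag


def iter_section_texts(html):
    """Yield the text of each h2 heading, then of every sibling tag up to the next h2.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    for main_section in soup.find_all("h2"):
        yield main_section.get_text(strip=True)
        for sibling in main_section.next_siblings:
            if not isinstance(sibling, Tag):
                continue  # skip the text nodes (newlines) between tags
            if sibling.name == "h2":
                break  # the next main section starts here
            yield sibling.get_text(strip=True)


# Shortened, well-formed sample of the DOM from the question.
sample = """<h2>Main Section</h2>
<p>Bla bla bla</p>
<h3>Subsection</h3>
<p>Some more info</p>
<h2>Main Section 2</h2>
<p>bla</p>"""

for text in iter_section_texts(sample):
    print(text)
```

Because the function is a generator, the texts are produced lazily, one per iteration, and list(iter_section_texts(sample)) collects them all at once if a list is more convenient.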

Hope that helps.
