Get maximum nesting of tags with BeautifulSoup

Question

Given a BeautifulSoup-parsed document, I'm looking for a way to find its maximum level of nesting.

For example, I need the magic_function in:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://example.com")
soup = BeautifulSoup(r.text, "html.parser")
depth = magic_function(soup)

For this document, for example, it would return 4:

<html>
    <body>
        <p>
            <strong>Some Text.</strong>
            <strong>Some Text.</strong>
            <strong>Some Text.</strong>
        </p>
        <p>
            <strong>Some Text.</strong>
            <strong>Some Text.</strong>
            <strong>Some Text.</strong>
        </p>
    </body>
</html>

I have a few ideas:

  1. Is there a function in BeautifulSoup to do this? Reading the docs and Googling turned up nothing.

  2. Is there another library that would allow me to do this? Again, Googling turned up nothing, but I may simply not know what to search for.

  3. Should I just traverse the tree with a function I write myself? I'd really rather not, but I could certainly do that.

Recommended answer

Traversing the tree with your own magic_function() isn't difficult. You could use a simple recursive function like:

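def magic_function(soup):
    """Return the maximum nesting depth at or below the given node."""
    # Tags with children add one level; text nodes and empty tags count as 0.
    if hasattr(soup, "contents") and soup.contents:
        return max(magic_function(child) for child in soup.contents) + 1
    return 0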

Call the function on the document's top-level html tag rather than on the soup object itself, so that the soup object is not counted as an extra nesting level.
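To make the off-by-one concrete, here is a minimal sketch that runs the same recursion on a small inline document. The three-level HTML string and the variable names are illustrative assumptions, not part of the original question, and "html.parser" is chosen only so the example needs no extra parser installed.

```python
from bs4 import BeautifulSoup


def magic_function(node):
    # Same recursion as above: text nodes and empty tags count as depth 0.
    if hasattr(node, "contents") and node.contents:
        return max(magic_function(child) for child in node.contents) + 1
    return 0


# Hypothetical three-level document (html > body > p), used only for illustration.
soup = BeautifulSoup("<html><body><p>Hi</p></body></html>", "html.parser")

depth_from_html = magic_function(soup.html)  # counts html, body, p
depth_from_soup = magic_function(soup)       # the soup object itself adds one level

print(depth_from_html, depth_from_soup)  # prints: 3 4
```

Starting from soup.html gives the tag depth; starting from soup reports one level more because the BeautifulSoup object wraps the html tag.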

Using your document structure above, this function call returns 4:

>>> magic_function(soup.html)
4
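If Python's recursion limit is a concern for very deeply nested documents, a non-recursive variant is possible. The sketch below is an alternative to the answer's function, not a replacement for it: max_tag_depth is a hypothetical name, and the inline HTML is a shortened copy of the sample document. It relies on find_all(True) returning every tag and on each tag's parents chain ending at the BeautifulSoup object.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample document from the question.
HTML = """<html>
    <body>
        <p>
            <strong>Some Text.</strong>
        </p>
    </body>
</html>"""


def max_tag_depth(soup):
    """Return the deepest tag nesting level in a parsed document (0 if no tags)."""
    # For each tag, the parents chain holds its ancestor tags plus the
    # BeautifulSoup object, so its length equals the tag's own depth
    # when the html tag is counted as level 1.
    return max(
        (len(list(tag.parents)) for tag in soup.find_all(True)),
        default=0,
    )


soup = BeautifulSoup(HTML, "html.parser")
print(max_tag_depth(soup))  # prints: 4
```

Note one difference in behaviour: this variant counts every tag, including empty ones such as a br element, whereas the recursive function treats an empty tag as depth 0. It also walks the ancestor chain once per tag, so it trades recursion for some extra work on large documents.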
