BeautifulSoup counting tags without parsing deep inside them

Views: 221 Posted: 2016/8/5 18:59:04 Tags: python xml xml-parsing beautifulsoup

Problem description

I thought about the following while writing an answer to this question (http://stackoverflow.com/questions/27673349/python-xml-parsing-algorithm-speed/27673558#27673558).

Suppose I have a deeply nested XML file like this (but much more nested and much longer):

<section name="1">
    <subsection name="foo">
        <subsubsection name="bar">
            <deeper name="hey">
                <much_deeper name="yo">
                    <li>Some content</li>
                </much_deeper>
            </deeper>
        </subsubsection>
    </subsection>
</section>
<section name="2">
    ... and so forth
</section>

The problem with len(soup.find_all("section")) is that while doing find_all("section"), BS keeps searching deep into a tag that I know won't contain any other section tag.

So, two questions:

1. Is there a way to make BS not search recursively into an already found tag?
2. If the answer to 1 is yes, will it be more efficient or is it the same internal process?

Solution

BeautifulSoup cannot give you just a count/number of tags it found.

What you can improve, though, is to stop BeautifulSoup from searching for sections inside other sections by passing recursive=False, which restricts the search to the direct children of the object it is called on:

len(soup.find_all("section", recursive=False))
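To make the difference concrete, here is a minimal sketch. The sample markup and the variable names are made up for illustration, and it assumes the built-in "html.parser", which leaves the top-level section tags as direct children of the soup object. Other parsers may wrap the input in extra tags, in which case recursive=False on the soup itself could find nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two top-level sections, one of which
# contains a nested section further down.
xml = """
<section name="1">
    <subsection name="foo">
        <section name="nested">
            <li>Some content</li>
        </section>
    </subsection>
</section>
<section name="2">
    <li>More content</li>
</section>
"""

# Assumption: html.parser does not add wrapper tags around the input.
soup = BeautifulSoup(xml, "html.parser")

# Default search: walks every descendant, so the nested section is counted too.
all_sections = len(soup.find_all("section"))

# recursive=False: only direct children of `soup` are examined.
top_level_sections = len(soup.find_all("section", recursive=False))

print(all_sections)        # 3
print(top_level_sections)  # 2
```

Note that recursive=False changes the result whenever sections are nested, so it is only equivalent to the full count when, as in the question, sections are known never to contain other sections.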

Aside from that improvement, lxml would do the job faster:

tree.xpath('count(//section)')
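The answer does not show how tree is built, so the following is only a sketch under assumptions: the sample data is hypothetical, it is parsed with lxml.etree.fromstring, and a single root element is added because well-formed XML needs one.

```python
from lxml import etree

# Hypothetical sample wrapped in one <root> element (an assumption,
# since the snippet in the question has several top-level sections).
xml = b"""
<root>
    <section name="1">
        <subsection name="foo">
            <section name="nested">
                <li>Some content</li>
            </section>
        </subsection>
    </section>
    <section name="2">
        <li>More content</li>
    </section>
</root>
"""

tree = etree.fromstring(xml)

# XPath count() returns a float; //section matches sections at any depth.
count_all = tree.xpath("count(//section)")

# A relative path matches only direct children of the root element.
count_top = tree.xpath("count(section)")

print(int(count_all))  # 3
print(int(count_top))  # 2
```

The relative expression count(section) plays roughly the same role as recursive=False in BeautifulSoup: it limits the count to one level instead of the whole tree.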
