Depth First Traversal on BeautifulSoup Parse Tree

Question

Is there a way to do a depth-first traversal (DFT) on a BeautifulSoup parse tree? I want to start at the root (usually <html>), get all of its child elements, then get the children of each child, and so on until I hit a terminal node, at which point I build my way back up the tree. The problem is that I can't find a method that lets me do this. I found the findChildren method, but it seems to just put the entire page in a list multiple times, with each subsequent entry containing less of the page. I might be able to use this for a traversal, but other than the last entry in the list there doesn't appear to be any way to tell whether an entry is a terminal node. Any ideas?

Recommended answer

recursiveChildGenerator() already does that:

import BeautifulSoup  # BeautifulSoup 3, Python 2

# html is a string holding the markup to parse
soup = BeautifulSoup.BeautifulSoup(html)
for child in soup.recursiveChildGenerator():
    name = getattr(child, "name", None)
    if name is not None:
        print name  # a tag
    elif not child.isspace():  # leaf node, don't print spaces
        print child

Output for the HTML from @msalvadores's answer (http://stackoverflow.com/questions/4814317/depth-first-traversal-on-beautifulsoup-parse-tree/4814582#4814582):

html
ul
li
Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
li
Aliquam tincidunt mauris eu risus.
li
Vestibulum auctor dapibus neque.
html

Note: html is printed twice because the example (http://stackoverflow.com/questions/4814317/depth-first-traversal-on-beautifulsoup-parse-tree/4814582#4814582) contains two opening <html> tags.
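
The generator above visits each parent before its children. The question also asks to reach terminal nodes first and then work back up the tree, which is a post-order walk. Below is a minimal sketch of that idea in Python 3, not part of the original answer. The names Node, is_text and post_order are hypothetical. Node is only a stand-in so the sketch runs without BeautifulSoup installed; it assumes, as the answer's own name check does, that tags expose .name and .contents while text nodes have no tag name.

```python
class Node:
    """Hypothetical stand-in for a parse-tree tag: a name plus a list of children."""

    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)


def is_text(node):
    # Same check the answer uses: text nodes have no tag name.
    return getattr(node, "name", None) is None


def post_order(node, depth=0):
    """Yield (depth, label, is_terminal), visiting every child before its parent."""
    if is_text(node):
        if node.strip():  # skip whitespace-only text
            yield depth, node.strip(), True
        return
    for child in node.contents:
        yield from post_order(child, depth + 1)
    # A tag counts as terminal here only when it has no children at all.
    yield depth, node.name, len(node.contents) == 0


# Plain strings play the role of text nodes in this stand-in tree.
tree = Node("ul", [
    Node("li", ["Lorem ipsum"]),
    Node("li", ["Aliquam tincidunt", Node("br")]),
])

visited = list(post_order(tree))
for depth, label, terminal in visited:
    print("  " * depth + label + (" (terminal)" if terminal else ""))
```

Because post_order only reads .name and .contents, passing it a parsed soup object instead of the stand-in tree should work the same way, under the assumption stated above.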
