Get a structure of HTML code

Views: 82 Published: 2018/6/19 21:56:56 Tags: python html beautifulsoup


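Problem description

I'm using BeautifulSoup4, and I'm curious whether there is a function that returns the structure (the ordered tags) of the HTML code.

Here is an example:

<html>
<body>
<h1>Simple example</h1>
<p>This is a simple example of html page</p>
</body>
</html>

print page.structure():

>>
<html>
<body>
<h1></h1>
<p></p>
</body>
</html>

I tried to find a solution, but had no success.

Thanks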
Solution

There is not, to my knowledge, but a little recursion should work:

import bs4
from bs4 import BeautifulSoup

def taggify(soup):
    # Walk the direct children; text nodes are skipped, only tags are kept.
    for tag in soup:
        if isinstance(tag, bs4.Tag):
            # Emit the tag pair with its nested tags (built recursively) in between.
            yield '<{}>{}</{}>'.format(tag.name, ''.join(taggify(tag)), tag.name)

demo:

html = '''<html>
<body>
<h1>Simple example</h1>
<p>This is a simple example of html page</p>
</body>
</html>'''
soup = BeautifulSoup(html, 'html.parser')
''.join(taggify(soup))
Out[34]: '<html><body><h1></h1><p></p></body></html>'
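The recursive generator returns the whole structure as a single line, while the expected output in the question lists one tag per line. The following is a minimal sketch of the same idea that produces line-by-line output. Note that it swaps BeautifulSoup for the standard library's html.parser, so it runs without third-party packages. The names StructureParser, structure and VOID_TAGS are hypothetical and are not part of BeautifulSoup or of the original answer. Unlike BeautifulSoup, this sketch does not repair malformed markup.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StructureParser(HTMLParser):
    """Hypothetical helper: records tags only, ignoring text and attributes."""

    def __init__(self, indent="  "):
        super().__init__()
        self.indent = indent
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append(f"{self.indent * self.depth}<{tag}>")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth = max(self.depth - 1, 0)
        opening = f"{self.indent * self.depth}<{tag}>"
        if self.lines and self.lines[-1] == opening:
            # No child tags were recorded: keep the pair on one line.
            self.lines[-1] = opening + f"</{tag}>"
        else:
            self.lines.append(f"{self.indent * self.depth}</{tag}>")


def structure(html, indent="  "):
    """Return the tag skeleton of `html`, one tag (or empty pair) per line."""
    parser = StructureParser(indent)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


if __name__ == "__main__":
    sample = """<html>
<body>
<h1>Simple example</h1>
<p>This is a simple example of html page</p>
</body>
</html>"""
    print(structure(sample))
```

Passing indent="" gives flat output with no indentation, closer to the layout shown in the question.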
