Get all HTML tags with Beautiful Soup
Question
I am trying to get a list of all HTML tags in a document using Beautiful Soup.
I see find_all(), but it looks like I have to know the name of the tag before I search.
If there is text like
html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""
How would I get a list like
list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"]
I know how to do this with regex, but I am trying to learn BS4.
Solution
You don't have to specify any arguments to find_all(). In this case, BeautifulSoup will find every tag in the tree, recursively. Sample:
>>> from bs4 import BeautifulSoup
>>>
>>> html = """<div>something</div>
... <div>something else</div>
... <div class='magical'>hi there</div>
... <p>ok</p>"""
>>> soup = BeautifulSoup(html, "html.parser")
>>> [tag.name for tag in soup.find_all()]
['div', 'div', 'div', 'p']
>>> [str(tag) for tag in soup.find_all()]
['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
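Note that str(tag) returns the whole element, including its contents, while the question asks for opening tags only. As far as I know bs4 has no built-in call for that, so here is a minimal sketch that rebuilds each opening tag from tag.name and tag.attrs. The helper name opening_tag is hypothetical (it is not part of bs4), and the sketch assumes attribute values need no escaping and that double quotes are acceptable in the output.

```python
from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""


def opening_tag(tag):
    """Rebuild an opening tag string from a bs4 Tag.

    Hypothetical helper for illustration; not part of bs4.
    Assumes attribute values do not need escaping.
    """
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes such as class are stored as lists.
        if isinstance(value, list):
            value = " ".join(value)
        parts.append('{}="{}"'.format(key, value))
    return "<" + " ".join(parts) + ">"


soup = BeautifulSoup(html, "html.parser")
# find_all() with no arguments yields every tag in document order.
list_of_tags = [opening_tag(tag) for tag in soup.find_all()]
print(list_of_tags)
```

The original quoting style is not preserved: class='magical' comes back as class="magical", the same normalization that str(tag) applies in the sample above.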