Python, Beautiful Soup: get all class names

Question

Given some HTML, say:

<div class="class1">
    <span class="class2">some text</span>
    <span class="class3">some text</span>
    <span class="class4">some text</span>
</div>

How can I retrieve all the class names, i.e. ['class1', 'class2', 'class3', 'class4']?

I tried:

soup.find_all(class_=True)

But it returns the whole tags, and I would then need to run a regex on their string form to pull out the class names.

Recommended answer

You can treat each Tag instance found as a dictionary when retrieving its attributes. Note that the value of the class attribute is a list, since class is a special "multi-valued" attribute:

classes = []
for element in soup.find_all(class_=True):
    classes.extend(element["class"])

Or, as a list comprehension:

classes = [value 
           for element in soup.find_all(class_=True) 
           for value in element["class"]]

Demo:

In [1]: from bs4 import BeautifulSoup

In [2]: data = """
   ...: <div class="class1">
   ...:     <span class="class2">some text</span>
   ...:     <span class="class3">some text</span>
   ...:     <span class="class4">some text</span>
   ...: </div>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: classes = [value
   ...:            for element in soup.find_all(class_=True)
   ...:            for value in element["class"]]

In [5]: print(classes)
['class1', 'class2', 'class3', 'class4']
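
Because class is multi-valued, an element with several classes contributes several entries, and a class used on more than one element appears more than once. The following is a minimal sketch of that behavior, not part of the original answer: the markup and the names `html`, `all_classes` and `unique_classes` are made up for illustration, and it assumes the built-in "html.parser" backend used in the demo above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the div carries two classes, "class2" is repeated,
# and the <p> has no class attribute at all.
html = """
<div class="class1 wrapper">
    <span class="class2">some text</span>
    <span class="class2">some text</span>
    <p>no class here</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# element["class"] is a list, so "class1 wrapper" yields two entries.
# find_all(class_=True) skips the <p>, which has no class attribute.
all_classes = [value
               for element in soup.find_all(class_=True)
               for value in element["class"]]

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_classes = list(dict.fromkeys(all_classes))

print(all_classes)
print(unique_classes)
```

If order does not matter, `set(all_classes)` would also remove the duplicates; `dict.fromkeys` is used here only to keep the names in document order.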
