How do you search within a <span> tag for a specific "class=id" type attribute with Beautiful Soup?


Problem description

I'm trying to scrape a page with BeautifulSoup that has the following general format:

<span class="ID1"> TEXT </span>
<span class="ID2"> TEXT2 </span>

These are all stored in a <div>, so my general code template looks like this:

for tag in soup.find_all('div'):
    print(tag.find('span'))

This pulls up the first <span> tag in each div, but I can't figure out how to search inside the <span>s. I've tried things like tag.find('class') and .find('ID'), but no luck.

I can manually find the thing I'm looking for by getting the string representation of the object and then testing whether it contains the ID I'm looking for, but that seems like a band-aid approach. I'm sure there's something that I'm just not seeing.

Note: I've also tried passing a regex of the ID to the find function, like so:

import re

for tag in soup.find_all('div'):
    print(tag.find(re.compile('id2')))

Still no luck, unfortunately.

So, how do I search for a specific class value?

I figured out how to do it via BeautifulSoup's built-in find function without manually checking its dictionary structure.

To use the find function to pick out a specific class=value within an HTML tag, pass the name of the tag you want to find as the first argument (in my case, a <span> tag). As the second argument, pass a dictionary with the specific 'class' : 'value' pair you want to match.

For example, if the HTML I want to scrape looks like this:

<div>
    <span class="ID1"> TEXT </span>
    <other HTML junk> 
    <span class="ID2"> TEXT2 </span>
</div>

I can use a statement like the one below.

for tag_elm in soup.find_all('div'):
    print(tag_elm.find('span', {'class' : 'ID2'}))

Ta-da!
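
For readers on the current `bs4` package, here is a minimal sketch of the same lookup, assuming Beautiful Soup 4 is installed and using the built-in `html.parser`. The sample markup and the variable names are made up for illustration. It also suggests why `find('class')` came back empty: the first positional argument is matched against tag names, not attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question.
html = """
<div>
    <span class="ID1"> TEXT </span>
    <p>other HTML junk</p>
    <span class="ID2"> TEXT2 </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# The first positional argument is a tag name, so this looks for a <class> tag.
no_match = div.find("class")

# Passing an attribute dictionary filters on the class attribute instead.
by_dict = div.find("span", {"class": "ID2"})

# bs4 also accepts the class_ keyword (class is a reserved word in Python).
by_keyword = div.find("span", class_="ID2")

print(no_match)
print(by_dict.get_text(strip=True))
print(by_keyword == by_dict)
```

In bs4, a CSS selector such as `soup.select_one('div span.ID2')` is another way to express the same search.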

Recommended answer

This should work:

for tag in soup.findAll('span'):
    if tag.has_key('class'):
        if tag['class'] == 'ID2':
            pass  # do stuff

I tested this code:

from BeautifulSoup import BeautifulSoup

text = '''
<span class="ID1"> TEXT </span>
<span class="ID2"> TEXT2 </span>
'''

soup = BeautifulSoup(text)

for tag in soup.findAll('span'):
    if tag.has_key('class'):
        if tag['class'] == 'ID2':
            print tag.string
            break

It gives the following output:

TEXT2 
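
The import above is the BeautifulSoup 3 one, where `tag['class']` is a plain string. In Beautiful Soup 4, `class` is treated as a multi-valued attribute and comes back as a list, so the equality test against `'ID2'` would not match there. A rough bs4 equivalent of the same loop, assuming the `bs4` package and the built-in `html.parser`, might look like this:

```python
from bs4 import BeautifulSoup

text = '''
<span class="ID1"> TEXT </span>
<span class="ID2"> TEXT2 </span>
'''

soup = BeautifulSoup(text, "html.parser")

found = None
for tag in soup.find_all("span"):
    # tag.get falls back to the given default ([]) when the tag has
    # no class attribute, so the membership test is always safe.
    if "ID2" in tag.get("class", []):
        found = tag.string
        break

print(found)
```

The membership test also matches tags that carry several classes, such as `class="ID2 highlighted"`, which a plain string comparison would miss.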
