Extract `src` attribute from `img` tag using BeautifulSoup

Question

<div class="someClass">
    <a href="href">
        <img alt="some" src="some"/>
    </a>
</div>

I am using bs4. I can read href with a.attrs['href'], but a.attrs['src'] does not give me the src. What should I do?

Recommended answer

You can use BeautifulSoup to extract the src attribute of an HTML img tag. In my example, htmlText contains the img tags themselves, but the same approach also works for a page fetched from a URL with urllib2 (urllib.request in Python 3).

For a URL

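from bs4 import BeautifulSoup as BSHTML
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen

page = urlopen('http://www.youtube.com/')
soup = BSHTML(page, 'html.parser')
images = soup.find_all('img')
for image in images:
    # print the image source (None if the tag has no src attribute)
    print(image.get('src'))
    # print the alternate text (None if the tag has no alt attribute)
    print(image.get('alt'))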
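
On a real page, some img tags may carry no src or no alt attribute, and indexing a missing attribute raises KeyError. The following is a minimal sketch of one way to guard against that, assuming bs4 is installed. The sample_html string and the example.com URLs are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second <img> has no src, the third has no alt
sample_html = """
<img src="https://example.com/a.png" alt="first">
<img alt="no source here">
<img src="https://example.com/b.png">
"""

soup = BeautifulSoup(sample_html, "html.parser")

# src=True keeps only <img> tags that actually carry a src attribute
images_with_src = soup.find_all("img", src=True)

# .get() returns a default instead of raising KeyError when alt is missing
pairs = [(img["src"], img.get("alt", "")) for img in images_with_src]

for src, alt in pairs:
    print(src, alt)
```

Filtering with src=True drops tags that have no source at all, while .get() with a default keeps tags whose alt text is simply absent.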

For text containing img tags

from bs4 import BeautifulSoup as BSHTML

# two complete img tags held in a string
htmlText = """<img src="https://src1.com/" /> <img src="https://src2.com/" />"""
soup = BSHTML(htmlText, 'html.parser')
images = soup.find_all('img')
for image in images:
    print(image['src'])
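
Applied to the markup from the question, the likely reason a.attrs['src'] fails is that src is an attribute of the nested img tag, not of the a tag, whereas href does belong to a. The sketch below, assuming bs4 and Python's built-in html.parser, shows one way to step from the a tag down to its img. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Markup copied from the question
html = """
<div class="someClass">
    <a href="href">
        <img alt="some" src="some"/>
    </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
a = soup.find("a")

# href is an attribute of <a>, so this lookup works
link_href = a["href"]

# src is not an attribute of <a>; a.attrs["src"] would raise KeyError
a_has_src = "src" in a.attrs

# src lives on the nested <img>, so descend to it first
img = a.find("img")
img_src = img["src"]

print(link_href, a_has_src, img_src)
```

If the a tag might not contain an img, a.find("img") returns None, so checking the result before indexing avoids a TypeError.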
