Extracting image src based on attribute with BeautifulSoup

Question

I'm using BeautifulSoup to parse an HTML page from IMDb, and I would like to extract the poster image from the page. I've managed to select the image element by one of its attributes, but I don't know how to extract the data inside it.

Here's my code:

url = 'http://www.imdb.com/title/tt%s/' % (id)
soup = BeautifulSoup(urllib2.urlopen(url).read())
print("before FOR")
for src in soup.find(itemprop="image"): 
    print("inside FOR")
    print(link.get('src'))

Answer

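You're almost there; there are just a couple of mistakes. soup.find() returns the first element that matches, not a list, so you don't need to iterate over it. (Iterating over a Tag walks its children, and if the match is an <img>, which has no children, the loop body never runs.) Also, the loop prints link, which is never defined; the loop variable is src. Once you have the element, you can read its attributes (like src) using dictionary access. Here's a reworked version: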

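from urllib.request import urlopen  # Python 3; the question used urllib2 (Python 2)
from bs4 import BeautifulSoup

film_id = '0423409'
url = 'http://www.imdb.com/title/tt%s/' % (film_id)
soup = BeautifulSoup(urlopen(url).read(), "html.parser")
# find() returns the first matching Tag (or None), not a list
link = soup.find(itemprop="image")
# Read the attribute with dictionary-style access
print(link["src"])
# output (at the time of writing):
# http://ia.media-imdb.com/images/M/MV5BMTg2ODMwNTY3NV5BMl5BanBnXkFtZTcwMzczNjEzMQ@@._V1_SY317_CR0,0,214,317_.jpg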
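To see the same behaviour without fetching IMDb, here is a minimal offline sketch. It assumes the beautifulsoup4 package is installed, and the HTML snippet and the helper name poster_src are made up for illustration; the real IMDb markup may differ or may have changed since this answer was written. It shows that find() returns a single Tag, that iterating over an <img> Tag yields nothing, and that attributes can be read with dictionary access or with .get(), which returns None instead of raising KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the IMDb page
HTML = """
<div class="poster">
  <img itemprop="image" src="http://example.com/poster.jpg" alt="Poster">
</div>
"""


def poster_src(html):
    """Hypothetical helper: src of the first itemprop="image" element, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first matching Tag, or None if nothing matches
    img = soup.find(itemprop="image")
    if img is None:
        return None
    # .get() returns None instead of raising KeyError when src is missing
    return img.get("src")


soup = BeautifulSoup(HTML, "html.parser")
img = soup.find(itemprop="image")
print(type(img).__name__)  # Tag
# Iterating over a Tag iterates over its children; <img> has none
print(list(img))           # []
print(img["src"])          # http://example.com/poster.jpg
print(poster_src(HTML))    # http://example.com/poster.jpg
print(poster_src("<p>no image here</p>"))  # None
```

Checking for None before indexing avoids a TypeError when the page has no matching element.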

I've renamed id to film_id because id() is a built-in function, and shadowing built-ins is bad practice.
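A small illustration of why shadowing matters: once id is rebound to a string, calling id() in that scope fails. The function name below is hypothetical and exists only to keep the shadowing local.

```python
def call_id_after_shadowing():
    """Hypothetical demo: rebinding id hides the built-in id() in this scope."""
    id = '0423409'  # local variable now hides the built-in id()
    try:
        id(object())
    except TypeError as exc:
        return str(exc)
    return "no error"


film_id = '0423409'  # a distinct name leaves the built-in id() usable
print(call_id_after_shadowing())  # 'str' object is not callable
print(callable(id))               # True: module-level id is still the built-in
```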
