Extracting image src based on attribute with BeautifulSoup
Question
I'm using BeautifulSoup to parse an HTML page from IMDb, and I'd like to extract the poster image from the page. I've located the image element by one of its attributes, but I don't know how to extract the data inside it.
Here's my code:
url = 'http://www.imdb.com/title/tt%s/' % (id)
soup = BeautifulSoup(urllib2.urlopen(url).read())
print("before FOR")
for src in soup.find(itemprop="image"):
    print("inside FOR")
    print(link.get('src'))
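You're almost there; there are just a couple of mistakes. soup.find() returns the first element that matches, not a list, so you don't need to iterate over it. (Iterating over a Tag walks its children, and the matched <img> element has none, which is why "inside FOR" never prints. The loop body also refers to link, which is never defined.) Once you have the element, you can read its attributes (such as src) using dictionary-style access. Here's a reworked version: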
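import urllib2  # Python 2; in Python 3 use urllib.request.urlopen instead
from bs4 import BeautifulSoup
film_id = '0423409'
url = 'http://www.imdb.com/title/tt%s/' % (film_id)
soup = BeautifulSoup(urllib2.urlopen(url).read(), 'html.parser')
# find() returns the first matching Tag (or None), not a list
link = soup.find(itemprop="image")
# Tag attributes are read like dictionary keys
print(link["src"])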
# output:
http://ia.media-imdb.com/images/M/MV5BMTg2ODMwNTY3NV5BMl5BanBnXkFtZTcwMzczNjEzMQ@@._V1_SY317_CR0,0,214,317_.jpg
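To see why the original loop printed nothing, and how find(), find_all() and attribute access differ, here is a minimal offline sketch. It assumes bs4 (BeautifulSoup 4) and uses a hypothetical, trimmed-down snippet of markup with a placeholder example.com URL in place of the real IMDb page, whose current structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the IMDb poster section
html = """
<div class="poster">
  <img itemprop="image" src="http://example.com/poster.jpg" alt="Poster">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (the first match), not a list
poster = soup.find(itemprop="image")
print(type(poster).__name__)  # Tag

# Iterating over a Tag walks its children; <img> has none,
# so a for loop over it never runs its body
children = [child for child in poster]
print(children)  # []

# Attributes behave like dictionary entries
print(poster["src"])        # http://example.com/poster.jpg
print(poster.get("title"))  # None: .get() avoids a KeyError for absent attributes

# find_all() is the method that returns a list of matches
print(len(soup.find_all(itemprop="image")))  # 1

# find() returns None when nothing matches, so check before indexing
missing = soup.find(itemprop="trailer")
print(missing is None)  # True
```

On a live page it is worth checking for None before writing link["src"], since a markup change would otherwise surface as a TypeError.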
I've changed id to film_id, because id() is a built-in function, and it's bad practice to shadow those.
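A small illustration of what shadowing a built-in means in practice; the function names here are hypothetical and only serve to contrast the two cases.

```python
def shadowed(id):
    # The parameter named id hides the built-in id() inside this function
    return callable(id)

def not_shadowed(film_id):
    # Here id still refers to the built-in function
    return callable(id)

print(shadowed("0423409"))      # False: id is just a string here
print(not_shadowed("0423409"))  # True: the built-in id() is still reachable
```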