Extracting image src based on attribute with BeautifulSoup

Views: 574 Published: 2016/8/5 18:58:38 Tags: python html-parsing web-scraping beautifulsoup


Problem description

I'm using BeautifulSoup to get an HTML page from IMDb, and I would like to extract the poster image from the page. I've found the image element based on one of its attributes, but I don't know how to extract the data inside it.

Here's my code:

url = 'http://www.imdb.com/title/tt%s/' % (id)
soup = BeautifulSoup(urllib2.urlopen(url).read())
print("before FOR")
for src in soup.find(itemprop="image"): 
    print("inside FOR")
    print(link.get('src'))

Solution

You're almost there - just a couple of mistakes. soup.find() gets the first element that matches, not a list, so you don't need to iterate over it. Once you have got the element, you can get its attributes (like src) using dictionary access. Here's a reworked version:

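import urllib2  # Python 2 only; on Python 3 use urllib.request instead
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4; the post does not show its imports

film_id = '0423409'
url = 'http://www.imdb.com/title/tt%s/' % (film_id)
soup = BeautifulSoup(urllib2.urlopen(url).read())
# find() returns the first matching element, not a list
link = soup.find(itemprop="image")
print(link["src"])
# output:
# http://ia.media-imdb.com/images/M/MV5BMTg2ODMwNTY3NV5BMl5BanBnXkFtZTcwMzczNjEzMQ@@._V1_SY317_CR0,0,214,317_.jpg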
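
To see why the loop in the question never ran, and how find() differs from find_all(), here is a small offline sketch. It assumes Beautiful Soup 4 (the bs4 package) on Python 3 with the built-in html.parser. The HTML string and file names are made up for illustration and are not IMDb's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
HTML = """
<div>
  <img itemprop="image" src="poster.jpg" alt="Poster">
  <img itemprop="image" src="still.jpg">
  <img src="logo.png">
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
first = soup.find(itemprop="image")
first_src = first["src"] if first is not None else None

# Iterating over a Tag walks its children. An <img> has none,
# so a for loop over the result of find() has nothing to visit.
children_of_img = list(first)

# find_all() returns every match, so this is the call to loop over.
all_srcs = [img["src"] for img in soup.find_all(itemprop="image")]

# Tag.get() returns None for a missing attribute; tag["..."] raises KeyError.
missing = first.get("data-missing")

print(first_src)        # poster.jpg
print(children_of_img)  # []
print(all_srcs)         # ['poster.jpg', 'still.jpg']
print(missing)          # None
```

Checking for None before indexing is worth keeping in real code, since a page whose layout has changed would otherwise fail with a TypeError on link["src"].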

I've changed id to film_id, because id() is a built-in function, and it's bad practice to mask those.
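
As a rough illustration of the shadowing problem, the sketch below uses two hypothetical helper functions (lookup_broken and lookup_ok are not from the original post). Inside the first one, the parameter named id hides the built-in, so calling id(...) fails.

```python
def lookup_broken(id):
    # The parameter named `id` hides the built-in id() inside this function,
    # so the call below tries to call a string and raises TypeError.
    try:
        return id(object())
    except TypeError as exc:
        return "TypeError: %s" % exc


def lookup_ok(film_id):
    # With a distinct parameter name, the built-in id() is still reachable.
    return isinstance(id(object()), int)


broken_result = lookup_broken("0423409")
ok_result = lookup_ok("0423409")
print(broken_result)
print(ok_result)
```

The same thing happens at module level: once id is rebound to a string, any later code in that module that expects the built-in gets the string instead.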
