Downloading images with BeautifulSoup

Question

I am using BeautifulSoup to extract pictures, which works well for normal pages. Now I want to extract the picture of the Chromebook from a web page like this one:

https://twitter.com/banprada/statuses/829102430017187841

The page apparently contains a link to another page that holds the image. Here is my code for downloading images from the link above, but it only retrieves the picture of the person who posted the link.

import os
import urllib.request

from bs4 import BeautifulSoup

URL = "http://twitter.com/banprada/statuses/829102430017187841"
list_dir = "D:\\"
default_dir = os.path.join(list_dir, "Pictures_neu")
os.makedirs(default_dir, exist_ok=True)  # make sure the target folder exists

opener = urllib.request.build_opener()
urllib.request.install_opener(opener)

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(urllib.request.urlopen(URL).read(), "html.parser")

# Only <img> tags that carry both an alt and a src attribute
imgs = soup.find_all("img", {"alt": True, "src": True})
for img in imgs:
    img_url = img["src"]
    filename = os.path.join(default_dir, img_url.split("/")[-1])
    img_data = opener.open(img_url)
    with open(filename, "wb") as f:
        f.write(img_data.read())

Is there a way to download the image somehow?

Many thanks and regards, Andi

Recommended answer

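This approach uses Selenium together with requests. It waits for the iframe that holds the image, switches into it, and downloads the first image found there: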

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import requests

link = 'https://twitter.com/banprada/statuses/829102430017187841'
# Chrome is shown here; any WebDriver can be used (PhantomJS is no longer available in Selenium 4)
driver = webdriver.Chrome()
driver.get(link)
# Wait up to 10 seconds for the iframe to appear, then switch into it
wait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[starts-with(@id, 'xdm_default')]")))
# Take the src of the first <img> inside the iframe
image_src = driver.find_element(By.TAG_NAME, 'img').get_attribute('src')
driver.quit()
response = requests.get(image_src).content
with open('C:\\Users\\You\\Desktop\\Image.jpeg', 'wb') as f:
    f.write(response)
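
The snippet above always writes to one fixed path, and the code in the question takes the file name from `img_url.split("/")[-1]`, which keeps any query string that follows the name. A minimal sketch of deriving a cleaner file name from the image URL is shown below. The helper name `filename_from_url` and the fallback name `image.jpg` are assumptions made for illustration; they do not come from the original answer.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="image.jpg"):
    """Return the last path segment of url, without query string or fragment.

    Hypothetical helper: falls back to `default` when the URL path
    does not end in a file name (for example "https://host/").
    """
    path = urlsplit(url).path               # query string and fragment are dropped here
    name = posixpath.basename(unquote(path))  # decode %xx escapes, keep the last segment
    return name or default


if __name__ == "__main__":
    print(filename_from_url("https://example.com/media/photo.jpg?name=large"))
```

The result can then be passed to `os.path.join` together with the target folder in place of the hard-coded name.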

If you want to get all the images from all iframes on the page (excluding the images in the initial page source, which your own code already retrieves):

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
import requests
import time

link = 'https://twitter.com/banprada/statuses/829102430017187841'
driver = webdriver.Chrome()
driver.get(link)
time.sleep(5)  # Wait until all iframes are completely rendered. Might need to be increased
iframe_counter = 0
while True:
    try:
        # Frames are addressed by zero-based index; an index past the last frame raises
        driver.switch_to.frame(iframe_counter)
        pictures = driver.find_elements(By.XPATH, '//img[@src and @alt]')
        for pic_index, pic in enumerate(pictures):
            response = requests.get(pic.get_attribute('src')).content
            # File name is "<iframe index>_<picture index>.jpeg"
            with open('C:\\Users\\You\\Desktop\\Images\\%d_%d.jpeg' % (iframe_counter, pic_index), 'wb') as f:
                f.write(response)
        driver.switch_to.default_content()
        iframe_counter += 1
    except WebDriverException:
        # No more iframes (or another WebDriver error): stop
        break
driver.quit()
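
The fixed `time.sleep(5)` in the loop above is a guess at how long the iframes take to render. One possible alternative, sketched here under stated assumptions, is to poll until a value stops changing, for example the number of iframes on the page. The function `wait_until_stable` and its parameters are hypothetical and are not part of Selenium or of the original answer.

```python
import time


def wait_until_stable(get_value, timeout=10.0, interval=0.5, stable_checks=3):
    """Poll get_value() until it returns the same value stable_checks times in a row.

    Returns the settled value, or raises TimeoutError once `timeout`
    seconds have passed without the value settling.
    """
    deadline = time.monotonic() + timeout
    last = get_value()
    repeats = 1  # how many consecutive times `last` has been seen
    while repeats < stable_checks:
        if time.monotonic() >= deadline:
            raise TimeoutError("value did not settle within %.1f s" % timeout)
        time.sleep(interval)
        current = get_value()
        if current == last:
            repeats += 1
        else:
            last, repeats = current, 1  # value changed, start counting again
    return last


# Possible use with Selenium (not executed here, needs a running driver):
#   from selenium.webdriver.common.by import By
#   wait_until_stable(lambda: len(driver.find_elements(By.TAG_NAME, "iframe")))
```

This only checks that the iframe count has stopped changing; it does not guarantee that the content inside each iframe has finished loading.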

Note that you can use any webdriver.
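
As a rough illustration of swapping the webdriver, the sketch below picks the driver class by name. The function `make_driver` and the tuple `SUPPORTED_BROWSERS` are hypothetical names introduced here; the sketch assumes Selenium 4 and that the chosen browser and its driver are installed on the machine.

```python
SUPPORTED_BROWSERS = ("Chrome", "Firefox", "Edge", "Safari")


def make_driver(browser="Chrome"):
    """Create a Selenium WebDriver for the named browser (hypothetical helper)."""
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError("unsupported browser: %r" % (browser,))
    # Imported lazily so the name check above works even without Selenium installed
    from selenium import webdriver
    # webdriver.Chrome, webdriver.Firefox, webdriver.Edge and webdriver.Safari
    # are all classes exposed by the selenium.webdriver package
    return getattr(webdriver, browser)()
```

With this in place, `driver = make_driver("Firefox")` could replace the `webdriver.Chrome()` line while the rest of the code stays the same.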
