Downloading images with BeautifulSoup
Question
I am using BeautifulSoup to extract pictures, which works well for normal pages. Now I want to extract the picture of the Chromebook from a web page like this:
https://twitter.com/banprada/statuses/829102430017187841
The page apparently contains a link to another page that holds the image. Here is my code for downloading images from the link above, but I only get the picture of the person who posted the link.
import os
import urllib.request

from bs4 import BeautifulSoup

URL = "http://twitter.com/banprada/statuses/829102430017187841"
list_dir = "D:\\"
default_dir = os.path.join(list_dir, "Pictures_neu")
opener = urllib.request.build_opener()
urllib.request.install_opener(opener)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(urllib.request.urlopen(URL).read(), "html.parser")
# Select only <img> tags that carry both an alt and a src attribute
imgs = soup.find_all("img", {"alt": True, "src": True})
for img in imgs:
    img_url = img["src"]
    # Use the last URL segment as the local file name
    filename = os.path.join(default_dir, img_url.split("/")[-1])
    img_data = opener.open(img_url)
    with open(filename, "wb") as f:
        f.write(img_data.read())
Is there a way to download the image somehow?
Many thanks and regards, Andi
Recommended answer
Here is an approach that uses Selenium and requests:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import requests

link = 'https://twitter.com/banprada/statuses/829102430017187841'
# PhantomJS support was removed in Selenium 4; another webdriver such as webdriver.Chrome() can be used instead
driver = webdriver.PhantomJS()
driver.get(link)
# Wait up to 10 seconds for the iframe to become available, then switch into it
wait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[starts-with(@id, 'xdm_default')]")))
# find_element(By.TAG_NAME, ...) replaces find_element_by_tag_name, which newer Selenium releases no longer provide
image_src = driver.find_element(By.TAG_NAME, 'img').get_attribute('src')
response = requests.get(image_src).content
with open('C:\\Users\\You\\Desktop\\Image.jpeg', 'wb') as f:
    f.write(response)
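To illustrate why the switch into the iframe matters, here is a minimal sketch using only the standard library. It assumes, as the iframe switch above suggests, that the wanted picture lives inside an iframe. The markup in `SAMPLE_HTML` and the class name `TagSourceCollector` are made up for illustration and are not the real page source. A parser that only sees the outer page can find the `<iframe>` tag (if it is present in the static source at all), but not the images of the document loaded inside it:

```python
from html.parser import HTMLParser


class TagSourceCollector(HTMLParser):
    """Collect the src attributes of <img> and <iframe> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.img_sources = []
        self.iframe_sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        src = dict(attrs).get("src")
        if src is None:
            return
        if tag == "img":
            self.img_sources.append(src)
        elif tag == "iframe":
            self.iframe_sources.append(src)


# Hypothetical markup for illustration only; not the real page source.
SAMPLE_HTML = """
<html><body>
  <img alt="avatar" src="https://example.com/avatar.jpg">
  <iframe id="xdm_default_1" src="https://example.com/card"></iframe>
</body></html>
"""

collector = TagSourceCollector()
collector.feed(SAMPLE_HTML)
print(collector.img_sources)     # images present in the outer page
print(collector.iframe_sources)  # frames whose content is a separate document
```

Only the avatar shows up as an image here; whatever the iframe displays would have to be fetched and parsed separately, or read through a browser that has rendered it, which is what the Selenium script does.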
If you want to get all the images from all iframes on the page (excluding the images in the initial page source, which you can already get with your own code):
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
import requests
import time

link = 'https://twitter.com/banprada/statuses/829102430017187841'
driver = webdriver.Chrome()
driver.get(link)
time.sleep(5)  # Wait until all iframes are completely rendered. Might need to be increased
iframe_counter = 0
while True:
    try:
        # Switch into the iframe by index; this raises once no iframe with that index exists
        driver.switch_to.frame(iframe_counter)
        pictures = driver.find_elements(By.XPATH, '//img[@src and @alt]')
        for pic_index, pic in enumerate(pictures):
            response = requests.get(pic.get_attribute('src')).content
            with open('C:\\Users\\You\\Desktop\\Images\\%s.jpeg' % (str(iframe_counter) + str(pic_index)), 'wb') as f:
                f.write(response)
        # Return to the top-level document before trying the next iframe
        driver.switch_to.default_content()
        iframe_counter += 1
    except WebDriverException:
        break
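The fixed `time.sleep(5)` above either waits too long or not long enough, depending on how fast the page renders. The first script avoids this with `WebDriverWait`, which polls a condition until it holds. As a rough sketch of that polling idea in plain Python (the helper name `wait_until` and its parameters are hypothetical and not part of Selenium):

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

With a driver at hand, a call such as `wait_until(lambda: driver.find_elements(By.TAG_NAME, 'iframe'))` would return as soon as at least one iframe exists. It would not guarantee that every iframe has finished rendering, so some extra margin may still be needed.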
Note that you can use any webdriver.
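One detail that may be worth adjusting is how the files are named. The second script concatenates the two counters, so frame 1 with picture 11 and frame 11 with picture 1 would both become `111.jpeg`, and every file gets a `.jpeg` extension regardless of its real type. The following is a minimal sketch of an alternative; the function `image_filename` and its parameters are hypothetical, and the example URLs are placeholders:

```python
import posixpath
from urllib.parse import urlparse


def image_filename(img_url, frame_index, pic_index, default_ext=".jpeg"):
    """Build a local file name from an image URL plus two counters."""
    # Use only the path component, so a query string such as
    # "?size=large" does not end up in the file name.
    path = urlparse(img_url).path
    stem, ext = posixpath.splitext(posixpath.basename(path))
    if not ext:
        ext = default_ext  # fall back when the URL carries no extension
    if not stem:
        stem = "image"     # fall back when the URL path has no last segment
    # The underscore separators keep (1, 11) and (11, 1) distinct.
    return "%d_%d_%s%s" % (frame_index, pic_index, stem, ext)


print(image_filename("https://example.com/media/photo.png?size=large", 1, 11))
print(image_filename("https://example.com/media/photo", 11, 1))
```

The result can be combined with the target folder via `os.path.join`, as the question's code already does.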