Issue when downloading an image from a URL with Python

Question

I am trying to download an image from a URL with Python using the requests and shutil libraries. My code is below:

import requests
import shutil

image_url = "https://www.metmuseum.org/-/media/images/visit/met-fifth-avenue/fifthave_teaser.jpg"

with open("image1.jpg", "wb") as file:
    response = requests.get(image_url, stream=True)
    response.raw.decode_content = True
    shutil.copyfileobj(response.raw, file)
file.close()

This code works for most other image URLs that I have tried (e.g. https://tinyjpg.com/images/social/website.jpg). However, for the image_url in the code, a 1 KB file is created, and opening it gives the error "It looks like we don't support this file format."

I have also tried:

import urllib.request
urllib.request.urlretrieve(image_url, "image1.jpg")

It is possible to do this using Seleniumwire - I used driver.requests to get a list of all requests made by the site, and then looped through these requests until I got a request.response.header that included the file type (.jpg). It appears that there are two requests with the same url (the first with content-type 'text/html' and the second with 'image/jpg').

I would like to run this without loading a WebDriver. Is there any way I can download an image like this using the requests library?

Recommended answer

If you view the response.text you'll see that the server doesn't like your request headers and thinks you're a robot:

'<html>\r\n<head>\r\n<META NAME="robots" CONTENT="noindex,nofollow">\r\n<script src="/_Incapsula_Resource?SWJIYLWA=5074a744e2e3d891814e9a2dace20bd4,719d34d31c8e3a6e6fffd425f7e032f3">\r\n</script>\r\n<body>\r\n</body></html>\r\n'
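
A quick way to catch this situation before a broken file lands on disk is to look at what actually came back. The following is a minimal sketch, not part of the original answer: looks_like_jpeg is a hypothetical helper name, and it assumes you pass it the Content-Type header value and the first bytes of the body (for requests, response.headers.get("Content-Type") and response.content).

```python
# Hypothetical helper (not from the original answer): decide whether a
# response is really an image before writing it to disk.
JPEG_MAGIC = b"\xff\xd8\xff"  # every JPEG file starts with these bytes


def looks_like_jpeg(content_type, body):
    """Return True if the header says image/* and the body starts like a JPEG."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";")[0].strip().lower()
    return media_type.startswith("image/") and body.startswith(JPEG_MAGIC)


# The anti-bot page shown above is HTML, so it is rejected.
blocked = b'<html>\r\n<head>\r\n<META NAME="robots" CONTENT="noindex,nofollow">'
print(looks_like_jpeg("text/html", blocked))  # False

# An image content type plus the JPEG signature is accepted.
print(looks_like_jpeg("image/jpeg", JPEG_MAGIC + b"\xe0\x00\x10JFIF"))  # True
```

Checking both the header and the leading bytes is a design choice for this sketch: the header alone would also accept a mislabeled body, and the signature alone would only work for JPEG.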

But if you provide a proper User-Agent header its response changes and you can proceed with saving the file:

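# Browser-like User-Agent; the server rejects the default python-requests one
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36'}

# response.content reads the whole body, so stream=True is not needed here
response = requests.get(image_url, headers=headers)

with open("image1.jpg", "wb") as file:
    file.write(response.content)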
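
If the image is large, or you want the download to fail loudly instead of silently saving an HTML page, something along these lines may help. It is only a sketch under stated assumptions: save_image_response is a hypothetical name, and the function expects a requests-style response object (one that offers raise_for_status(), headers, and iter_content()), ideally obtained with stream=True.

```python
def save_image_response(response, path, chunk_size=8192):
    """Write a requests-style streamed response to path, refusing non-images."""
    # Raise on 4xx/5xx instead of saving an error page.
    response.raise_for_status()

    # Refuse bodies such as the anti-bot HTML page before creating any file.
    content_type = response.headers.get("Content-Type", "")
    if not content_type.lower().startswith("image/"):
        raise ValueError(f"expected an image, got {content_type!r}")

    # Copy the body in chunks so the whole image never sits in memory at once.
    with open(path, "wb") as file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            file.write(chunk)
    return path
```

With the names used in this answer, a call would look like save_image_response(requests.get(image_url, stream=True, headers=headers), "image1.jpg").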

So you have to send a browser-like User-Agent in the request headers to get this image.
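
The same idea should carry over to the urllib attempt from the question, since urllib.request.urlretrieve gives you no way to pass headers directly. Below is a rough standard-library sketch; build_image_request and download_image are hypothetical names, and whether a User-Agent alone is enough can vary from site to site.

```python
import urllib.request

# Browser-like User-Agent copied from the answer above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
)


def build_image_request(url):
    """Return a urllib Request that carries the browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download_image(url, path):
    """Fetch url with the custom header and write the body to path."""
    with urllib.request.urlopen(build_image_request(url)) as response:
        data = response.read()
    with open(path, "wb") as file:
        file.write(data)
```

Calling download_image(image_url, "image1.jpg") would then perform the download; it needs network access, so it is left out of the block itself.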

Also, with is a context manager, so it already closes the file for you; the extra file.close() call in your code is unnecessary.
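
You can see this for yourself with a small experiment. The snippet below is just an illustration: it writes a few placeholder bytes to a temporary path (both are assumptions for the demo, not a real image or the real output file) and inspects file.closed inside and after the with block.

```python
import os
import tempfile

# Temporary location so the demo does not touch your working directory.
path = os.path.join(tempfile.mkdtemp(), "image1.jpg")

with open(path, "wb") as file:
    file.write(b"\xff\xd8\xff")  # placeholder bytes, not a real image
    print(file.closed)  # False: still open inside the block

print(file.closed)  # True: closed automatically when the block ends
```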
