Corrupted PDF file after requests.get() with Python

Question

I am trying to download a PDF file using requests.get(). This works for most of the test PDF files I found, but for the URL below the saved file is corrupted. If I open the same URL in a browser and save the file, it works fine. I also tried downloading it in chunks with stream=True, with the same result. Could you explain what I am missing?

import requests

file_url = 'http://medianet.edmond-de-rothschild.fr/edram/pdf/kiid_fr0010172767_en_20200120_20200128_1954.pdf'


headers = {'Content-type': 'application/pdf'}
r = requests.get(file_url, headers=headers)

with open("python.pdf", "wb") as pdf:
    pdf.write(r.content)  # the with block closes the file; no explicit close() needed

Recommended answer

Fixing the request headers makes it work. The set below mimics a request sent from Postman:

import requests

file_url = "http://medianet.edmond-de-rothschild.fr/edram/pdf/kiid_fr0010172767_en_20200120_20200128_1954.pdf"

headers = {
    "User-Agent": "PostmanRuntime/7.20.1",
    "Accept": "*/*",
    "Cache-Control": "no-cache",
    "Postman-Token": "8eb5df70-4da6-4ba1-a9dd-e68880316cd9,30ac79fa-969b-4a24-8035-26ad1a2650e1",
    "Host": "medianet.edmond-de-rothschild.fr",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}

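# Pass the URL once; a second positional argument would be treated as `params`.
r = requests.get(file_url, headers=headers)
r.raise_for_status()  # stop here on an HTTP error instead of saving an error page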

with open("python.pdf", "wb") as pdf:
    pdf.write(r.content)
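
When a download like the one above produces a file that will not open, it can help to check the bytes before writing them to disk. The following is a minimal sketch, not part of the original answer: the helper name looks_like_pdf and the 1024-byte search window are assumptions chosen for illustration. It relies only on the facts that a PDF file carries a "%PDF-" header at or near its start and an "%%EOF" marker near its end, so it is a heuristic rather than a full validation.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check that `data` is a complete PDF rather than, say, an HTML error page.

    Hypothetical helper for illustration; the 1024-byte window is an assumption.
    """
    # A PDF carries a "%PDF-" header at or near the start of the file.
    has_header = b"%PDF-" in data[:1024]
    # A complete PDF carries an "%%EOF" marker near the end of the file.
    has_trailer = b"%%EOF" in data[-1024:]
    return has_header and has_trailer


if __name__ == "__main__":
    sample_pdf = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n%%EOF\n"
    sample_html = b"<html><body>Access denied</body></html>"
    print(looks_like_pdf(sample_pdf))
    print(looks_like_pdf(sample_html))
```

With the code above, the check would be called as looks_like_pdf(r.content) before opening python.pdf for writing. If it returns False, printing r.status_code, r.headers.get("Content-Type") and r.content[:200] usually shows what the server actually sent back.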
