How to download an MS Word docx file in Python with raw data from an HTTP URL


Question

If the following URL is opened in a browser, a docx file is downloaded. I want to automate that download with Python.

https://hudoc.echr.coe.int/app/conversion/docx/?library=ECHR&id=001-176931&filename=CASE%20OF%20NDIDI%20v.%20THE%20UNITED%20KINGDOM.docx&logEvent=False

I have tried the following:

from docx import Document
import requests
import json
from bs4 import BeautifulSoup
dwnurl = 'https://hudoc.echr.coe.int/app/conversion/docx/?library=ECHR&id=001-176931&filename=CASE%20OF%20NDIDI%20v.%20THE%20UNITED%20KINGDOM.docx&logEvent=False'
doc = requests.get(dwnurl)

print(doc.content)  # prints raw bytes such as b'PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00!\xfb\x16\x01\x16\x02\x00\x00\xec\x0c\x00\x00\x13\x00\xc4\x01[Content_Types].xml \xa2\xc0\

print(doc.raw)  # prints something like <urllib3.response.HTTPResponse object at 0x063D8BD0>

document = Document(doc.content)
document.save('test.docx')

# Running this fails with the following traceback:

Traceback (most recent call last):
  File "scraping_hudoc.py", line 40, in <module>
    document = Document(doc.content)
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\api.py", line 25, in Document
    document_part = Package.open(docx).main_document_part
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\opc\package.py", line 116, in open
    pkg_reader = PackageReader.from_file(pkg_file)
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\opc\pkgreader.py", line 32, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\opc\phys_pkg.py", line 101, in __init__
    self._zipf = ZipFile(pkg_file, 'r')
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\zipfile.py", line 1108, in __init__
    self._RealGetContents()
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\zipfile.py", line 1171, in _RealGetContents
    endrec = _EndRecData(fp)
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\zipfile.py", line 241, in _EndRecData
    fpin.seek(0, 2)
AttributeError: 'bytes' object has no attribute 'seek'
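
A note on the traceback: the last frame shows the cause. zipfile calls seek() on whatever object it receives, and doc.content is a plain bytes object with no such method. python-docx documents that Document() accepts either a path or a file-like object, so wrapping the bytes in io.BytesIO, for example Document(io.BytesIO(doc.content)), should be enough. The sketch below reproduces the same error and the fix using only the standard library; the small in-memory archive is made-up stand-in data, not the real HUDOC file.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory to stand in for the downloaded
# .docx bytes (a .docx file is a ZIP package). Illustrative data only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
raw_bytes = buffer.getvalue()  # plays the role of doc.content

# Passing the bytes object directly reproduces the error in the traceback.
try:
    zipfile.ZipFile(raw_bytes, "r")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Wrapping the bytes in io.BytesIO supplies the seek()/read() methods
# that ZipFile needs, so the archive opens normally.
with zipfile.ZipFile(io.BytesIO(raw_bytes), "r") as archive:
    names = archive.namelist()

print(error_message)
print(names)
```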

Recommended answer

I saved the MS Word docx file with the following code:

import requests


def save_link(book_link, book_name):
    # Stream the response so the whole file is not held in memory at once.
    the_book = requests.get(book_link, stream=True)
    # Write the body to disk in binary mode, one chunk at a time.
    with open(book_name, 'wb') as f:
        for chunk in the_book.iter_content(1024 * 1024 * 2):  # 2 MB chunks
            f.write(chunk)


save_link("https://hudoc.echr.coe.int/app/conversion/docx/?library=ECHR&id=001-176931&filename=CASE%20OF%20NDIDI%20v.%20THE%20UNITED%20KINGDOM.docx&logEvent=False", "CASE OF NDIDI v. THE UNITED KINGDOM.docx")
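
If the response body is already in memory, as with doc.content in the question, the bytes can also be written to disk directly without python-docx. Here is a minimal sketch under that assumption. The helper name save_docx_bytes is hypothetical, and the ZIP check is an optional safeguard so that an HTML error page from the server is not saved under a .docx name.

```python
import io
import zipfile
from pathlib import Path


def save_docx_bytes(content: bytes, target: str) -> Path:
    """Write already-downloaded .docx bytes to disk (hypothetical helper)."""
    # A .docx file is a ZIP package, so reject anything that is not a ZIP.
    if not zipfile.is_zipfile(io.BytesIO(content)):
        raise ValueError("content is not a ZIP archive, so it cannot be a .docx file")
    path = Path(target)
    path.write_bytes(content)  # binary write; the bytes are stored unchanged
    return path
```

With the question's variables this would be called as save_docx_bytes(doc.content, 'test.docx'). For large files, the streaming approach in the answer above uses less memory.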
