How to download an MS Word docx file in Python with raw data from an HTTP URL

Views: 28 Published: 2021/9/24 18:43:56 Tags: python web-scraping


Question

If the following URL is opened in a browser, a docx file is downloaded. I want to automate that download with Python.

https://hudoc.echr.coe.int/app/conversion/docx/?library=ECHR&id=001-176931&filename=CASE%20OF%20NDIDI%20v.%20THE%20UNITED%20KINGDOM.docx&logEvent=False

I have tried the following:

from docx import Document
import requests
import json
from bs4 import BeautifulSoup
dwnurl = 'https://hudoc.echr.coe.int/app/conversion/docx/?library=ECHR&id=001-176931&filename=CASE%20OF%20NDIDI%20v.%20THE%20UNITED%20KINGDOM.docx&logEvent=False'
doc = requests.get(dwnurl)

print(doc.content)  # prints the raw bytes, e.g. b'PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00!\xfb\x16\x01\x16\x02\x00\x00\xec\x0c\x00\x00\x13\x00\xc4\x01[Content_Types].xml \xa2\xc0\

print(doc.raw)  # prints the response object, e.g. <urllib3.response.HTTPResponse object at 0x063D8BD0>

document = Document(doc.content)
document.save('test.docx')

# Running this fails with the following error:

Traceback (most recent call last):
  File "scraping_hudoc.py", line 40, in <module>
    document = Document(doc.content)
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\api.py", line 25, in Document
    document_part = Package.open(docx).main_document_part
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\opc\package.py", line 116, in open
    pkg_reader = PackageReader.from_file(pkg_file)
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\opc\pkgreader.py", line 32, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\site-packages\docx\opc\phys_pkg.py", line 101, in __init__
    self._zipf = ZipFile(pkg_file, 'r')
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\zipfile.py", line 1108, in __init__
    self._RealGetContents()
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\zipfile.py", line 1171, in _RealGetContents
    endrec = _EndRecData(fp)
  File "C:\Users\204387\AppData\Local\Programs\Python\Python36-32\lib\zipfile.py", line 241, in _EndRecData
    fpin.seek(0, 2)
AttributeError: 'bytes' object has no attribute 'seek'
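
A note on the traceback: the last frames show zipfile calling seek() on the value it was given, which here is a bytes object rather than a path or an open file. A docx file is a zip archive, so the same situation can be sketched with the standard library alone. The helper names below (as_file_like, build_sample_zip, list_members) are made up for illustration, and the tiny archive merely stands in for the downloaded bytes.

```python
import io
import zipfile


def as_file_like(data: bytes) -> io.BytesIO:
    """Wrap raw bytes in a seekable, file-like object."""
    return io.BytesIO(data)


def build_sample_zip() -> bytes:
    """Build a tiny in-memory zip as a stand-in for downloaded .docx bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
    return buffer.getvalue()


def list_members(data: bytes) -> list:
    """Open the bytes as a zip archive via BytesIO and list its members."""
    with zipfile.ZipFile(as_file_like(data)) as archive:
        return archive.namelist()


sample = build_sample_zip()
print(sample[:2])            # zip data starts with the b'PK' signature
print(list_members(sample))  # names stored in the archive
```

Applied to the code in the question, the likely equivalent is Document(io.BytesIO(doc.content)), since python-docx's Document accepts either a path or a file-like object.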

Recommended answer

I saved the MS Word docx file like this:

import requests

def save_link(book_link, book_name):
    """Stream the file at book_link to the local path book_name."""
    the_book = requests.get(book_link, stream=True)
    the_book.raise_for_status()  # fail on HTTP errors instead of saving an error page
    with open(book_name, 'wb') as f:
        # Write the response body to disk in 2 MB chunks
        for chunk in the_book.iter_content(1024 * 1024 * 2):
            f.write(chunk)

save_link("https://hudoc.echr.coe.int/app/conversion/docx/?library=ECHR&id=001-176931&filename=CASE%20OF%20NDIDI%20v.%20THE%20UNITED%20KINGDOM.docx&logEvent=False","CASE OF NDIDI v. THE UNITED KINGDOM.docx")
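
The call above types the output file name by hand, although the same name already appears in the URL's filename query parameter. As a minimal sketch, assuming the URL always carries that parameter, the name could be read from the URL with the standard library. The function filename_from_url and its default value are hypothetical and not part of the original answer.

```python
from urllib.parse import parse_qs, urlsplit


def filename_from_url(url: str, default: str = "download.docx") -> str:
    """Return the decoded 'filename' query parameter, or a default if absent."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also decodes %20 to spaces
    values = query.get("filename")
    return values[0] if values else default


url = (
    "https://hudoc.echr.coe.int/app/conversion/docx/"
    "?library=ECHR&id=001-176931"
    "&filename=CASE%20OF%20NDIDI%20v.%20THE%20UNITED%20KINGDOM.docx"
    "&logEvent=False"
)
print(filename_from_url(url))  # CASE OF NDIDI v. THE UNITED KINGDOM.docx
```

The result could then be passed as the second argument of save_link, so the link and the saved name stay in sync.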
