Memory error while downloading large gzip files and decompressing them
Problem description
I am trying to download a dataset from https://datasets.imdbws.com/title.principals.tsv.gz, decompress the contents in my code itself (Python), and write the resulting file(s) to disk.
To do so I am using the following code snippet.
import gzip
import requests

results = requests.get(config[sourceFiles]['url'])
with open(config[sourceFiles]['downloadLocation'] + config[sourceFiles]['downloadFileName'], 'wb') as f_out:
    print(config[sourceFiles]['downloadFileName'] + " starting download")
    f_out.write(gzip.decompress(results.content))
    print(config[sourceFiles]['downloadFileName'] + " downloaded successfully")
This code works fine for most gzip files, but for larger files it fails with the following error.
File "C:\Users\****\AppData\Local\Programs\Python\Python37-32\lib\gzip.py", line 532, in decompress
return f.read()
File "C:\Users\****\AppData\Local\Programs\Python\Python37-32\lib\gzip.py", line 276, in read
return self._buffer.read(size)
File "C:\Users\****\AppData\Local\Programs\Python\Python37-32\lib\gzip.py", line 471, in read
uncompress = self._decompressor.decompress(buf, size)
MemoryError
Is there a way to accomplish this without having to download the gzip file onto disk first and then decompress it to get the actual data?
Answer
You can use a streaming request coupled with zlib:
import zlib

import requests

url = 'https://datasets.imdbws.com/title.principals.tsv.gz'
result = requests.get(url, stream=True)

chunk_size = 1024 * 1024  # 1 MiB per downloaded chunk
# wbits=zlib.MAX_WBITS | 32 enables automatic gzip/zlib header detection
d = zlib.decompressobj(zlib.MAX_WBITS | 32)

with open("result.txt", "wb") as f_out:
    for chunk in result.iter_content(chunk_size):
        f_out.write(d.decompress(chunk))
    f_out.write(d.flush())  # write out any data still buffered in the decompressor
This snippet reads the data chunk by chunk and feeds it to zlib, which can handle a stream of data.
Depending on your connection speed and CPU/disk performance, you can experiment with different chunk sizes.
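To see why this avoids the MemoryError, note that zlib.decompressobj keeps only a small internal buffer and accepts input split at arbitrary byte boundaries. A minimal local sketch (an in-memory gzip blob stands in for the downloaded file, so no network is needed):

```python
import gzip
import zlib

# Build a gzip blob in memory to stand in for the downloaded .gz file.
payload = b"name\tvalue\n" * 100_000
blob = gzip.compress(payload)

# wbits=zlib.MAX_WBITS | 32 auto-detects gzip or zlib headers.
d = zlib.decompressobj(zlib.MAX_WBITS | 32)

chunk_size = 64 * 1024  # chunks cut at arbitrary boundaries
pieces = []
for i in range(0, len(blob), chunk_size):
    pieces.append(d.decompress(blob[i:i + chunk_size]))
pieces.append(d.flush())  # drain whatever the decompressor still buffers

restored = b"".join(pieces)
assert restored == payload
```

Only one compressed chunk and its decompressed output live in memory at a time, so peak memory stays near the chunk size rather than the full file size.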
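As an alternative sketch: gzip.GzipFile accepts any file-like object, so you could wrap the raw response stream (result.raw from a stream=True request) and copy it out in bounded chunks with shutil.copyfileobj. Here an io.BytesIO buffer stands in for the network stream, so the example runs locally:

```python
import gzip
import io
import shutil

payload = b"col1\tcol2\n" * 50_000
stream = io.BytesIO(gzip.compress(payload))  # stands in for result.raw

out = io.BytesIO()  # stands in for the output file on disk
with gzip.GzipFile(fileobj=stream) as gz:
    # Copies in 1 MiB chunks, never holding the whole file in memory.
    shutil.copyfileobj(gz, out, length=1024 * 1024)

assert out.getvalue() == payload
```

This reads a little more naturally than driving zlib by hand, at the cost of leaving the chunking to shutil.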