How to download large csv files from S3 without running into 'out of memory' issue?


Problem Description

I need to process large files stored in an S3 bucket. I need to divide the csv file into smaller chunks for processing. However, this seems to be a task better done on file-system storage rather than on object storage. Hence, I am planning to download the large file locally, divide it into smaller chunks and then upload the resultant files together to a different folder. I am aware of the method download_fileobj but could not determine whether it would result in an out-of-memory error while downloading large files of size ~= 10GB.
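
For reference, the splitting step described above can be done line by line so the whole file is never held in memory. A minimal sketch, assuming the large file has already been downloaded to /tmp/large.csv and that 1,000,000 data rows per chunk is acceptable (the paths and chunk size are hypothetical):

import csv

CHUNK_ROWS = 1_000_000  # hypothetical number of data rows per output file

with open('/tmp/large.csv', newline='') as src:
    reader = csv.reader(src)
    header = next(reader)   # repeat the header so each chunk is a valid csv
    out = None
    writer = None
    chunk_index = 0
    for row_number, row in enumerate(reader):
        if row_number % CHUNK_ROWS == 0:
            if out:
                out.close()
            chunk_index += 1
            out = open(f'/tmp/chunk_{chunk_index:04d}.csv', 'w', newline='')
            writer = csv.writer(out)
            writer.writerow(header)   # each chunk starts with the header row
        writer.writerow(row)
    if out:
        out.close()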

Recommended Answer

I would recommend using download_file():

import boto3

# download_file() hands the transfer to boto3's transfer manager, which
# streams the object to a file on disk rather than holding it in memory.
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')

It will not run out of memory while downloading. Boto3 will take care of the transfer process.
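
To finish the workflow described in the question, the resulting chunk files can then be uploaded back to S3 with upload_file(), which likewise streams from disk. A minimal sketch, assuming the chunks were written to /tmp and should land under a 'chunks/' prefix in the same bucket (both hypothetical):

import glob
import boto3

s3 = boto3.resource('s3')
# upload_file() streams each file from disk, so the chunks are never
# loaded into memory in full.
for path in sorted(glob.glob('/tmp/chunk_*.csv')):
    key = 'chunks/' + path.split('/')[-1]   # hypothetical destination prefix
    s3.meta.client.upload_file(path, 'mybucket', key)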
