How to upload small files to Amazon S3 efficiently in Python


Question

Recently, I needed to implement a program to upload files residing on Amazon EC2 to S3 in Python as quickly as possible. Each file is about 30 KB.

I have tried several solutions, using multithreading, multiprocessing, and coroutines. The following are my performance test results on Amazon EC2.

3600 (number of files) * 30 KB (file size) ≈ 105 MB (total) --->

       5.5s [ 4 processes + 100 coroutines ]
       10s  [ 200 coroutines ]
       14s  [ 10 threads ]

The code is shown below.
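The snippets rely on two helpers that the post does not include, connect_to_s3_sevice and put, plus the constants DATA_DIR, NTHREAD and NPROCESS. A minimal sketch of what they might look like with boto 2 is given here for context only; the bucket name, key layout, and DATA_DIR value are assumptions, not part of the original question.

import os

import boto

BUCKET_NAME = 'my-bucket'   # assumption: replace with the real bucket name
DATA_DIR = 'data'           # assumption: local directory holding the 30 KB files
NTHREAD = 10                # matches the 10-thread run reported above
NPROCESS = 4                # matches the 4-process run reported above

def connect_to_s3_sevice():
    # boto picks up credentials from the environment or ~/.boto.
    return boto.connect_s3().get_bucket(BUCKET_NAME)

def put(client, path):
    # Upload a single file, using its basename as the S3 key (assumed layout).
    key = client.new_key(os.path.basename(path))
    key.set_contents_from_filename(path)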

For multithreading:

import os
import threading

def mput(i, client, files):
    # Each thread uploads the subset of files whose hash maps to its index.
    for f in files:
        if hash(f) % NTHREAD == i:
            put(client, os.path.join(DATA_DIR, f))


def test_multithreading():
    client = connect_to_s3_sevice()
    files = os.listdir(DATA_DIR)
    ths = [threading.Thread(target=mput, args=(i, client, files)) for i in range(NTHREAD)]
    for th in ths:
        th.daemon = True
        th.start()
    for th in ths:
        th.join()

For coroutines (eventlet):

import functools
import os
import sys

import eventlet

client = connect_to_s3_sevice()
pool = eventlet.GreenPool(int(sys.argv[2]))  # pool size taken from the command line

xput = functools.partial(put, client)
files = os.listdir(DATA_DIR)
for f in files:
    pool.spawn_n(xput, os.path.join(DATA_DIR, f))
pool.waitall()

For multiprocessing:

import functools
import multiprocessing
import os

import eventlet


def pproc(i):
    # Each process opens its own connection and runs its own coroutine pool.
    client = connect_to_s3_sevice()
    files = os.listdir(DATA_DIR)
    pool = eventlet.GreenPool(100)

    xput = functools.partial(put, client)
    for f in files:
        if hash(f) % NPROCESS == i:
            pool.spawn_n(xput, os.path.join(DATA_DIR, f))
    pool.waitall()


def test_multiproc():
    procs = [multiprocessing.Process(target=pproc, args=(i, )) for i in range(NPROCESS)]
    for p in procs:
        p.daemon = True
        p.start()
    for p in procs:
        p.join()

The machine's configuration is Ubuntu 14.04, 2 CPUs (2.50 GHz), and 4 GB of memory.

The highest speed reached is about 19 MB/s (105 / 5.5). Overall, it is still too slow. Is there any way to speed it up? Could Stackless Python do it faster?

Answer

Sample parallel upload times to Amazon S3 using the Python boto SDK are available here:

Rather than writing the code yourself, you might also consider calling out to the AWS Command Line Interface (CLI), which can do uploads in parallel. It is also written in Python and uses boto.
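As a rough sketch only (the local directory and bucket URI are placeholders, not from the answer), the CLI can be driven from Python with subprocess; aws s3 sync uploads a whole directory and parallelizes the individual transfers itself:

import subprocess

# Assumption: the AWS CLI is installed and credentials are already configured.
# 'data' and 's3://my-bucket/data' are placeholder paths.
subprocess.check_call([
    'aws', 's3', 'sync',
    'data',                  # local directory of small files
    's3://my-bucket/data',   # destination bucket/prefix
])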
