Why does default_storage.exists() with django-storages with the S3Boto backend cause a memory error with a large S3 bucket?


Question

Running default_storage.exists()

I'm following the docs here: http://django-storages.readthedocs.org/en/latest/backends/amazon-S3.html

Here is the relevant part of my settings file:

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

Here is how I reproduce the problem:

./manage.py shell

from django.core.files.storage import default_storage

# Check default storage is right
default_storage.connection
>>> S3Connection:s3.amazonaws.com

# Check I can write to a file
file = default_storage.open('storage_test_2014', 'w')
file.write("does this work?")
file.close()
file2 = default_storage.open('storage_test_2014', 'r')
file2.read()
>>> 'does this work?'

# Run the exists command
default_storage.exists("asdfjkl") # This file doesn't exist - but the same thing happens no matter what I put here - even if I put 'storage_test_2014'

# Memory usage of the python process creeps up over the next 45 seconds, until it nears 100%
# iPython shell then crashes
>>> Killed

The only potential issue I've thought of is that my S3 bucket has 93,000 items in it - I'm wondering if .exists is just downloading the whole list of files in order to check? If this is the case, surely there must be another way? Unfortunately sorl-thumbnail uses this .exists() function when generating a new thumbnail, which causes thumbnail generation to be extremely slow.

Answer

Update (Jan 23, 2017)

To avoid this, you can simply pass preload_metadata=False when creating a Storage, or set AWS_PRELOAD_METADATA = False in settings.
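As a sketch, either form should work (`AWS_PRELOAD_METADATA` and `preload_metadata` are the setting and constructor keyword named above for django-storages' S3Boto backend):

```python
# Option 1: in settings.py
AWS_PRELOAD_METADATA = False

# Option 2: when constructing a storage instance directly
from storages.backends.s3boto import S3BotoStorage

storage = S3BotoStorage(preload_metadata=False)
```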

Thanks @r3mot for this suggestion in the comments.

In fact, it's because S3BotoStorage.exists makes a call to S3BotoStorage.entries, which is as follows:

    @property
    def entries(self):
        """
        Get the locally cached files for the bucket.
        """
        if self.preload_metadata and not self._entries:
            self._entries = dict((self._decode_name(entry.key), entry)
                                for entry in self.bucket.list(prefix=self.location))
        return self._entries
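The memory blow-up comes from that `dict()`: `entries` caches one object per key in the bucket before a single lookup can return. A minimal self-contained sketch of that behaviour, with boto's `bucket.list()` faked by a generator (`FakeKey` and `fake_bucket_list` are stand-ins, and the 93,000 count is the bucket size from the question):

```python
class FakeKey(object):
    """Stand-in for a boto key object; only the .key attribute matters here."""
    def __init__(self, key):
        self.key = key

def fake_bucket_list(prefix=""):
    # Stand-in for bucket.list(prefix=...): yields one key object per file.
    for i in range(93000):
        yield FakeKey("media/file_%d.jpg" % i)

# This mirrors what `entries` does: the entire listing is materialised
# into one dict in memory, regardless of which single key you asked about.
entries = dict((entry.key, entry) for entry in fake_bucket_list())
print(len(entries))
```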

The best way to handle this situation would be to subclass S3BotoStorage as follows:

from storages.backends.s3boto import S3BotoStorage, parse_ts_extended


class MyS3BotoStorage(S3BotoStorage):
    def exists(self, name):
        name = self._normalize_name(self._clean_name(name))
        k = self.bucket.new_key(self._encode_name(name))
        return k.exists()

    def size(self, name):
        name = self._normalize_name(self._clean_name(name))
        return self.bucket.get_key(self._encode_name(name)).size

    def modified_time(self, name):
        name = self._normalize_name(self._clean_name(name))
        k = self.bucket.get_key(self._encode_name(name))
        return parse_ts_extended(k.last_modified)

You'll have to just put this subclass in one of your app's modules, and reference it via dotted path in your settings module. The only drawback to this subclass is that each call to any of the 3 overridden methods will result in a web request, which might not be a big deal.
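For reference, wiring the subclass in would look like this (`myapp.storage` is a hypothetical module path; use wherever you actually placed `MyS3BotoStorage`):

```python
# settings.py -- point Django's default storage at the subclass by dotted path
DEFAULT_FILE_STORAGE = 'myapp.storage.MyS3BotoStorage'
```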
