Why does default_storage.exists() with django-storages with S3Boto backend cause a memory error with a large S3 bucket?
Problem Description
Running default_storage.exists() causes a memory error.
I'm following the docs here: http://django-storages.readthedocs.org/en/latest/backends/amazon-S3.html
Here is the relevant part of my settings file:
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
Here is the problem I can reproduce:
./manage.py shell
from django.core.files.storage import default_storage
# Check default storage is right
default_storage.connection
>>> S3Connection:s3.amazonaws.com
# Check I can write to a file
file = default_storage.open('storage_test_2014', 'w')
file.write("does this work?")
file.close()
file2 = default_storage.open('storage_test_2014', 'r')
file2.read()
>>> 'does this work?'
# Run the exists command
default_storage.exists("asdfjkl") # This file doesn't exist - but the same thing happens no matter what I put here - even if I put 'storage_test_2014'
# Memory usage of the python process creeps up over the next 45 seconds, until it nears 100%
# iPython shell then crashes
>>> Killed
The only potential issue I've thought of is that my S3 bucket has 93,000 items in it - I'm wondering if .exists is just downloading the whole list of files in order to check? If this is the case, surely there must be another way? Unfortunately sorl-thumbnail uses this .exists() function when generating a new thumbnail, which causes thumbnail generation to be extremely slow.
Recommended Answer
Update (Jan 23, 2017)

To avoid this, you can simply pass preload_metadata=False when creating a Storage, or set AWS_PRELOAD_METADATA = False in settings.
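For example, in a Django settings module (a minimal sketch; the bucket name and the rest of your storage configuration are placeholders, not part of the original answer):

```python
# settings.py -- keep the S3Boto backend, but disable preloading of
# bucket metadata so existence checks issue individual requests
# instead of listing the entire bucket up front.
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_STORAGE_BUCKET_NAME = 'my-bucket'  # placeholder
AWS_PRELOAD_METADATA = False
```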
Thanks @r3mot for this suggestion in the comments.
In fact, it's because S3BotoStorage.exists makes a call to S3BotoStorage.entries, which is as follows:
    @property
    def entries(self):
        """
        Get the locally cached files for the bucket.
        """
        if self.preload_metadata and not self._entries:
            self._entries = dict((self._decode_name(entry.key), entry)
                                 for entry in self.bucket.list(prefix=self.location))
        return self._entries
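The cost of this pattern is that the very first existence check populates a cache of every key in the bucket. A self-contained sketch of the same lazy-cache behavior (FakeBucket and PreloadingStorage are illustrative stand-ins for boto's bucket and django-storages' backend, not real library classes):

```python
class FakeBucket:
    """Stands in for a boto bucket; list() yields every key it holds."""
    def __init__(self, keys):
        self.keys = keys
        self.list_calls = 0

    def list(self, prefix=''):
        self.list_calls += 1
        return (k for k in self.keys if k.startswith(prefix))


class PreloadingStorage:
    """Mimics the entries-based exists() shown above."""
    def __init__(self, bucket, preload_metadata=True):
        self.bucket = bucket
        self.preload_metadata = preload_metadata
        self._entries = {}

    @property
    def entries(self):
        # One exists() call caches *every* key in the bucket --
        # with 93,000 objects, this is where the memory goes.
        if self.preload_metadata and not self._entries:
            self._entries = dict((key, key) for key in self.bucket.list())
        return self._entries

    def exists(self, name):
        return name in self.entries


bucket = FakeBucket(['a.jpg', 'b.jpg', 'c.jpg'])
storage = PreloadingStorage(bucket)
storage.exists('no-such-key')      # first call lists the whole bucket
print(len(storage._entries))       # 3 -- every key is now cached in memory
print(bucket.list_calls)           # 1 -- later exists() calls reuse the cache
```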
The best way to handle this situation would be to subclass S3BotoStorage as follows:
from storages.backends.s3boto import S3BotoStorage, parse_ts_extended


class MyS3BotoStorage(S3BotoStorage):
    def exists(self, name):
        name = self._normalize_name(self._clean_name(name))
        k = self.bucket.new_key(self._encode_name(name))
        return k.exists()

    def size(self, name):
        name = self._normalize_name(self._clean_name(name))
        return self.bucket.get_key(self._encode_name(name)).size

    def modified_time(self, name):
        name = self._normalize_name(self._clean_name(name))
        k = self.bucket.get_key(self._encode_name(name))
        return parse_ts_extended(k.last_modified)
You'll have to just put this subclass in one of your app's modules, and reference it via dotted path in your settings module. The only drawback to this subclass is that each call to any of the 3 overridden methods will result in a web request, which might not be a big deal.
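For example, if the subclass lives in myapp/storage.py (the module path here is just an illustration, not from the original answer), the settings reference would look like:

```python
# settings.py -- point Django's default storage at the subclass;
# 'myapp.storage' is a hypothetical dotted path for your own app.
DEFAULT_FILE_STORAGE = 'myapp.storage.MyS3BotoStorage'
```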