Django StaticFiles and Amazon S3: How to detect modified files?
I'm using django staticfiles + django-storages and Amazon S3 to host my data. All is working well except that every time I run manage.py collectstatic the command uploads all files to the server.

It looks like the management command compares timestamps from Storage.modified_time(), which isn't implemented in the S3 storage from django-storages.

How do you guys determine if an S3 file has been modified?

I could store file paths and last-modified dates in my database. Or is there an easy way to pull the last-modified date from Amazon? Another option: it looks like I can assign arbitrary metadata with python-boto, where I could store the local modified date when I upload the first time.
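That metadata comparison might look like the following sketch. Here remote_metadata stands in for the metadata dict boto would return for a key, and the 'local-modified' key name is a made-up convention for this sketch, not anything boto or django-storages defines:

```python
from datetime import datetime

def needs_upload(local_mtime, remote_metadata):
    """Re-upload if the stored 'local-modified' stamp is missing or older.

    remote_metadata stands in for the metadata dict boto returns for a key;
    the 'local-modified' key name is a hypothetical convention.
    """
    stamp = remote_metadata.get('local-modified')
    if stamp is None:
        return True  # never uploaded with a stamp: upload it
    return local_mtime > datetime.fromisoformat(stamp)

# A stamped, up-to-date file is skipped; a newer local copy is re-uploaded.
meta = {'local-modified': '2012-06-01T12:00:00'}
assert not needs_upload(datetime(2012, 6, 1, 12, 0), meta)
assert needs_upload(datetime(2012, 6, 2, 9, 0), meta)
assert needs_upload(datetime(2012, 6, 1), {})
```

The stamp would be written once at upload time, so deciding whether to re-upload costs one metadata read instead of a full file comparison.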
Anyways, it seems like a common problem so I'd like to ask what solution others have used. Thanks!
The latest version of django-storages (1.1.3) handles file modification detection through S3 Boto.

pip install django-storages

and you're good now :) Gotta love open source!
Update: set the AWS_PRELOAD_METADATA option to True in your settings file to get very fast syncs when using the S3Boto class. If using the S3 class, use the PreloadedS3 class.
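For reference, a minimal settings sketch for the S3Boto case (the bucket name is a placeholder, and the storage path matches django-storages 1.1.x):

```python
# settings.py (sketch; the bucket name is a placeholder)
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_STORAGE_BUCKET_NAME = 'my-bucket'
AWS_PRELOAD_METADATA = True  # fetch all key metadata up front in one listing
```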
Update 2: It's still extremely slow to run the command.
Update 3: I forked the django-storages repository to fix the issue and added a pull request.
The problem is in the modified_time method, where the fallback value is evaluated even when it isn't used. I moved the fallback into an if block so it executes only if get returns None:
entry = self.entries.get(name, self.bucket.get_key(self._encode_name(name)))

should be

entry = self.entries.get(name)
if entry is None:
    entry = self.bucket.get_key(self._encode_name(name))
Now 1000 requests take under 0.5s, down from 100s.
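The root cause is that Python evaluates dict.get's default argument eagerly, so the S3 round-trip happened on every lookup even when the cache had the entry. A standalone demo of the two patterns:

```python
calls = []

def expensive_fallback():
    # Stands in for the S3 round-trip (bucket.get_key in django-storages)
    calls.append(1)
    return "remote"

entries = {"a": "cached"}

# Buggy pattern: the default argument is evaluated before get() runs,
# so the fallback fires even on a cache hit.
value = entries.get("a", expensive_fallback())
assert value == "cached"
assert len(calls) == 1  # wasted call despite the hit

# Fixed pattern: fall back only on a miss.
calls.clear()
value = entries.get("a")
if value is None:
    value = expensive_fallback()
assert value == "cached"
assert len(calls) == 0  # no wasted call
```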
Update 4:
For syncing 10k+ files, I believe boto has to make multiple requests since S3 paginates results, causing a 5-10 second sync time. This will only get worse as we get more files.
I'm thinking a solution is a custom management command or a django-storages update where a single file stored on S3 holds the metadata of all the other files, and is updated whenever a file is changed via the collectstatic command.
It won't detect files uploaded via other means, but that won't matter if the sole entry point is the management command.
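The manifest idea could be sketched like this. The file layout and key names here are assumptions for illustration, not anything django-storages implements: a single JSON object on S3 maps each static file's path to its last-modified stamp, so a sync needs one GET instead of a paginated bucket listing.

```python
import json

# Hypothetical manifest: one JSON object stored on S3 alongside the static
# files, mapping each path to its last-modified stamp.
manifest = json.loads('{"css/site.css": "2012-06-01T12:00:00"}')

def files_to_upload(local_times, manifest):
    """Paths whose local stamp differs from (or is missing in) the manifest."""
    return [path for path, stamp in local_times.items()
            if manifest.get(path) != stamp]

local = {
    'css/site.css': '2012-06-01T12:00:00',  # unchanged: skipped
    'js/app.js':    '2012-06-02T09:30:00',  # new file: uploaded
}
assert files_to_upload(local, manifest) == ['js/app.js']
```

After each collectstatic run the command would rewrite the manifest, keeping it in step with everything it uploaded.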