Django StaticFiles and Amazon S3: How to detect modified files?

Problem description

I'm using django staticfiles + django-storages and Amazon S3 to host my data. Everything works well, except that every time I run manage.py collectstatic, the command uploads all the files to the server.

It looks like the management command compares timestamps from Storage.modified_time(), which isn't implemented in the S3 storage backend from django-storages.
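
A simplified model of that check, for illustration only: if the target storage reports a modification time at least as new as the local file's, the file is skipped; if the backend raises NotImplementedError, there is nothing to compare, so the file is uploaded again. This is a sketch, not Django's actual collectstatic code, and the two storage classes and needs_upload are hypothetical stand-ins.

```python
from datetime import datetime


class NoMtimeStorage:
    """Hypothetical backend that cannot report modification times."""

    def exists(self, name):
        return True

    def modified_time(self, name):
        raise NotImplementedError("modified_time is not implemented")


class DictStorage:
    """Hypothetical backend that knows each stored file's modification time."""

    def __init__(self, mtimes):
        self.mtimes = mtimes

    def exists(self, name):
        return name in self.mtimes

    def modified_time(self, name):
        return self.mtimes[name]


def needs_upload(target, name, source_mtime):
    """Return True if `name` should be (re)uploaded to `target`."""
    if not target.exists(name):
        return True
    try:
        target_mtime = target.modified_time(name)
    except (OSError, NotImplementedError, AttributeError):
        # No usable timestamp: the only safe choice is to upload again.
        return True
    # Skip the file when the stored copy is at least as new as the local one.
    return target_mtime < source_mtime


local_mtime = datetime(2012, 5, 1, 12, 0)
print(needs_upload(NoMtimeStorage(), "css/site.css", local_mtime))  # True
newer_copy = DictStorage({"css/site.css": datetime(2012, 5, 2)})
print(needs_upload(newer_copy, "css/site.css", local_mtime))  # False
```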

How do you guys determine if an S3 file has been modified?

I could store file paths and last-modified dates in my database. Or is there an easy way to pull the last-modified date from Amazon?
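
Amazon does expose this: every S3 object has a Last-Modified value, sent as an HTTP-date header on HEAD/GET requests and as an ISO 8601 timestamp in bucket listings, and boto 2 surfaces it as the key's last_modified string. A minimal sketch, using only the standard library, of turning either form into a comparable UTC datetime (the helper name parse_s3_last_modified is hypothetical):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_s3_last_modified(value):
    """Parse an S3 Last-Modified string into an aware UTC datetime.

    Handles the HTTP-date form ("Mon, 12 Oct 2009 17:50:00 GMT")
    and the ISO 8601 form used in listings ("2009-10-12T17:50:30.000Z").
    """
    if value.endswith("Z"):
        # Listing format: ISO 8601 with milliseconds, always UTC.
        parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
        return parsed.replace(tzinfo=timezone.utc)
    # Header format: RFC 1123 date, normalized to UTC.
    return parsedate_to_datetime(value).astimezone(timezone.utc)
```

The result can be compared directly with a local file's mtime once that is also converted to an aware UTC datetime.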

Another option: it looks like python-boto lets me assign arbitrary metadata to a key, so I could store the local modification date there when I first upload the file.

Anyway, this seems like a common problem, so I'd like to ask what solutions others have used. Thanks!

Recommended answer

The latest version of django-storages (1.1.3) handles file modification detection through its S3Boto backend.

pip install django-storages and you're good now :) Gotta love open source!

Update: set the AWS_PRELOAD_METADATA option to True in your settings file to get very fast syncs when using the S3Boto class. If you use the project's plain S3 backend instead, use its PreloadedS3 class.
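
For reference, a settings sketch for the S3Boto backend with metadata preloading turned on. The setting names are the ones django-storages used for its boto-based backend at the time; the environment-variable lookups and the bucket name are placeholders. Newer django-storages releases moved to a boto3-based backend, so check the documentation for the version you install.

```python
# settings.py (fragment)
import os

# Send collectstatic output to S3 through django-storages' boto backend.
STATICFILES_STORAGE = "storages.backends.s3boto.S3BotoStorage"

# Credentials and bucket name: placeholders read from the environment.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")
AWS_STORAGE_BUCKET_NAME = os.environ.get("AWS_STORAGE_BUCKET_NAME", "my-static-bucket")

# List the bucket once and cache key metadata, so modified_time() lookups
# don't each need their own request to S3.
AWS_PRELOAD_METADATA = True
```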

Update 2: It's still extremely slow to run the command.

Update 3: I forked the django-storages repository to fix the issue and opened a pull request.

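The problem is in the modified_time method: the fallback value (the self.bucket.get_key(...) call) is evaluated even when it isn't used, because Python evaluates the default argument to dict.get() before the lookup happens, so every call made a request to S3. I moved the fallback into an if block so it only runs when get returns None: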

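    # Before: the get_key() fallback is evaluated on every call
    entry = self.entries.get(name, self.bucket.get_key(self._encode_name(name)))

    # After: get_key() runs only when the name isn't in the preloaded entries
    entry = self.entries.get(name)
    if entry is None:
        entry = self.bucket.get_key(self._encode_name(name))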
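
The root cause is plain Python semantics: arguments are evaluated before a function is called, so dict.get(key, expensive()) runs expensive() even when key is present. A self-contained demonstration, with a hypothetical fetch_from_s3 function standing in for bucket.get_key() and a counter in place of real network requests:

```python
calls = {"count": 0}


def fetch_from_s3(name):
    """Hypothetical stand-in for bucket.get_key(); counts simulated requests."""
    calls["count"] += 1
    return "remote-entry:" + name


# Simulated preloaded metadata cache.
entries = {"css/site.css": "cached-entry"}


def lookup_eager(name):
    # The default argument is evaluated before .get() runs: always one request.
    return entries.get(name, fetch_from_s3(name))


def lookup_lazy(name):
    # Only fall back to S3 when the preloaded cache has no entry.
    entry = entries.get(name)
    if entry is None:
        entry = fetch_from_s3(name)
    return entry


for _ in range(1000):
    lookup_eager("css/site.css")
eager_requests = calls["count"]

calls["count"] = 0
for _ in range(1000):
    lookup_lazy("css/site.css")
lazy_requests = calls["count"]

print(eager_requests, lazy_requests)  # 1000 0
```

With the metadata preloaded, the lazy version never touches the network for files that are already in the bucket.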

Now 1,000 requests take less than 0.5 s, down from 100 s.

For syncing 10k+ files, I believe boto has to make multiple requests because S3 paginates list results (at most 1,000 keys per response), which leads to a 5-10 second sync time. This will only get worse as we add more files.
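
S3's list operation returns at most 1,000 keys per response, plus a flag saying whether the result was truncated; the client asks for the next page by passing the last key it received as the marker. boto follows these pages for you, but each page is still a separate HTTP request. The sketch below simulates that loop in memory (no real S3 calls; list_page and list_all are hypothetical names) to show the request count for a 10,500-file bucket:

```python
PAGE_SIZE = 1000  # S3 returns at most 1,000 keys per list request


def list_page(all_keys, marker="", max_keys=PAGE_SIZE):
    """Simulated S3 list call: keys sorted after `marker`, plus a truncation flag."""
    remaining = [k for k in sorted(all_keys) if k > marker]
    page = remaining[:max_keys]
    return page, len(remaining) > max_keys


def list_all(all_keys):
    """Follow markers the way a client must to see every key; count requests."""
    keys, marker, requests = [], "", 0
    while True:
        page, truncated = list_page(all_keys, marker)
        requests += 1
        keys.extend(page)
        if not truncated:
            return keys, requests
        marker = page[-1]  # resume after the last key returned


bucket = ["static/file%05d.css" % i for i in range(10500)]
keys, requests = list_all(bucket)
print(len(keys), requests)  # 10500 11
```

The request count grows linearly with the number of files, which is why the sync time keeps climbing as the bucket grows.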

One solution I'm considering is a custom management command (or a django-storages update) that keeps a single file on S3 holding the metadata of all the other files, and updates it whenever the collectstatic command uploads a file.

This won't detect files uploaded by other means, but that doesn't matter if the management command is the only entry point.
