Django StaticFiles and Amazon S3: How to detect modified files?

Question

I'm using django staticfiles + django-storages and Amazon S3 to host my data. Everything works well, except that every time I run manage.py collectstatic the command uploads all files to the server.

It looks like the management command compares timestamps from Storage.modified_time(), which isn't implemented in the S3 storage from django-storages.
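As a rough illustration of why a missing modified_time() leads to a full re-upload, here is a simplified sketch of a timestamp comparison. It is not Django's actual code; the function name should_upload is hypothetical, and None stands in for a storage that cannot report a modification time.

```python
from datetime import datetime, timedelta


def should_upload(source_mtime, target_mtime):
    """Return True when the local file should be copied to the target storage.

    target_mtime is None when the storage cannot report a modification time,
    for example because modified_time() is not implemented (an assumption of
    this sketch, not Django's real control flow).
    """
    if target_mtime is None:
        # Nothing to compare against, so the only safe choice is to upload.
        return True
    return source_mtime > target_mtime


local = datetime(2012, 1, 1, 12, 0, 0)

no_timestamp = should_upload(local, None)
remote_is_newer = should_upload(local, local + timedelta(hours=1))
remote_is_older = should_upload(local, local - timedelta(hours=1))

print(no_timestamp, remote_is_newer, remote_is_older)
```

With no remote timestamp available, every file falls into the first branch, which matches the behaviour described in the question.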

How do you determine whether an S3 file has been modified?

I could store file paths and last-modified dates in my database. Or is there an easy way to pull the last-modified date from Amazon?

Another option: it looks like I can assign arbitrary metadata with python-boto, so I could store the local modified date there when I first upload a file.
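A minimal sketch of that metadata idea follows. To stay runnable without AWS credentials it uses FakeKey, a hypothetical in-memory stand-in that mimics only the set_metadata and get_metadata methods of a boto 2 Key; the metadata name local-mtime and both helper functions are also assumptions for illustration. With a real key, the metadata would need to be set before the contents are uploaded.

```python
import os
import tempfile

MTIME_META = "local-mtime"  # hypothetical metadata name


class FakeKey:
    """In-memory stand-in for a boto Key (metadata methods only)."""

    def __init__(self):
        self._meta = {}

    def set_metadata(self, name, value):
        self._meta[name] = value

    def get_metadata(self, name):
        return self._meta.get(name)


def tag_with_mtime(key, path):
    """Record the local modification time on the key before uploading."""
    key.set_metadata(MTIME_META, str(int(os.path.getmtime(path))))


def is_stale(key, path):
    """True if the key has no recorded mtime or the local file is newer."""
    recorded = key.get_metadata(MTIME_META)
    return recorded is None or int(os.path.getmtime(path)) > int(recorded)


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"body { color: red; }")
    local_path = f.name

key = FakeKey()
stale_before = is_stale(key, local_path)   # never uploaded
tag_with_mtime(key, local_path)
stale_after = is_stale(key, local_path)    # just "uploaded"

# Simulate a later local edit by moving the mtime 10 seconds forward.
newer = os.path.getmtime(local_path) + 10
os.utime(local_path, (newer, newer))
stale_touched = is_stale(key, local_path)

os.remove(local_path)
print(stale_before, stale_after, stale_touched)
```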

Anyway, this seems like a common problem, so I'd like to ask what solutions others have used. Thanks!

Recommended answer

The latest version of django-storages (1.1.3) handles file modification detection through S3 Boto.

pip install django-storages and you're good now :) Gotta love open source!

Update: set the AWS_PRELOAD_METADATA option to True in your settings file to get very fast syncs when using the S3Boto class. If you use the S3 backend instead, use its PreloadedS3 class.
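For reference, a settings fragment along these lines might look as follows. The storage class path reflects the django-storages 1.1.x layout as I understand it, and the bucket name is a placeholder; treat both as assumptions and check them against the version you have installed.

```python
# settings.py (fragment)

# Assumed module path for the S3Boto backend in django-storages 1.1.x.
STATICFILES_STORAGE = "storages.backends.s3boto.S3BotoStorage"

# Placeholder bucket name; replace with your own.
AWS_STORAGE_BUCKET_NAME = "my-static-bucket"

# Load metadata for the bucket's keys up front, so that modified-time checks
# can be answered from memory instead of one S3 request per file.
AWS_PRELOAD_METADATA = True
```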

Update 2: The command is still extremely slow to run.

Update 3: I forked the django-storages repository to fix the issue and opened a pull request.

The problem is in the modified_time method, where the fallback value is evaluated even when it is not used. I moved the fallback into an if block so that it runs only when get returns None:

    entry = self.entries.get(name, self.bucket.get_key(self._encode_name(name)))

should be

    entry = self.entries.get(name)
    if entry is None:
        entry = self.bucket.get_key(self._encode_name(name))

With this change, 1000 requests take under 0.5s instead of 100s.
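The cost of the original line comes from Python evaluating every argument before a call is made, so the default passed to dict.get is computed even when the key is present. The sketch below shows this with a counter; slow_lookup is a hypothetical stand-in for bucket.get_key, and entries plays the role of the preloaded metadata.

```python
calls = {"count": 0}


def slow_lookup(name):
    """Stand-in for bucket.get_key(); counts how often it is invoked."""
    calls["count"] += 1
    return "remote:" + name


entries = {"css/site.css": "cached:css/site.css"}


def eager(name):
    # The default argument is evaluated before get() runs, hit or miss.
    return entries.get(name, slow_lookup(name))


def lazy(name):
    # The fallback runs only when the cache has no entry.
    entry = entries.get(name)
    if entry is None:
        entry = slow_lookup(name)
    return entry


eager_result = eager("css/site.css")
eager_calls = calls["count"]

calls["count"] = 0
lazy_result = lazy("css/site.css")
lazy_calls = calls["count"]

missing_result = lazy("js/app.js")
missing_calls = calls["count"]

print(eager_calls, lazy_calls, missing_calls)
```

Both versions return the cached entry for a known name, but only the eager one pays for the remote lookup on every call.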

For syncing 10k+ files, I believe boto has to make multiple requests because S3 paginates its results, which leads to a sync time of 5-10 seconds. This will only get worse as we add more files.
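As a back-of-the-envelope check on the pagination point: S3 list responses return at most 1000 keys each, so the number of listing requests grows linearly with the number of files. The helper name below is hypothetical.

```python
import math

S3_PAGE_SIZE = 1000  # maximum keys returned by one S3 list request


def list_requests_needed(file_count):
    """Number of list requests needed to enumerate file_count keys."""
    return max(1, math.ceil(file_count / S3_PAGE_SIZE))


requests_for_10k = list_requests_needed(10000)
requests_for_50k = list_requests_needed(50000)
print(requests_for_10k, requests_for_50k)
```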

One solution I'm considering is a custom management command, or an update to django-storages, that stores a single file on S3 holding the metadata of all the other files. That file would be updated whenever a file is updated via the collectstatic command.

It won't detect files uploaded by other means, but that won't matter if the management command is the sole entry point.
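The manifest idea could be sketched roughly as below. This is only an illustration under stated assumptions: the manifest is a JSON mapping of relative path to modification time, the function names are hypothetical, and the JSON string here stands in for the single file that would actually be stored on S3.

```python
import json
import os
import tempfile


def build_manifest(root):
    """Map each file under root (relative, '/'-separated) to its mtime."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            manifest[rel] = int(os.path.getmtime(path))
    return manifest


def changed_files(local, remote):
    """Names that are missing from the remote manifest or newer locally."""
    return sorted(
        name for name, mtime in local.items()
        if name not in remote or remote[name] < mtime
    )


with tempfile.TemporaryDirectory() as root:
    for name in ("a.css", "b.css"):
        with open(os.path.join(root, name), "w") as f:
            f.write("/* " + name + " */")

    # First sync: no manifest exists yet, so everything is uploaded.
    first_sync = changed_files(build_manifest(root), {})
    stored = json.dumps(build_manifest(root))  # would be written to S3

    # Simulate editing b.css by moving its mtime 10 seconds forward.
    b_path = os.path.join(root, "b.css")
    newer = os.path.getmtime(b_path) + 10
    os.utime(b_path, (newer, newer))

    # Second sync: one download of the manifest replaces per-file checks.
    second_sync = changed_files(build_manifest(root), json.loads(stored))

print(first_sync, second_sync)
```

A sync then costs one download and one upload of the manifest, regardless of how many files the bucket holds, at the price of trusting that nothing else writes to the bucket.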
