Django uploads: Discard uploaded duplicates, use existing file (md5 based check)
Question
I have a model with a FileField, which holds user-uploaded files. Since I want to save space, I would like to avoid duplicates.
What I'd like to achieve:
1. Calculate the uploaded file's md5 checksum
2. Store the file under a file name based on its md5sum
3. If a file with that name is already there (the new file is a duplicate), discard the uploaded file and use the existing file instead
1 and 2 are already working, but how would I forget about an uploaded duplicate and use the existing file instead?
Note that I'd like to keep the existing file and not overwrite it (mainly to keep the modification time the same, which is better for backups).
Notes:
- I'm using Django 1.5
- The upload handler is django.core.files.uploadhandler.TemporaryFileUploadHandler
Code:
def media_file_name(instance, filename):
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())


class Media(models.Model):
    orig_file = models.FileField(upload_to=media_file_name)
    md5sum = models.CharField(max_length=36)
    ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)
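To make the naming scheme concrete, here is a small standalone sketch (no Django required) of steps 1 and 2: hashing content chunk by chunk and deriving the sharded path from the digest. The helper names md5_of_chunks and sharded_name are hypothetical and only mirror what save() and media_file_name do above.

```python
import hashlib
import os


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
    return md5.hexdigest()


def sharded_name(md5sum, filename):
    """Build mediafiles/<1st hex char>/<2nd hex char>/<md5><lowercased ext>."""
    _, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', md5sum[0:1], md5sum[1:2], md5sum + ext.lower())


# Hypothetical input: two chunks of one upload and its original file name.
digest = md5_of_chunks([b'hello ', b'world'])
name = sharded_name(digest, 'Holiday.JPG')
```

An MD5 hex digest is always 32 characters long, so it fits in the CharField with max_length=36. The original file name contributes only its lowercased extension to the stored name.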
Any help is appreciated!
Recommended answer
Thanks to alTus's answer, I was able to figure out that writing a custom storage class is the key, and it was easier than expected.
- If the file is already there, I skip calling the superclass's _save method (which would write the file) and just return the name.
- I override get_available_name so that no numbers get appended to the file name when a file with the same name already exists.
I don't know if this is the proper way of doing it, but it works fine so far.
Hope this is useful!
Here's the complete sample code:
import hashlib
import os

from django.core.files.storage import FileSystemStorage
from django.db import models


class MediaFileSystemStorage(FileSystemStorage):
    def get_available_name(self, name, max_length=None):
        # return the name unchanged instead of appending a suffix
        if max_length and len(name) > max_length:
            raise Exception("name's length is greater than max_length")
        return name

    def _save(self, name, content):
        if self.exists(name):
            # if the file exists, do not call the superclass's _save method
            return name
        # if the file is new, DO call it
        return super(MediaFileSystemStorage, self)._save(name, content)


def media_file_name(instance, filename):
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())


class Media(models.Model):
    # use the custom storage class for the FileField
    orig_file = models.FileField(
        upload_to=media_file_name, storage=MediaFileSystemStorage())
    md5sum = models.CharField(max_length=36)
    # ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)
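The effect of the storage class can also be illustrated without Django. The following is a minimal sketch, assuming a plain temporary directory as the storage root. save_once is a hypothetical helper, not part of Django, that mirrors the "return the name without writing if the file exists" logic and lets you check that the existing file's modification time is left alone.

```python
import hashlib
import os
import tempfile


def save_once(root, data, ext):
    """Write data under an md5-based name unless that file already exists.

    Returns (relative_name, written); written is False for duplicates.
    """
    h = hashlib.md5(data).hexdigest()
    name = os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())
    path = os.path.join(root, name)
    if os.path.exists(path):
        # Duplicate: keep the existing file untouched and reuse its name.
        return name, False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'wb') as f:
        f.write(data)
    return name, True


# Hypothetical usage: "upload" the same bytes twice.
root = tempfile.mkdtemp()
first_name, first_written = save_once(root, b'same bytes', '.TXT')
mtime_before = os.path.getmtime(os.path.join(root, first_name))
second_name, second_written = save_once(root, b'same bytes', '.txt')
mtime_after = os.path.getmtime(os.path.join(root, second_name))
```

The second call returns the same name without touching the file, which is the behaviour wanted for backups. In this sketch the existence check and the write are two separate steps, so it does not guard against two concurrent uploads of the same new content.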