Django uploads: Discard uploaded duplicates, use existing file (md5 based check)
Question
I have a model with a FileField, which holds user-uploaded files. Since I want to save space, I would like to avoid duplicates.
What I'd like to achieve:
1. Calculate the uploaded file's md5 checksum
2. Store the file under a file name based on its md5sum
3. If a file with that name is already there (the new file is a duplicate), discard the uploaded file and use the existing file instead
Steps 1 and 2 are already working, but how do I discard an uploaded duplicate and use the existing file instead?
Note that I'd like to keep the existing file and not overwrite it (mainly to keep the modification time the same, which is better for backups).
Notes:
- I'm using Django 1.5
- The upload handler is django.core.files.uploadhandler.TemporaryFileUploadHandler
Code:
import hashlib
import os
from django.db import models

def media_file_name(instance, filename):
    # build the path from the md5 checksum, e.g. mediafiles/a/b/ab...<ext>
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())

class Media(models.Model):
    orig_file = models.FileField(upload_to=media_file_name)
    md5sum = models.CharField(max_length=36)
    ...
    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)
Any help is appreciated!
Recommended answer
Thanks to alTus's answer, I was able to figure out that writing a custom storage class is the key, and it was easier than expected.
- I skip calling the superclass's _save method if the file is already there, and just return the name.
- I override get_available_name, to avoid getting numbers appended to the file name if a file with the same name already exists.
I don't know if this is the proper way of doing it, but it works fine so far.
Hope this is useful!
Here is the complete sample code:
import hashlib
import os
from django.core.files.storage import FileSystemStorage
from django.db import models

class MediaFileSystemStorage(FileSystemStorage):
    def get_available_name(self, name, max_length=None):
        # return the name unchanged, so no suffix is appended for duplicates
        if max_length and len(name) > max_length:
            raise Exception("name's length is greater than max_length")
        return name

    def _save(self, name, content):
        if self.exists(name):
            # if the file exists, do not call the superclass's _save method
            return name
        # if the file is new, DO call it
        return super(MediaFileSystemStorage, self)._save(name, content)

def media_file_name(instance, filename):
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())

class Media(models.Model):
    # use the custom storage class for the FileField
    orig_file = models.FileField(
        upload_to=media_file_name, storage=MediaFileSystemStorage())
    md5sum = models.CharField(max_length=36)
    # ...
    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)
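To see the "skip the write if the name already exists" behaviour in isolation, here is a minimal sketch that uses only the standard library and no Django at all. The helper store_once and its (name, created) return value are made up for illustration and are not part of Django or of the code above; it only mirrors the path layout of media_file_name and the exists-check in _save.

```python
import hashlib
import os
import tempfile


def store_once(root, data, ext):
    """Write data under an md5-based name unless that file already exists.

    Hypothetical helper for illustration; returns (relative_name, created).
    """
    h = hashlib.md5(data).hexdigest()
    # Same layout as media_file_name: mediafiles/<1st char>/<2nd char>/<md5><ext>
    name = os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())
    full_path = os.path.join(root, name)
    if os.path.exists(full_path):
        # Duplicate: leave the existing file (and its mtime) untouched.
        return name, False
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, 'wb') as f:
        f.write(data)
    return name, True


root = tempfile.mkdtemp()

# First upload creates the file.
first_name, first_created = store_once(root, b'hello world', '.TXT')
mtime_before = os.path.getmtime(os.path.join(root, first_name))

# Same content again: nothing is written, the existing name is returned.
second_name, second_created = store_once(root, b'hello world', '.txt')
mtime_after = os.path.getmtime(os.path.join(root, second_name))

# Different content gets its own file.
other_name, other_created = store_once(root, b'something else', '.txt')
```

Because the second call returns before opening the file, the modification time of the stored file stays the same, which is the property the question asks for. Note that the exists-check followed by a write is not atomic, so two simultaneous uploads of the same new file could both reach the write step; since they would write identical bytes to the same path, that is probably harmless here, but it is worth keeping in mind.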