Django uploads: Discard uploaded duplicates, use existing file (md5 based check)


Question

I have a model with a FileField that holds user-uploaded files. To save disk space, I would like to avoid storing duplicates.

What I want to achieve:

  1. Calculate the uploaded file's md5 checksum
  2. Store the file under a file name based on its md5sum
  3. If a file with that name already exists (the new file is a duplicate), discard the uploaded file and use the existing file instead
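
The three steps above can be sketched outside Django with the standard library alone. This is only an illustration of the intended behavior, not the Django solution: the helper names md5_of_chunks and store_deduplicated, the temporary root directory, and the sample file names are all hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks (step 1)."""
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
    return md5.hexdigest()


def store_deduplicated(root, filename, chunks):
    """Store an upload under a name derived from its MD5 checksum.

    Returns (relative_path, created). If a file with that name already
    exists, nothing is written and the existing file is left untouched.
    """
    chunks = list(chunks)  # held in memory for brevity; acceptable in a sketch
    digest = md5_of_chunks(chunks)
    ext = os.path.splitext(filename)[1].lower()
    # step 2: mediafiles/<1st hex char>/<2nd hex char>/<md5><ext>
    rel_path = os.path.join('mediafiles', digest[0:1], digest[1:2], digest + ext)
    full_path = os.path.join(root, rel_path)
    if os.path.exists(full_path):
        # step 3: duplicate, so discard the upload and reuse the existing file
        return rel_path, False
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, 'wb') as fh:
        for chunk in chunks:
            fh.write(chunk)
    return rel_path, True


# Two uploads with identical content but different names and chunking
root = tempfile.mkdtemp()
first = store_deduplicated(root, 'Photo.JPG', [b'hello ', b'world'])
second = store_deduplicated(root, 'copy.jpg', [b'hello world'])
```

Both calls return the same relative path, and only the first one writes to disk, so the stored file's modification time does not change when a duplicate arrives. Note that the extension is part of the name, so the same bytes uploaded as .jpg and as .png would still be stored twice.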

Steps 1 and 2 already work, but how can I discard an uploaded duplicate and use the existing file instead?

Note that I'd like to keep the existing file and not overwrite it (mainly to keep its modification time unchanged, which is better for backups).

Notes:

  • I'm using Django 1.5
  • The upload handler is django.core.files.uploadhandler.TemporaryFileUploadHandler

Code:

import hashlib
import os

from django.db import models


def media_file_name(instance, filename):
    # build the path from the checksum: mediafiles/<1st char>/<2nd char>/<md5><ext>
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())


class Media(models.Model):
    orig_file = models.FileField(upload_to=media_file_name)
    md5sum = models.CharField(max_length=36)
    ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            # hash the upload chunk by chunk to avoid loading it into memory at once
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)

Any help is appreciated!

Recommended answer

Thanks to alTus's answer, I figured out that writing a custom storage class is the key, and it was easier than expected.

  • If the file already exists, I skip the call to the superclass's _save method, so nothing is written, and just return the name.
  • I override get_available_name so that no number is appended to the file name when a file with the same name already exists.

I don't know if this is the proper way of doing it, but it has worked fine so far.

Hope this helps!

The complete example code:

import hashlib
import os

from django.core.files.storage import FileSystemStorage
from django.db import models

class MediaFileSystemStorage(FileSystemStorage):
    def get_available_name(self, name, max_length=None):
        if max_length and len(name) > max_length:
            raise Exception("name's length is greater than max_length")
        return name

    def _save(self, name, content):
        if self.exists(name):
            # if the file exists, do not call the superclass's _save method
            return name
        # if the file is new, DO call it
        return super(MediaFileSystemStorage, self)._save(name, content)


def media_file_name(instance, filename):
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())


class Media(models.Model):
    # use the custom storage class for the FileField
    orig_file = models.FileField(
        upload_to=media_file_name, storage=MediaFileSystemStorage())
    md5sum = models.CharField(max_length=36)
    # ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)
