Django uploads: Discard uploaded duplicates, use existing file (md5 based check)


Question

I have a model with a FileField that holds user-uploaded files. To save space, I would like to avoid storing duplicates.

What I want to achieve:

  1. Calculate the uploaded file's md5 checksum
  2. Store the file under a file name based on its md5sum
  3. If a file with that name already exists (the new file is a duplicate), discard the uploaded file and use the existing file instead

Steps 1 and 2 are already working, but how do I discard an uploaded duplicate and use the existing file instead?

Note that I'd like to keep the existing file and not overwrite it (mainly to keep its modification time unchanged, which is better for backups).

Notes:

  • I'm using Django 1.5
  • The upload handler is django.core.files.uploadhandler.TemporaryFileUploadHandler

Code:

def media_file_name(instance, filename):
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())

class Media(models.Model):
    orig_file = models.FileField(upload_to=media_file_name)
    md5sum = models.CharField(max_length=36)
    ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)

Any help is appreciated!

Recommended answer

Thanks to alTus's answer, I was able to figure out that writing a custom storage class is the key, and it was easier than expected.


  • If the file already exists, I skip the call to the superclass's _save method (which would write the file) and just return the name.
  • I override get_available_name so that no number is appended to the file name when a file with the same name already exists.

I don't know if this is the proper way of doing it, but it has worked fine so far.

Hope this helps!

Here is the full sample code:

import hashlib
import os

from django.core.files.storage import FileSystemStorage
from django.db import models

class MediaFileSystemStorage(FileSystemStorage):
    def get_available_name(self, name, max_length=None):
        if max_length and len(name) > max_length:
            raise Exception("name's length is greater than max_length")
        return name

    def _save(self, name, content):
        if self.exists(name):
            # if the file exists, do not call the superclasses _save method
            return name
        # if the file is new, DO call it
        return super(MediaFileSystemStorage, self)._save(name, content)


def media_file_name(instance, filename):
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())


class Media(models.Model):
    # use the custom storage class for the FileField
    orig_file = models.FileField(
        upload_to=media_file_name, storage=MediaFileSystemStorage())
    md5sum = models.CharField(max_length=36)
    # ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)
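
To see the "skip the write if the name is taken" idea in isolation, here is a minimal sketch that uses only the standard library and no Django at all. The function name store_once and its parameters are made up for this illustration; it only mirrors the naming scheme of media_file_name and the exists-check in _save above, and is not a replacement for the storage class.

```python
import hashlib
import os
import tempfile


def store_once(root, data, ext=""):
    """Store data under a name derived from its md5; skip the write if it exists.

    Hypothetical helper for illustration only. Returns (relative_name, written).
    """
    digest = hashlib.md5(data).hexdigest()
    # Same layout as media_file_name: mediafiles/<1st char>/<2nd char>/<md5><ext>
    rel_name = os.path.join(
        "mediafiles", digest[0:1], digest[1:2], digest + ext.lower())
    full_path = os.path.join(root, rel_name)
    if os.path.exists(full_path):
        # Duplicate: leave the existing file, and its mtime, untouched.
        return rel_name, False
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as fh:
        fh.write(data)
    return rel_name, True


# Small demo: the second call finds the file and does not write again.
with tempfile.TemporaryDirectory() as demo_root:
    first_name, first_written = store_once(demo_root, b"hello", ".TXT")
    second_name, second_written = store_once(demo_root, b"hello", ".txt")
    print(first_name, first_written)
    print(second_name == first_name, second_written)
```

Because the second call returns before opening the file, the existing file's modification time stays as it was, which is the property the question asks for.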
