Django uploads: Discard uploaded duplicates, use existing file (md5 based check)

Views: 358 | Published: 2017/5/28 7:24:27 | Tags: python django django-models django-file-upload


Question

I have a model with a FileField, which holds user-uploaded files. Since I want to save space, I would like to avoid storing duplicates.

What I'd like to achieve:

1. Calculate the uploaded file's md5 checksum.
2. Store the file under a file name based on its md5sum.
3. If a file with that name is already there (the new file is a duplicate), discard the uploaded file and use the existing file instead.
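Steps 1 and 2 can be sketched without Django, using only the standard library. This is a minimal illustration, not the code from the question: `md5_of_chunks` and `hashed_path` are hypothetical helper names, and the list of byte strings stands in for what `FileField.chunks()` would yield.

```python
import hashlib
import os


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
    return md5.hexdigest()


def hashed_path(md5sum, filename, prefix="mediafiles"):
    """Build a storage path from the digest, keeping the lowercased extension."""
    _, ext = os.path.splitext(filename)
    # The first two hex characters become nested directories, so files
    # are spread over many small folders instead of one huge one.
    return os.path.join(prefix, md5sum[0:1], md5sum[1:2], md5sum + ext.lower())


# Hashing in chunks gives the same digest as hashing the whole content.
digest = md5_of_chunks([b"hello ", b"world"])
path = hashed_path(digest, "Photo.JPG")
```

Because the path depends only on the content hash and the extension, two uploads with identical bytes map to the same path, which is what makes the existence check in step 3 possible.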

1 and 2 are already working, but how would I forget about an uploaded duplicate and use the existing file instead?

Note that I'd like to keep the existing file and not overwrite it (mainly to keep its modification time unchanged, which is better for backups).

Notes:

I'm using Django 1.5
The upload handler is django.core.files.uploadhandler.TemporaryFileUploadHandler

Code:

import hashlib
import os

from django.db import models


def media_file_name(instance, filename):
    # build the path from the md5sum, keeping the lowercased extension
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())


class Media(models.Model):
    orig_file = models.FileField(upload_to=media_file_name)
    md5sum = models.CharField(max_length=36)
    # ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)

Any help is appreciated!

Recommended answer

Thanks to alTus's answer, I was able to figure out that writing a custom storage class is the key, and it was easier than expected.

I skip calling the superclass's _save method (which writes the file) if the file is already there, and just return the name.
I override get_available_name to avoid getting numbers appended to the file name when a file with the same name already exists.

I don't know if this is the proper way of doing it, but it works fine so far.

Hope this is useful!

Here's the complete sample code:

import hashlib
import os

from django.core.files.storage import FileSystemStorage
from django.db import models

class MediaFileSystemStorage(FileSystemStorage):
    def get_available_name(self, name, max_length=None):
        if max_length and len(name) > max_length:
            raise Exception("name's length is greater than max_length")
        return name

    def _save(self, name, content):
        if self.exists(name):
            # if the file exists, do not call the superclasses _save method
            return name
        # if the file is new, DO call it
        return super(MediaFileSystemStorage, self)._save(name, content)


def media_file_name(instance, filename):
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())


class Media(models.Model):
    # use the custom storage class for the FileField
    orig_file = models.FileField(
        upload_to=media_file_name, storage=MediaFileSystemStorage())
    md5sum = models.CharField(max_length=36)
    # ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)
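
The effect of skipping the write can be checked outside Django with a small standard-library sketch. It only mimics the idea behind the overridden `_save` and is not Django's storage API: `save_if_absent` is a hypothetical helper, and the temporary directory stands in for the media root.

```python
import os
import tempfile


def save_if_absent(root, name, data):
    """Write data to root/name unless a file already exists there.

    Returns (name, written), where written is False if the write was skipped.
    """
    path = os.path.join(root, name)
    if os.path.exists(path):
        # Duplicate: leave the existing file, and its mtime, untouched.
        return name, False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return name, True


root = tempfile.mkdtemp()
name = os.path.join("mediafiles", "a", "b", "ab12.txt")

first = save_if_absent(root, name, b"original")
mtime_before = os.path.getmtime(os.path.join(root, name))

second = save_if_absent(root, name, b"duplicate upload")
mtime_after = os.path.getmtime(os.path.join(root, name))
```

The second call returns the same name without touching the file, so the modification time stays the same, which is the property the question asks for. Note that a separate existence check followed by a write is not atomic, so two simultaneous uploads of the same new file could both reach the write step.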
