Django uploads: Discard uploaded duplicates, use existing file (md5 based check)

Published: 2021/12/19 11:32:11 | Tags: python, django, django-models, django-file-upload

Question

I have a model with a FileField that holds user-uploaded files. Since I want to save space, I would like to avoid storing duplicates.

What I want to achieve:

1. Calculate the uploaded file's md5 checksum
2. Store the file under a file name based on its md5sum
3. If a file with that name already exists (the new file is a duplicate), discard the uploaded file and use the existing file instead

1 and 2 already work, but how do I discard an uploaded duplicate and use the existing file instead?

Note that I'd like to keep the existing file and not overwrite it, mainly to keep its modification time unchanged, which is better for backups.
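The three goals can be sketched without Django at all. The following is only an illustration using the standard library: the function name `store_deduplicated` and its return value are hypothetical, and the directory layout simply mirrors the `media_file_name` function shown in this post. It writes a file only when no file with the same md5-based name exists, so an existing file and its modification time are left alone.

```python
import hashlib
import os
import tempfile


def store_deduplicated(root, filename, data):
    """Write data under an md5-based name unless that file already exists.

    Hypothetical helper for illustration. Returns (relative_path, written).
    """
    digest = hashlib.md5(data).hexdigest()
    ext = os.path.splitext(filename)[1].lower()
    # Same layout as media_file_name: mediafiles/<1st char>/<2nd char>/<md5><ext>
    relative_path = os.path.join(
        "mediafiles", digest[0:1], digest[1:2], digest + ext)
    full_path = os.path.join(root, relative_path)
    if os.path.exists(full_path):
        # Duplicate: leave the existing file (and its mtime) untouched.
        return relative_path, False
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as fh:
        fh.write(data)
    return relative_path, True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Same bytes uploaded twice under different original names.
        print(store_deduplicated(root, "Photo.JPG", b"same bytes"))
        print(store_deduplicated(root, "copy.jpg", b"same bytes"))
```

Because the extension is part of the stored name, identical content uploaded with a different extension would still be stored as a separate file in this sketch.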

Notes:

- I'm using Django 1.5
- The upload handler is django.core.files.uploadhandler.TemporaryFileUploadHandler

Code:

def media_file_name(instance, filename):
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())

class Media(models.Model):
    orig_file = models.FileField(upload_to=media_file_name)
    md5sum = models.CharField(max_length=36)
    ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)

Any help is appreciated!

Recommended answer

Thanks to alTus's answer, I figured out that writing a custom storage class is the key, and it was easier than expected.

- If the file already exists, I skip the call to the superclass's _save method (which would write the file) and just return the name.
- I override get_available_name so that no number is appended to the file name when a file with the same name already exists.
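Why both overrides are needed is easier to see in a toy model. The sketch below is not Django's code: `ToyStorage` and `DedupToyStorage` are hypothetical in-memory classes that only imitate the order in which a storage's `save` resolves a name via `get_available_name` and then writes via `_save`, and the `_1` suffix scheme is a simplification.

```python
class ToyStorage:
    """Tiny in-memory stand-in for a storage backend (not Django's code)."""

    def __init__(self):
        self.files = {}   # name -> content
        self.writes = 0   # how many times content was actually written

    def exists(self, name):
        return name in self.files

    def get_available_name(self, name):
        # Default behaviour: pick a fresh name when the wanted one is taken.
        candidate, counter = name, 0
        while self.exists(candidate):
            counter += 1
            candidate = "%s_%d" % (name, counter)
        return candidate

    def _save(self, name, content):
        self.files[name] = content
        self.writes += 1
        return name

    def save(self, name, content):
        # Resolve the final name first, then write under that name.
        return self._save(self.get_available_name(name), content)


class DedupToyStorage(ToyStorage):
    def get_available_name(self, name):
        return name  # keep the hash-based name as it is

    def _save(self, name, content):
        if self.exists(name):
            return name  # duplicate: skip the write entirely
        return super()._save(name, content)


if __name__ == "__main__":
    plain, dedup = ToyStorage(), DedupToyStorage()
    for storage in (plain, dedup):
        storage.save("ab12.jpg", b"data")
        storage.save("ab12.jpg", b"data")
    print(sorted(plain.files), plain.writes)
    print(sorted(dedup.files), dedup.writes)
```

With only `_save` overridden, the base class would still rename the second upload before `_save` is reached, so the existence check would never match. Returning the name unchanged from `get_available_name` is what lets the check see the duplicate.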

I don't know if this is the proper way of doing it, but it has worked fine so far.

Hope this helps!

The complete example code:

import hashlib
import os

from django.core.files.storage import FileSystemStorage
from django.db import models

class MediaFileSystemStorage(FileSystemStorage):
    def get_available_name(self, name, max_length=None):
        # return the name unchanged so that no suffix is appended to duplicates
        if max_length and len(name) > max_length:
            raise Exception("name's length is greater than max_length")
        return name

    def _save(self, name, content):
        if self.exists(name):
            # if the file exists, do not call the superclasses _save method
            return name
        # if the file is new, DO call it
        return super(MediaFileSystemStorage, self)._save(name, content)


def media_file_name(instance, filename):
    h = instance.md5sum
    basename, ext = os.path.splitext(filename)
    return os.path.join('mediafiles', h[0:1], h[1:2], h + ext.lower())


class Media(models.Model):
    # use the custom storage class for the FileField
    orig_file = models.FileField(
        upload_to=media_file_name, storage=MediaFileSystemStorage())
    md5sum = models.CharField(max_length=36)
    # ...

    def save(self, *args, **kwargs):
        if not self.pk:  # file is new
            md5 = hashlib.md5()
            for chunk in self.orig_file.chunks():
                md5.update(chunk)
            self.md5sum = md5.hexdigest()
        super(Media, self).save(*args, **kwargs)
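The save method above feeds the upload to md5 chunk by chunk instead of loading it into memory at once. The same idea can be tried on any binary file object with the standard library alone. In this sketch the helper name `md5_hexdigest` and the 64 KiB chunk size are assumptions for illustration, and `read()` stands in for the `chunks()` method used in the model.

```python
import hashlib
import io


def md5_hexdigest(fileobj, chunk_size=64 * 1024):
    """Return the md5 hex digest of a binary file object, read in chunks."""
    md5 = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()


if __name__ == "__main__":
    payload = b"x" * 200000
    digest = md5_hexdigest(io.BytesIO(payload))
    # Chunked hashing gives the same digest as hashing everything at once.
    print(digest == hashlib.md5(payload).hexdigest())
    print(len(digest))
```

An md5 hex digest is always 32 characters long, so the `max_length=36` on the `md5sum` field leaves room to spare.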
