Efficiently saving a file by hash in Django


Question

I am working on a Django project. I want the user to be able to upload a file (through a Form) and have it saved locally to a custom path with a custom filename: its hash. The only solution I can think of is to use the "upload_to" argument of the FileField I'm using. What this translates to (I think):

1) Write the file to disk
2) Calculate the hash
3) Return path + hash as filename

The problem is that there are two write operations: one when saving the file from memory to disk to calculate the hash, and another when actually saving the file to the specified location.

Is there a way to override FileField's save-to-disk method (or where can I find out exactly what's going on behind the scenes), so that I can save the file under a temporary name and then rename it to its hash, instead of having it saved twice?

Thanks.

Recommended answer

The upload_to parameter of FileField accepts a callable, and the string it returns is joined to your MEDIA_ROOT setting to get the final filename (from the documentation):

This may also be a callable, such as a function, which will be called to obtain the upload path, including the filename. This callable must be able to accept two arguments, and return a Unix-style path (with forward slashes) to be passed along to the storage system. The two arguments that will be passed are:

  • instance: An instance of the model where the FileField is defined. More specifically, this is the particular instance where the current file is being attached. In most cases, this object will not have been saved to the database yet, so if it uses the default AutoField, it might not yet have a value for its primary key field.
  • filename: The filename that was originally given to the file. This may or may not be taken into account when determining the final destination path.

Additionally, when you access model.my_file_field, it resolves to an instance of FieldFile, which acts like a file. So you should be able to write an upload_to callable like the following:

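import os

def hash_upload(instance, filename):
    # hash_function is a placeholder: substitute the hash function you want to use.
    instance.my_file.open()  # make sure we're at the beginning of the file
    contents = instance.my_file.read()  # get the contents
    instance.my_file.seek(0)  # rewind so the file can be read again when it is saved
    fname, ext = os.path.splitext(filename)
    return "{0}_{1}{2}".format(fname, hash_function(contents), ext)  # assemble the filename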

Substitute the appropriate hash function you'd like to use. Saving to disk beforehand isn't necessary at all (in fact, the file is often already uploaded to temporary storage or, in the case of smaller files, just kept in memory).
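As an illustration of what such a hash function could look like, here is a minimal sketch that assumes SHA-256 from Python's standard hashlib module is acceptable. The name hash_function simply mirrors the placeholder used in hash_upload; the choice of algorithm is an assumption, not something the answer prescribes.

```python
import hashlib


def hash_function(contents: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes.

    SHA-256 is an assumed choice; any hashlib algorithm could be used instead.
    """
    return hashlib.sha256(contents).hexdigest()
```

The hex digest contains only the characters 0-9 and a-f, so it is safe to embed in a filename.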

You would use hash_upload like this:

class MyModel(models.Model):
    my_file = models.FileField(upload_to=hash_upload,...)

I haven't tested this, so you might have to poke at the line that reads the whole file (and you may want to hash only the first chunk of the file, to prevent malicious users from uploading massive files and causing a DoS attack). You can get the first chunk with instance.my_file.read(instance.my_file.DEFAULT_CHUNK_SIZE).
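To make the memory concern concrete, here is a hedged sketch of hashing a file-like object chunk by chunk, with an optional cap on how many bytes are read. It is not part of the original answer: the helper name hash_stream, its parameters, and the use of SHA-256 are assumptions, and it has only been exercised against plain file-like objects, not against a Django FieldFile. The 64 KiB chunk size is chosen to match what I believe Django's File.DEFAULT_CHUNK_SIZE to be.

```python
import hashlib
import io

# Assumed to match Django's File.DEFAULT_CHUNK_SIZE (64 KiB).
CHUNK_SIZE = 64 * 1024


def hash_stream(fileobj, max_bytes=None, chunk_size=CHUNK_SIZE):
    """Hash a file-like object incrementally.

    Reads at most max_bytes bytes (or the whole stream if max_bytes is None)
    in pieces of chunk_size, so the full contents never sit in memory at once.
    """
    digest = hashlib.sha256()
    remaining = max_bytes
    while remaining is None or remaining > 0:
        size = chunk_size if remaining is None else min(chunk_size, remaining)
        chunk = fileobj.read(size)
        if not chunk:  # end of stream
            break
        digest.update(chunk)
        if remaining is not None:
            remaining -= len(chunk)
    return digest.hexdigest()


# Hash only the first chunk of an in-memory "upload".
upload = io.BytesIO(b"x" * 200_000)
first_chunk_digest = hash_stream(upload, max_bytes=CHUNK_SIZE)
```

Note the trade-off: capping the bytes read bounds the work per upload, but two files that share the same leading bytes would then get the same hash and therefore the same filename.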
