Detecting if the user drops the same file twice on a browser window


Question

I want to allow users to drag images from their desktop onto a browser window and then upload those images to a server. I want to upload each file only once, even if it is dropped on the window several times. For security reasons, the information from the File object that is accessible to JavaScript is limited. According to msdn.microsoft.com, only the following properties can be read:


  • name
  • lastModifiedDate

(Safari also exposes size and type).

The user can drop two images with the same name and last modified date from different folders onto the browser window. There is a very small but finite chance that these two images are in fact different.

I've created a script that reads in the raw dataURL of each image file, and compares it to files that were previously dropped on the window. One advantage of this is that it can detect identical files with different names.

This works, but it seems overkill. It also requires a huge amount of data to be stored. I could improve this (and add to the overkill) by making a hash of the dataURL, and storing that instead.
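
If it helps, here is a minimal sketch of that hashing idea, assuming a browser with the Web Crypto API (crypto.subtle is a standard API, but the hashFile helper and its callback shape are only illustrative, not part of the question's script). It digests the raw file bytes with SHA-256 and yields a short hex string that can be stored and compared instead of the whole dataURL:

function hashFile(file, callback) {
  var reader = new FileReader()

  reader.onload = function (event) {
    // event.target.result is an ArrayBuffer, which is what crypto.subtle expects
    crypto.subtle.digest("SHA-256", event.target.result).then(function (digest) {
      // Render the 32-byte digest as a 64-character hex string for use as a key
      var hex = Array.prototype.map.call(new Uint8Array(digest), function (b) {
        return ("0" + b.toString(16)).slice(-2)
      }).join("")
      callback(hex)
    })
  }

  reader.readAsArrayBuffer(file)
}

// Usage: hashFile(file, function (hash) { /* compare against stored hashes */ })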

I'm hoping that there may be a more elegant way of achieving my goal. What can you suggest?

<!DOCTYPE html>
<html>
<head>
  <title>Detect duplicate drops</title>
  <style>
html, body {
width: 100%;
height: 100%;
margin: 0;
background: #000;
}
  </style>
  <script>
var body
var imageData = []


document.addEventListener('DOMContentLoaded', function ready() {
  body = document.getElementsByTagName("body")[0]
  body.addEventListener("dragover", swallowEvent, false)
  body.addEventListener("drop", treatDrop, false)
}, false)


function swallowEvent(event) {
  // Prevent browser from loading the dropped image in an empty page
  event.preventDefault()
  event.stopPropagation()
}


function treatDrop(event) {
  swallowEvent(event)

  for (var ii=0, file; file = event.dataTransfer.files[ii]; ii++) {
    importImage(file)
  }
}


function importImage(file) {
  var reader = new FileReader()

  reader.onload = function fileImported(event) {
    var dataURL = event.target.result
    var index = imageData.indexOf(dataURL)
    var img, message

    if (index < 0) {
      // First time this exact image is dropped: store its dataURL and
      // original file name as an adjacent pair of array entries
      index = imageData.length
      imageData.push(dataURL, file.name)
      message = "Image " + file.name + " imported"
    } else {
      // Identical dataURL seen before: report the name it was first imported under
      message = "Image " + file.name + " already imported as " + imageData[index + 1]
    }

    img = document.createElement("img")
    img.src = imageData[index] // copy or reference?
    body.appendChild(img)

    console.log(message)
  }

  reader.readAsDataURL(file)
}
  </script>
</head>
<body>
</body>
</html>


Answer

Here is a suggestion (one that I haven't seen mentioned in your question):

Create a Blob URL for each file-object in the FileList-object; the blob is thereby stored in the browser's URL Store, and you save its URL-string.
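
For reference, registering a file in the URL Store is a one-liner (URL.createObjectURL and URL.revokeObjectURL are standard; the variable name is just for this sketch):

// Every File is also a Blob, so it can be registered in the URL Store directly
var blobUrl = URL.createObjectURL(file)   // e.g. "blob:https://example.com/<uuid>"
// ...later, once the file has been uploaded or rejected as a duplicate:
URL.revokeObjectURL(blobUrl)              // release the URL Store entry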

Then you pass that URL-string to a web worker (a separate thread), which uses a FileReader to read each file (accessed via its Blob URL string) in chunked sections, re-using one fixed-size buffer (almost like a circular buffer) to calculate the file's hash. There are simple/fast hashes, such as CRC32, whose running state can be carried over from chunk to chunk, and which can often be combined with a vertical and horizontal checksum in the same loop.
You might speed up the process by reading 32-bit (unsigned) values instead of 8-bit values, using an appropriate buffer view (that's 4 times faster). System endianness is not important here, so don't waste resources on it!
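
A minimal sketch of such a worker, under stated assumptions: the file name hash-worker.js is made up, the Blob URL is resolved back into a Blob with fetch (available in workers), FileReaderSync does the chunked reads, each chunk is allocated fresh rather than truly re-using a single buffer, and the multiply-by-31 rolling hash is a toy stand-in for the CRC32-plus-checksums combination described above:

// hash-worker.js (hypothetical file name)
self.onmessage = function (event) {
  var url = event.data.url
  var name = event.data.name

  // Resolve the Blob URL back into a Blob (fetch works inside workers)
  fetch(url).then(function (response) { return response.blob() }).then(function (blob) {
    var CHUNK = 1 << 16                        // 64 KiB chunks
    var reader = new FileReaderSync()          // synchronous reads are allowed in workers
    var hash = 0

    for (var pos = 0; pos < blob.size; pos += CHUNK) {
      var buffer = reader.readAsArrayBuffer(blob.slice(pos, pos + CHUNK))
      // Read full 32-bit words where possible, as suggested above
      var words = new Uint32Array(buffer, 0, buffer.byteLength >>> 2)
      for (var i = 0; i < words.length; i++) {
        hash = (hash * 31 + words[i]) >>> 0    // toy rolling hash, carried across chunks
      }
      // Fold in any trailing bytes that don't fill a whole 32-bit word
      var tail = new Uint8Array(buffer, words.length * 4)
      for (var j = 0; j < tail.length; j++) {
        hash = (hash * 31 + tail[j]) >>> 0
      }
    }

    self.postMessage({ name: name, url: url, size: blob.size, hash: hash })
  })
}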

Upon completion, the web worker passes the file's hash back to the main thread/app, which then simply performs your matrix comparison of [[fname, fsize, blobUrl, fhash] /* , etc. */].
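
On the main thread, the wiring might then look like this sketch (the seen table mirrors the matrix above; the worker file name and treatFiles are illustrative):

var seen = []                                  // rows of [fname, fsize, blobUrl, fhash]
var worker = new Worker("hash-worker.js")      // the worker sketched above

worker.onmessage = function (event) {
  var d = event.data
  var dup = null
  for (var i = 0; i < seen.length && !dup; i++) {
    // size is a cheap first filter before comparing hashes
    if (seen[i][1] === d.size && seen[i][3] === d.hash) {
      dup = seen[i]
    }
  }
  if (dup) {
    console.log(d.name + " is a duplicate of " + dup[0])
    URL.revokeObjectURL(d.url)                 // no longer needed
  } else {
    seen.push([d.name, d.size, d.url, d.hash])
    // upload the new file here, e.g. by fetching d.url or using the original File
  }
}

function treatFiles(files) {
  for (var i = 0; i < files.length; i++) {
    worker.postMessage({ url: URL.createObjectURL(files[i]), name: files[i].name })
  }
}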

Pro

The re-used fixed buffer significantly brings down your memory usage (to any level you specify), and the web worker improves performance by using an extra thread (which doesn't block your main browser thread).

Con

You'd still need a server-side fallback for browsers with JavaScript disabled (you might add a hidden field to the form and set its value using JavaScript, as a means of checking that JavaScript is enabled, to lower the server-side load). However, even then, you'd still need the server-side fallback to safeguard against malicious input.
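
The hidden-field check mentioned above could be sketched like this (the js-enabled field name and form attributes are made up for the example):

<form method="post" action="/upload" enctype="multipart/form-data">
  <input type="hidden" name="js-enabled" id="js-enabled" value="0">
  <!-- file inputs, etc. -->
</form>
<script>
  // Only runs when JavaScript is enabled, so the server sees value "1"
  document.getElementById("js-enabled").value = "1"
</script>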

Usefulness

So... no net gain? Well, if there is a reasonable chance that the user might upload duplicate files (or just re-use them in a web-based app), then you have saved on wasted bandwidth just by performing the check. That is quite an (ecological/financial) win in my book.

Extra

Hashes are prone to collision, period. To lower the (realistic) chance of collision you'd select a more advanced hash algorithm (most can just as easily be carried across chunks). The obvious trade-off for a more advanced hash is larger code size and lower speed (higher CPU usage).
