What is the fastest way to compare two byte arrays?

Question

I am trying to compare two long byte arrays in VB.NET and have run into a snag. Comparing two 50 megabyte files takes almost two minutes, so I'm clearly doing something wrong. I'm on an x64 machine with tons of memory, so there are no issues there. Here is the code that I'm using at the moment and would like to change.

_Bytes and item.Bytes are the two different arrays to compare and are already the same length.

For Each B In item.Bytes
   If B <> _Bytes(I) Then
        Mismatch = True
        Exit For
   End If
   I += 1
Next

I need to be able to compare, as fast as possible, files that are potentially hundreds of megabytes and possibly even a gigabyte or two. Any suggestions or algorithms that would be able to do this faster?

item.Bytes comes from an object already stored in the database/filesystem; it is returned for comparison because its byte length matches that of the item the user wants to add. By comparing the two arrays I can determine whether the user has added something new to the DB; if not, I can just map the new item to the existing file and not waste hard disk space.

[Update]

I converted the arrays to local variables of type Byte() and then did the same comparison with the same code, and it ran in about one second (I still have to benchmark it and compare it to the other approaches). But if you do the same thing with local variables and use a generic array, it becomes massively slower. I'm not sure why, but it raises a lot more questions for me about the use of arrays.

Recommended answer

What is the _Bytes(I) call doing? It's not loading the file each time, is it? Even with buffering, that would be bad news!

There will be plenty of ways to micro-optimise this, such as comparing a long at a time or potentially using unsafe code - but I'd just concentrate on getting reasonable performance first. Clearly there's something very odd going on.
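For reference, here is a rough sketch of what "a long at a time" could look like in safe C#. It is not the code I'd start with, and it is untested for performance; the class and method names (ByteArrayComparer, EqualsByLongs) are made up for illustration. It reads 8 bytes from each array per iteration with BitConverter.ToInt64 and then compares any leftover tail bytes one by one.

```csharp
using System;

public static class ByteArrayComparer
{
    // Hypothetical helper, for illustration only: compares two arrays
    // 8 bytes at a time, then finishes the remaining 0-7 bytes one by one.
    public static bool EqualsByLongs(byte[] first, byte[] second)
    {
        if (ReferenceEquals(first, second))
        {
            return true;
        }
        if (first == null || second == null)
        {
            return false;
        }
        if (first.Length != second.Length)
        {
            return false;
        }

        int i = 0;
        // Last index at which a full 8-byte read is still in bounds.
        int lastFullBlock = first.Length - sizeof(long);
        for (; i <= lastFullBlock; i += sizeof(long))
        {
            if (BitConverter.ToInt64(first, i) != BitConverter.ToInt64(second, i))
            {
                return false;
            }
        }

        // Compare whatever is left after the last full 8-byte block.
        for (; i < first.Length; i++)
        {
            if (first[i] != second[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Whether this actually beats the plain byte-by-byte loop depends on the runtime and JIT, so measure before adopting it. On newer .NET versions where spans are available, first.AsSpan().SequenceEqual(second) is likely a simpler option to try.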

I suggest you extract the comparison code into a separate function which takes two byte arrays. That way you know you won't be doing anything odd. I'd also use a simple For loop rather than For Each in this case - it'll be simpler. Oh, and check whether the lengths are the same first :)

EDIT: Here's the code (untested, but simple enough) that I'd use. It's in C# for the minute - I'll convert it in a sec:

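public static bool Equals(byte[] first, byte[] second)
{
    // Same reference (or both null): trivially equal.
    if (first == second)
    {
        return true;
    }
    // Exactly one is null: not equal.
    if (first == null || second == null)
    {
        return false;
    }
    // Different lengths can never match.
    if (first.Length != second.Length)
    {
        return false;
    }
    // Compare byte by byte, stopping at the first mismatch.
    for (int i = 0; i < first.Length; i++)
    {
        if (first[i] != second[i])
        {
            return false;
        }
    }
    return true;
}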

EDIT: And here's the VB:

Public Shared Function ArraysEqual(ByVal first As Byte(), _
                                   ByVal second As Byte()) As Boolean
    ' Same reference (or both Nothing): trivially equal.
    If first Is second Then
        Return True
    End If
    ' Exactly one is Nothing: not equal.
    If first Is Nothing OrElse second Is Nothing Then
        Return False
    End If
    ' Different lengths can never match.
    If first.Length <> second.Length Then
        Return False
    End If

    ' Compare byte by byte, stopping at the first mismatch.
    For i As Integer = 0 To first.Length - 1
        If first(i) <> second(i) Then
            Return False
        End If
    Next i
    Return True
End Function
