C比较两个位图的最快方法 [英] C fastest way to compare two bitmaps

查看:75
本文介绍了C比较两个位图的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有两个位图数组,形式为具有数百万条记录的char数组。

There are two arrays of bitmaps in the form of char arrays with millions of records. What could be fastest way to compare them using C.

我可以想象在for循环中一次使用按位运算符xor 1字节。

I can imagine to use bitwise operator xor 1 byte at a time in a for loop.

有关位图的重要点:


  • 运行算法的时间为1%到10%,位图可以不同。在大多数情况下,它们将是相同的。当嘿可以有所不同时,它们可以高达100%。连续条纹中的位更改的可能性很高。

  • 两个位图的长度相同。

目标:


  • 检查它们是否不同,如果是,则在哪里。

  • 正确每次(检测到错误的概率应该为1)。

推荐答案

此答案假设您将位图表示为0/1值的序列,而不是位图图像格式

This answer assumes you mean 'bitmap' as a sequence of 0/1 values rather than 'bitmap image format'

如果您只是拥有两个相同长度的位图,并且希望快速比较它们, memcmp()将有效,因为有人在评论中建议。您可以尝试使用SSE类型优化,但是这些优化不如 memcmp()那样简单。 memcmp()假设您只是想知道它们是不同的,仅此而已。

If you simply have two bitmaps of the same length and wish to compare them quickly, memcmp() will be effective as someone suggested in the comments. You could if you want try using SSE type optimizations, but these are not as easy as memcmp(). memcmp() is assuming you simply want to know 'they are different' and nothing more.

如果您要知道它们相差多少位,例如615位不同,那么除了对每个字节进行XOR并计算差异数外,您别无选择。正如其他人指出的那样,您可能希望一次以32/64甚至256位的速度执行更多操作,具体取决于您的平台。但是,如果数组的长度为数百万个字节,那么最大的延迟(对于当前的CPU)将是将主内存传输到CPU的时间,而CPU的操作将不会有多大关系(此处有很多警告)

If you want to know how many bits they are different by, e.g. 615 bits differ, then again you have little option except to XOR every byte and count the number of differences. As others have noted, you probably want to do this more at 32/64 or even 256 bits at a time, depending on your platform. However, if the arrays are millions of bytes long, then the biggest delay (with current CPUs) will be the time to transfer main memory to the CPU, and it wont matter terribly what the CPU does (lots of caveats here)

如果您的问题更多是关于比较A与B,但实际上您在做很多次,例如A与B以及C,D,E等,那么您可以做几件事

If you question is more asking about comparing A to B, but really you are doing this lots of times, such as A to B and C,D,E etc, then you can do a couple of things


  • A。存储每个数组的校验和,然后首先比较校验和,如果这些校验和相同,则很有可能数组相同。显然,这里存在校验和相等但数据可能不同的风险,因此请确保在这种情况下的错误结果不会产生严重的副作用。而且,如果您无法承受错误的结果,请不要使用此技术。

  • B。如果数组具有结构(例如图像数据),则可以利用特定工具进行处理,这超出了答案的范围。

  • C。如果可以有效压缩图像数据,则压缩每个数组并使用压缩形式进行比较。如果使用ZIP类型的压缩,则无法直接从zip判断出多少位差异,但是其他技术(例如RLE)可以有效地快速计算位差异(但要进行大量工作才能构建并快速正确地获取信息)

  • D。如果用(a)表示的风险是可以接受的,则可以对每个262144位的块进行校验和,并且仅在校验和不同的地方对差异进行计数。

  • A. Store a checksum of each array and first compare the checksums, if these are the same then there is a high chance the arrays are the same. Obviously there is a risk here that checksums can be equal but the data can differ, so make sure that a false result in this case will not have dramatic side effects. And, if you cannot withstand false results, do not use this technique.
  • B. if the arrays have structure, such as they are image data, then leverage specific tools for this, how is beyond this answer to explain.
  • C. If the image data can be compressed effectively, then compress each array and compare using the compressed form. If you use ZIP type of compression you cannot tell directly from zip how many bits differ, but other techniques such as RLE can be effective to quickly count bit differences (but are a lot of work to build and get correct and fast)
  • D. If the risk with (a) is acceptable, then you can checksum each chunk of say 262144 bits, and only count differences where checksums differ. This heavily reduces main memory access and will go lots faster.

所有选项A..D都将减少主存储器访问,因为这是任何性能提升的小问题(针对所述问题)

All of the options A..D are about reducing main memory access as this is the nub of any performance gain (for problem as stated)

这篇关于C比较两个位图的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆