How to store binary data when you only care about speed?

Published: 2017/4/3 11:45:52 · Tags: c++, performance, data-structures, stl, binary


Question


I have N points in D dimensions, where N is, say, 1 million and D is 100. All my points have binary coordinates, i.e. they lie in {0, 1}^D, and I am only interested in speed.

Currently my implementation uses std::vector<int>. I am wondering whether I could get faster execution by changing my data structure. I am only doing insertions and searches (I never modify the bits).

All the related questions I found mention std::vector<char>, std::vector<bool> and std::bitset, but they all focus on the space savings such structures give.

What is the appropriate data structure for binary data in C++ when speed is the main concern?

I intend to populate my data structure with the binary data and then do a lot of contiguous searches: I don't really care about the i-th coordinate of a point in isolation; whenever I access a point, I access all of its coordinates in sequence. I will compute the Hamming distances between the points.

Answer

Locality of reference will likely be the driving force, so you should represent the D coordinates of a single point as a contiguous bit vector. std::bitset<D> would be a logical choice.

However, the next important thing to realize is that locality benefits are easy to get for working sets of up to about 4 KB. This means you should not pick a single point and compare it against all the other N-1 points. Instead, group the points into sets of about 4 KB each and compare those groups against each other. Both approaches are O(N*N), but the second will be much faster.

You may be able to beat O(N*N) by using the triangle inequality, Hamming(a,b) + Hamming(b,c) >= Hamming(a,c), although I'm not sure exactly how. It probably depends on what form you want the output in. The naive output would be an N*N set of distances, and producing that is unavoidably O(N*N).
