Hamming多维数据集的顶点上的查询点 [英] Query points on the vertices of a Hamming cube
问题描述
我有N个点,只存在于维度D的立方体的顶点上,其中D类似于3。
I have N points that lie only on the vertices of a cube, of dimension D, where D is something like 3.
顶点可能不包含任何点。所以每个点都有{0,1} D 中的坐标。只要查询时间只对内存成本感兴趣(例如N中不是指数)。
A vertex may not contain any point. So every point has coordinates in {0, 1}D. I am only interested in query time, as long as the memory cost is reasonable ( not exponential in N for example :) ).
给定位于多维数据集顶点之一的查询和输入参数 r
,找到所有具有汉明距离的顶点(因此,点) = r
与查询。
Given a query that lies on one of the cube's vertices and an input parameter r
, find all the vertices (thus points) that have hamming distance <= r
with the query.
有什么方法去一个 c ++ environment?
What's the way to go in a c++ environment?
我正在想一个kd树,但我不确定,需要帮助,任何输入,甚至近似值,将不胜感激!由于汉明距离发挥作用,按位操作应该有所帮助(例如XOR)。
I am thinking of a kd-tree, but I am not sure and want help, any input, even approximative, would be appreciated! Since hamming distance comes into play, bitwise manipulations should help (e.g. XOR).
推荐答案
一个很好的bithack从一个位掩码,以 k
位设置为词典下一个排列,这意味着循环使用 k
位设置的所有掩码是相当简单的。使用初始值对这些蒙版进行异或,使汉明距离的所有值与 k
远离。
There is a nice bithack to go from one bitmask with k
bits set to the lexicographically next permutation, which means it's fairly simple to loop through all masks with k
bits set. XORing these masks with an initial value gives all the values at hamming distance exactly k
away from it.
所以对于 D
尺寸,其中 D
小于32(否则更改类型),
So for D
dimensions, where D
is less than 32 (otherwise change the types),
uint32_t limit = (1u << D) - 1;
for (int k = 1; k <= r; k++) {
uint32_t diff = (1u << k) - 1;
while (diff <= limit) {
// v is the input vertex
uint32_t vertex = v ^ diff;
// use it
diff = nextBitPermutation(diff);
}
}
其中 nextBitPermutation
可以用C ++实现,如(如果你有 __ builtin_ctz
)
Where nextBitPermutation
may be implemented in C++ as something like (if you have __builtin_ctz
)
uint32_t nextBitPermutation(uint32_t v) {
// see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
uint32_t t = v | (v - 1);
return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}
或MSVC(未测试)
uint32_t nextBitPermutation(uint32_t v) {
// see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
uint32_t t = v | (v - 1);
unsigned long tzc;
_BitScanForward(&tzc, v); // v != 0 so the return value doesn't matter
return (t + 1) | (((~t & -~t) - 1) >> (tzc + 1));
}
如果D是>真的低,4或更低,旧的 popcnt
-with - pshufb
工作真的很好,一般一切都很顺利,如下所示:
If D is really low, 4 or lower, the old popcnt
-with-pshufb
works really well and generally everything just lines up well, like this:
uint16_t query(int vertex, int r, int8_t* validmask)
{
// validmask should be array of 16 int8_t's,
// 0 for a vertex that doesn't exist, -1 if it does
__m128i valid = _mm_loadu_si128((__m128i*)validmask);
__m128i t0 = _mm_set1_epi8(vertex);
__m128i r0 = _mm_set1_epi8(r + 1);
__m128i all = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
__m128i popcnt_lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
__m128i dist = _mm_shuffle_epi8(popcnt_lut, _mm_xor_si128(t0, all));
__m128i close_enough = _mm_cmpgt_epi8(r0, dist);
__m128i result = _mm_and_si128(close_enough, valid);
return _mm_movemask_epi8(result);
}
这应该是相当快的;与上面的bithack相比( nextBitPermutation
,相当重,在那里使用很多),并且比较了所有顶点的循环,并测试它们是否在范围内与内置的 popcnt
,自动占用至少16个周期,并且上述不应该假定一切都被缓存,甚至永久地在一个寄存器中)。结果是令人厌烦的工作,因为它是一个掩码,哪些顶点都存在和在查询点的范围,而不是它们的列表。它会很好地结合对与点相关的数据进行一些处理。
This should be fairly fast; fast compared to the bithack above (nextBitPermutation
, which is fairly heavy, is used a lot there) and also compared to looping over all vertices and testing whether they are in range (even with builtin popcnt
, that automatically takes at least 16 cycles and the above shouldn't, assuming everything is cached or even permanently in a register). The downside is the result is annoying to work with, since it's a mask of which vertices both exist and are in range of the queried point, not a list of them. It would combine well with doing some processing on data associated with the points though.
这也缩小到D = 3当然,只是没有点> = 8有效。 D> 4可以类似地完成,但是它需要更多的代码,因为这真的是一个强力解决方案,由于并行性只是快速,它从根本上慢慢地在D中慢下来。
This also scales down to D=3 of course, just make none of the points >= 8 valid. D>4 can be done similarly but it takes more code then, and since this is really a brute force solution that is only fast due to parallelism it fundamentally gets slower exponentially in D.
这篇关于Hamming多维数据集的顶点上的查询点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!