计算未排序数据中唯一对和非唯一对实例的数量 [英] Counting number of unique pairs and instances of non-unique pairs in unsorted data

查看：64 发布时间：2020/6/3 21:22:46 c algorithm

本文介绍了计算未排序数据中唯一对和非唯一对实例的数量的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有以下形式的数据：

ID   ATTR
3    10
1    20
1    20
4    30
...  ...

ID和Attr未排序的地方，可能包含重复项。 ID的范围是1-20,000左右，并且ATTR是unsigned int。我可能一次需要处理10万至50万对。

Where ID and Attr are unsorted and may contain duplicates. The range for the IDs are 1-20,000 or so, and ATTR are unsigned int. There may be anywhere between 100,000 and 500,000 pairs that I need to process at a single time.

我正在寻找：

唯一对的数量。

非唯一对弹出的次数。

因此在上述数据中，我想知道（1,20）出现了两次并且有3对唯一的对。

So in the above data, I'd want to know that (1,20) appeared twice and that there were 3 unique pairs.

我目前在以幼稚的方式使用哈希表。我保留一个唯一对的计数器，如果要插入的项已经存在，则递减计数器。我还保留了一组非唯一ID的ID。（所有初次接触）

I'm currently using a hash table in my naive approach. I keep a counter of unique pairs, and decrement the counter if the item I am inserting is already there. I also keep an array of IDs of the non-unique pairs. (All on first encounters)

性能和尺寸都受到同等关注。考虑到性能和尺寸方面的问题，我确实可以接受较高的误报率（例如0.5％）。（我也使用频谱绽放实现了这一点）

Performance and size are about equal concerns. I'm actually OK with a relatively high (say 0.5%) rate of false positives given the performance and size concerns. (I've also implemented this using a spectral bloom)

我并不聪明，所以我敢肯定那里有更好的解决方案，我会想听听您最喜欢的哈希表实现/其他想法。：）

I'm not that smart, so I'm sure there's a better solution out there, and I'd like to hear about your favorite hash table implementations/any other ideas. :)

计算未排序数据中唯一对和非唯一对实例的数量 [英] Counting number of unique pairs and instances of non-unique pairs in unsorted data

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

计算未排序数据中唯一对和非唯一对实例的数量 [英] Counting number of unique pairs and instances of non-unique pairs in unsorted data

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭