How to optimize a dictionary with multiple keys per distinct value?

Problem description

I have a relatively large set of data that lends itself very naturally to C#'s Dictionary object. Currently, I have 102400 key-value pairs that are generated semi-dynamically when my program starts up. My problem is that I have to run a great number of lookup operations as fast as possible.

According to this page, the speed of the lookups is directly influenced by the number of key-value pairs in the dictionary. My data is a bit odd in that a great number of different keys lead to the same value. In fact, I have only 4900 distinct values, which means that I have an average of roughly 20 key-value pairs for each distinct value.

My first instinct was to swap the keys and the values (as I only care about the distinct values in the data), then store the old keys in a list or array as the new values. This reduced my dictionary size from 102400 to 4900 key-value pairs, but I can't see any way to efficiently search all of the lists for a specific old key in order to find the entry it belongs to.

I know that my description probably got a bit difficult to follow as I switched the keys and values, so I've included a mock-up of my data to show you what I mean:

Old structure:

Key   Value
---   -----
1     1
2     2
3     3
4     1
5     3
6     2
7     2
8     1
9     3
10    2
11    3
12    1

New structure:

Key   Value
---   -----
1     {1,4,8,12}
2     {2,6,7,10}
3     {3,9,5,11}

In my program, I'm going to be given '11' and I'll need to return '3'. The first structure is a simple lookup, but it is a huge list, which seems to be slowing things down. The second adds so much logical overhead for tracking down which value list contains the key I'm looking for that I've only seen a reduction in speed when trying to implement it.
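
To make the two layouts concrete, here is a minimal C# sketch built from the mock-up above. The variable names (keyToValue, valueToKeys) are illustrative assumptions rather than anything from the original program, and the code is written with top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Old structure: one entry per key (12 entries in the mock-up).
var keyToValue = new Dictionary<int, int>
{
    [1] = 1, [2] = 2, [3] = 3, [4] = 1, [5] = 3, [6] = 2,
    [7] = 2, [8] = 1, [9] = 3, [10] = 2, [11] = 3, [12] = 1,
};

// New structure: one entry per distinct value, holding the old keys.
var valueToKeys = new Dictionary<int, int[]>
{
    [1] = new[] { 1, 4, 8, 12 },
    [2] = new[] { 2, 6, 7, 10 },
    [3] = new[] { 3, 9, 5, 11 },
};

// Old structure: a single hash lookup answers "given 11, return 3".
int direct = keyToValue[11];

// New structure: the lists have to be scanned one by one until 11 turns up,
// which is a linear search over the old keys.
int scanned = valueToKeys.First(pair => pair.Value.Contains(11)).Key;

Console.WriteLine($"{direct} {scanned}"); // prints "3 3"
```

Both lookups return the same answer, but the second one touches every stored key in the worst case, which fits the slowdown described above.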

Am I barking up the wrong tree here? Should I just accept the speed of the larger list, or is there some other way that I can store my data to increase the lookup speed?

Solution

If all keys are distinct and contiguous, then you should consider a simple array; if they aren't contiguous, use a hash map type of structure instead. A hash lookup approaches O(1) if the hashing function is good, and if the keys and values are all integers, it shouldn't take up much space.
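
As a rough sketch of both options, the snippet below uses the mock-up data from the question (keys 1 to 12). The names table, firstKey, LookupArray and map are hypothetical, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Mock-up data from the question: keys 1..12 map to values 1..3.
int[] keys   = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
int[] values = { 1, 2, 3, 1, 3, 2, 2, 1, 3, 2, 3, 1 };

// Option 1, contiguous keys: store the value for key k at index k - firstKey.
int firstKey = keys[0];
int[] table = new int[keys.Length];
for (int i = 0; i < keys.Length; i++)
    table[keys[i] - firstKey] = values[i];

// A lookup is one subtraction and one array read.
int LookupArray(int key) => table[key - firstKey];

// Option 2, non-contiguous keys: a hash map, pre-sized to avoid rehashing.
var map = new Dictionary<int, int>(capacity: keys.Length);
for (int i = 0; i < keys.Length; i++)
    map[keys[i]] = values[i];

int fromArray = LookupArray(11);
bool found = map.TryGetValue(11, out int fromMap);

Console.WriteLine($"{fromArray} {found} {fromMap}");
```

The array version only works when every key in the range is present (or a sentinel marks the gaps); otherwise the Dictionary is the safer default.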

Even then, for 102400 elements, a balanced binary tree lookup would take about log2(102400) ≈ 16.64 comparisons per lookup, so at most 17, which is not exactly slow.
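
The figure can be checked with a small experiment. The sketch below swaps the binary tree for a binary search over a sorted array, which has the same logarithmic behaviour, and counts how many elements each lookup probes. The data is made up for illustration (sorted keys spaced 3 apart, values cycling through 4900 distinct numbers), and Find is a hypothetical helper, not a library call.

```csharp
using System;

const int Count = 102400;

// Hypothetical data: sorted, non-contiguous keys and 4900 distinct values.
int[] sortedKeys = new int[Count];
int[] vals = new int[Count];
for (int i = 0; i < Count; i++)
{
    sortedKeys[i] = i * 3;
    vals[i] = i % 4900;
}

// Classic binary search that also reports how many elements it probed.
int Find(int key, out int probes)
{
    probes = 0;
    int lo = 0, hi = sortedKeys.Length - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        probes++;
        if (sortedKeys[mid] == key) return vals[mid];
        if (sortedKeys[mid] < key) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1; // key not present
}

// Look up every key once and record the worst case.
int maxProbes = 0;
foreach (int k in sortedKeys)
{
    Find(k, out int n);
    maxProbes = Math.Max(maxProbes, n);
}

double bound = Math.Log(Count, 2); // about 16.64
Console.WriteLine($"log2({Count}) = {bound:F2}, worst-case probes = {maxProbes}");
```

The worst case comes out at 17 probes, the ceiling of log2(102400), so even the tree-style fallback stays cheap per lookup.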
