Faster way to count number of sets an item appears in?

Views: 91 | Posted: 2015/11/30 20:52:12 | Tags: c#, algorithm, optimization, hashset

Question

I've got a list of bookmarks. Each bookmark has a list of keywords (stored as a HashSet). I also have a set of all possible keywords ("universe").

I want to find the keyword that appears in the most bookmarks.

I have 1,356 bookmarks with a combined total of 698,539 keywords, of which 187,358 are unique.

If I iterate through every keyword in the universe and count the number of bookmarks it appears in, I'm doing 254,057,448 checks. This takes 35 seconds on my machine.

The algorithm is pretty simple:

var biggest = universe.MaxBy(kw => bookmarks.Count(bm => bm.Keywords.Contains(kw)));

This uses Jon Skeet's MaxBy.
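
For readers who have not seen MaxBy: it returns the element of a sequence whose projected key is largest. The block below is only a rough stand-in written for illustration, not the actual library source. The names MaxBySketch and BruteForce are hypothetical, and the bookmarks are modelled as plain HashSet<string> values rather than objects with a Keywords property. It assumes a non-empty sequence and evaluates the key selector once per element, which matters here because each evaluation scans every bookmark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaxBySketchDemo
{
    // Hypothetical stand-in for MaxBy: returns the element with the largest key.
    // Throws on an empty sequence; on ties the earliest element wins.
    public static T MaxBySketch<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
        where TKey : IComparable<TKey>
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements.");

            T best = e.Current;
            TKey bestKey = keySelector(best);
            while (e.MoveNext())
            {
                TKey key = keySelector(e.Current); // computed once per element
                if (key.CompareTo(bestKey) > 0)
                {
                    best = e.Current;
                    bestKey = key;
                }
            }
            return best;
        }
    }

    // The brute-force idea from the question: for every keyword in the universe,
    // count the bookmarks whose keyword set contains it, then take the maximum.
    public static string BruteForce(IEnumerable<string> universe, List<HashSet<string>> bookmarks)
    {
        return MaxBySketch(universe, kw => bookmarks.Count(bm => bm.Contains(kw)));
    }
}
```

With three toy bookmarks {a, b}, {b, c} and {b}, BruteForce returns "b", at a cost of one Contains call per (keyword, bookmark) pair.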

I'm not sure it's possible to speed this up much, but is there anything I can do? Perhaps parallelize it somehow?

dtb's solution takes under 200 ms to both build the universe and find the biggest element. So simple.

var freq = new FreqDict();
foreach(var bm in bookmarks) {
    freq.Add(bm.Keywords);
}
var biggest2 = freq.MaxBy(kvp => kvp.Value);

FreqDict is just a little class I made built on top of a Dictionary<string,int>.
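
The post does not show FreqDict itself, so the following is only a guess at what such a class might look like, assuming it wraps a Dictionary<string,int>, exposes an Add that takes a collection of keywords, and can be enumerated as key/value pairs (which is what the MaxBy call above needs). The indexer is an extra added for convenience.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical reconstruction: the asker's real FreqDict is not shown in the post.
public class FreqDict : IEnumerable<KeyValuePair<string, int>>
{
    private readonly Dictionary<string, int> counts = new Dictionary<string, int>();

    // Increment the count of every keyword in the given collection.
    public void Add(IEnumerable<string> keywords)
    {
        foreach (var keyword in keywords)
        {
            int current;
            counts.TryGetValue(keyword, out current); // current stays 0 if the key is absent
            counts[keyword] = current + 1;
        }
    }

    // Convenience lookup (an assumption): 0 for keywords never seen.
    public int this[string keyword]
    {
        get
        {
            int n;
            return counts.TryGetValue(keyword, out n) ? n : 0;
        }
    }

    public IEnumerator<KeyValuePair<string, int>> GetEnumerator()
    {
        return counts.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Because each bookmark stores its keywords in a HashSet, a keyword is seen at most once per bookmark, so the final count for a keyword equals the number of bookmarks that contain it. Built this way, the universe of keywords falls out as the dictionary's keys.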

Recommended answer

I don't have your sample data, nor have I done any benchmarking, but I'll take a stab. One thing that could be improved is that most of the bm.Keywords.Contains(kw) checks are misses, and I think those can be avoided. The most constraining factor is the set of keywords any given bookmark has (i.e., it will typically be much smaller than the universe), so we should iterate in that direction rather than the other way around.
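
To put rough numbers on the "mostly misses" point using only the figures from the question: a Contains check can succeed only for a (bookmark, keyword) pair that actually exists, and there are 698,539 such pairs in total, against 1,356 × 187,358 checks performed. The helper names below are made up for this illustration.

```csharp
public static class MissRate
{
    // Number of Contains calls made by the brute-force approach:
    // one per (bookmark, unique keyword) combination.
    public static long TotalChecks(long bookmarkCount, long uniqueKeywordCount)
    {
        return bookmarkCount * uniqueKeywordCount;
    }

    // Fraction of those checks that can be hits, given the combined
    // number of keyword entries across all bookmarks.
    public static double HitFraction(long totalKeywordEntries, long totalChecks)
    {
        return (double)totalKeywordEntries / totalChecks;
    }
}
```

MissRate.TotalChecks(1356, 187358) gives 254,057,448, matching the figure in the question, and MissRate.HitFraction(698539, 254057448) comes out a little under 0.3%. In other words, well over 99% of the checks are misses, while iterating each bookmark's own keywords touches only the 698,539 entries that exist.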

I'm thinking something along these lines. The memory requirement is much higher and since I haven't benchmarked anything, it could be slower, or not helpful, but I'll just delete my answer if it doesn't work out for you.

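// Pre-size the dictionary and seed every keyword with a count of zero.
// universe is described as a set, so its size is Count (Length exists only on arrays).
Dictionary<string, int> keywordCounts = new Dictionary<string, int>(universe.Count);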
foreach (var keyword in universe)
{
    keywordCounts.Add(keyword, 0);
}

foreach (var bookmark in bookmarks)
{
    foreach (var keyword in bookmark.Keywords)
    {
        keywordCounts[keyword] += 1;
    }
}

var mostCommonKeyword = keywordCounts.MaxBy(x => x.Value).Key;
