Building a histogram faster
Question
I am working with a large dataset that I need to build a histogram of. My current method is to walk the entire list and record each value's frequency in a second array, which feels slow. Any suggestions on how to speed the process up?
Recommended answer
Given that a histogram is a graph containing the counts of all items in each bin, you can't make an exact one without visiting all the items.
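For reference, the single pass described in the question can be sketched as below. This is a minimal illustration, not the asker's actual code: it assumes numeric values and fixed-width bins, and the names `build_histogram` and `bin_width` are made up for the example.

```python
from collections import Counter


def build_histogram(values, bin_width):
    """Count how many values fall into each fixed-width bin in one pass."""
    counts = Counter()
    for v in values:
        # Floor division maps v to the bin [k * bin_width, (k + 1) * bin_width).
        counts[int(v // bin_width)] += 1
    return counts


hist = build_histogram([1, 2, 2, 7, 11, 12], bin_width=5)
print(dict(hist))
```

Each item is touched exactly once, so the work grows linearly with the size of the data. Any exact method has to do at least that much.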
However, you can:
1. Create the histogram as you collect the data. Then there is no separate pass to generate it, because the counting cost is spread across collection.
2. Break up the data into N parts, and work on each part in parallel. When each part is done counting, just sum the results for each bin. (You can also combine this with #1.)
3. Sample the data. In theory, by looking at a random fraction of your data, you should be able to estimate the bin counts for the rest. The result is an approximation, not an exact histogram.
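The three options above can be sketched in Python as follows. This is a hedged illustration under a few assumptions: the values are numeric, the bins have a fixed width, and every name here (`RunningHistogram`, `count_bins`, `parallel_histogram`, `sampled_histogram`) is hypothetical. The executor is passed in by the caller; the demo uses a thread pool for simplicity, but in CPython pure-Python counting is limited by the GIL, so a `ProcessPoolExecutor` is more likely to give a real speedup.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_bins(values, bin_width):
    """Count values per fixed-width bin (bin index = floor(v / bin_width))."""
    counts = Counter()
    for v in values:
        counts[int(v // bin_width)] += 1
    return counts


class RunningHistogram:
    """Option 1: update the counts as each value is collected."""

    def __init__(self, bin_width):
        self.bin_width = bin_width
        self.counts = Counter()

    def add(self, value):
        self.counts[int(value // self.bin_width)] += 1


def parallel_histogram(values, bin_width, n_parts, executor):
    """Option 2: split into about n_parts chunks, count each, sum per bin."""
    size = max(1, -(-len(values) // n_parts))  # ceiling division, at least 1
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    partials = executor.map(count_bins, chunks, [bin_width] * len(chunks))
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts bin by bin
    return total


def sampled_histogram(values, bin_width, fraction, seed=0):
    """Option 3: count a random sample and scale up to estimate full counts."""
    if not values:
        return {}
    rng = random.Random(seed)
    k = max(1, int(len(values) * fraction))
    sample = rng.sample(values, k)
    scale = len(values) / k
    return {b: c * scale for b, c in count_bins(sample, bin_width).items()}


if __name__ == "__main__":
    data = [random.uniform(0, 100) for _ in range(100_000)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        exact = parallel_histogram(data, 10, 4, pool)
    approx = sampled_histogram(data, 10, 0.05)
    for b in sorted(exact):
        print(b, exact[b], round(approx.get(b, 0)))
```

Summing the per-chunk counters works because bin counts are additive: the merged result is the same as counting the whole list in one pass. The sampled version reads only a fraction of the data, and its scaled counts are estimates whose error shrinks as the sample grows.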