我有一个包含超过一百万个对象的列表,试图找到最快的搜索方式 [英] I have a list with over a million objects in it, trying to find the fastest way to search through it
问题描述
我有一个列表,其中存储着超过一百万个对象.我需要仔细浏览列表,并尽可能高效地更新通过以下查询找到的对象.
I have a list that stores well over a million objects within it. I need to look through the list and update the objects found through the below query as efficiently as possible.
我当时正在考虑使用Dictionary或HashSet,但是我对C#还是比较陌生,无法弄清楚如何实现这两种其他方法.我当前的代码只是通过IList搜索的LINQ语句.
I was thinking of using a Dictionary or HashSet, but I'm relatively new to C# and could not figure out how to implement these other two approaches. My current code is simply a LINQ statement searching through an IList.
public IList<LandObject> landObjects = new List<LandObject>();
var lObjsToUpdate = from obj in landObjects
where
obj.position.x >= land.x - size.x &&
obj.position.x <= land.x + size.x &&
obj.position.y >= land.y - size.y &&
obj.position.y <= land.y + size.y
select obj;
foreach(var item in lObjsToUpdate)
{
//do what I need to do with records I've found
}
有人能建议我如何有效地解决这个问题吗?
Could anyone be so kind as to suggest how I might approach this efficiently?
推荐答案
理想情况下,您需要一种对元素进行分区的方法,这样您就不必测试每个元素就可以找到适合的元素和应该抛出的元素.出去.分区的方式将取决于项目的密度-例如,可以简单地在 X
坐标的整数部分上进行分区,或者根据该坐标的某个合适的缩放值进行分区.
Ideally you need a way to partition elements so that you don't have to test every single one to find the ones that fit and the ones that should be thrown out. How you partition will depend on how dense the items are - it might be as simple as partitioning on the integer portion of the X
coordinate for instance, or by some suitable scaled value of that coordinate.
给出一个采用 X
坐标并为其返回分区值的方法(现在将其称为 Partition
),您可以在 X上进行过滤
作为第一遍,可以相当迅速地进行协调,以减少您需要检查的节点总数.您可能需要稍微使用一下分区功能才能获得正确的分配.
Given a method (let's call it Partition
for now) that takes an X
coordinate and returns a partition value for it, you can filter on the X
coordinate fairly quickly as a first-pass to reduce the total number of nodes you need to check. You might need to play with the partition function a little to get the right distribution though.
例如,假设您的浮点坐标为 -100< -100.X< = 100
,并且您的1,000,000+个对象在该范围内相当均匀地分布.如果按 X
的整数值将列表分成200个分区(平均)5000个条目.这意味着对于您搜索范围的 X
维度中的每个整数步,您只有大约5,000个要测试的条目.
For example, say that you have floating-point coordinates in the range -100 < X <= 100
, with your 1,000,000+ objects distributed fairly uniformly across that range. That would divide the list into 200 partitions of (on average) 5000 entries if partitioned on integer values of X
. That means that for every integer step in the X
dimension of your search range you only have ~5,000 entries to test.
以下是一些代码:
public interface IPosition2F
{
float X { get; }
float Y { get; }
}
public class CoordMap<T> where T : IPosition2F
{
SortedDictionary<int, List<T>> map = new SortedDictionary<int,List<T>>();
readonly Func<float, int> xPartition = (x) => (int)Math.Floor(x);
public void Add(T entry)
{
int xpart = xPartition(entry.X);
List<T> col;
if (!map.TryGetValue(xpart, out col))
{
col = new List<T>();
map[xpart] = col;
}
col.Add(entry);
}
public T[] ExtractRange(float left, float top, float right, float bottom)
{
var rngLeft = xPartition(left) - 1;
var rngRight = xPartition(right) + 1;
var cols =
from keyval in map
where keyval.Key >= rngLeft && keyval.Key <= rngRight
select keyval.Value;
var cells =
from cell in cols.SelectMany(c => c)
where cell.X >= left && cell.X <= right &&
cell.Y >= top && cell.Y <= bottom
select cell;
return cells.ToArray();
}
public CoordMap()
{ }
// Create instance with custom partition function
public CoordMap(Func<float, int> partfunc)
{
xPartition = partfunc;
}
}
这将在 X
坐标上进行分区,从而减少了最终的搜索空间.如果您想更进一步,还可以在 Y
坐标上进行分区...我将其留给读者练习:)
That will partition on the X
coordinate, reducing your final search space. If you wanted to take it a step further you could also partition on the Y
coordinate... I'll leave that as an exercise for the reader :)
如果您的分区函数的粒度非常精细,并且可能导致大量分区,则添加类似于以下内容的 ColumnRange
函数可能会很有用:
If your parition function is very finely grained and could result in a large number of partitions, it might be useful to add a ColumnRange
function similar to:
public IEnumerable<List<T>> ColumnRange(int left, int right)
{
using (var mapenum = map.GetEnumerator())
{
bool finished = mapenum.MoveNext();
while (!finished && mapenum.Current.Key < left)
finished = mapenum.MoveNext();
while (!finished && mapenum.Current.Key <= right)
{
yield return mapenum.Current.Value;
finished = mapenum.MoveNext();
}
}
}
然后, ExtractRange
方法可以像这样使用
The ExtractRange
method can then use that like so:
public T[] ExtractRange(float left, float top, float right, float bottom)
{
var rngLeft = xPartition(left) - 1;
var rngRight = xPartition(right) + 1;
var cells =
from cell in ColumnRange(rngLeft, rngRight).SelectMany(c => c)
where cell.X >= left && cell.X <= right &&
cell.Y >= top && cell.Y <= bottom
select cell;
return cells.ToArray();
}
为了方便起见,我使用了 SortedDictionary
,并且因为它可以执行相当快的 ExtractRange
方法.还有其他一些容器类型可能更适合该任务.
I used SortedDictionary
for convenience, and because it makes it possible to do an ExtractRange
method that is reasonably quick. There are other container types that are possibly better suited to the task.
这篇关于我有一个包含超过一百万个对象的列表,试图找到最快的搜索方式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!