我有一个包含超过一百万个对象的列表,试图找到最快的搜索方式 [英] I have a list with over a million objects in it, trying to find the fastest way to search through it

查看:61
本文介绍了我有一个包含超过一百万个对象的列表,试图找到最快的搜索方式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个列表,其中存储着超过一百万个对象.我需要仔细浏览列表,并尽可能高效地更新通过以下查询找到的对象.

I have a list that stores well over a million objects within it. I need to look through the list and update the objects found through the below query as efficiently as possible.

我当时正在考虑使用Dictionary或HashSet,但是我对C#还是比较陌生,无法弄清楚如何实现这两种其他方法.我当前的代码只是通过IList搜索的LINQ语句.

I was thinking of using a Dictionary or HashSet, but I'm relatively new to C# and could not figure out how to implement these other two approaches. My current code is simply a LINQ statement searching through an IList.

public IList<LandObject> landObjects = new List<LandObject>();    

var lObjsToUpdate = from obj in landObjects 
                        where 
            obj.position.x >= land.x - size.x && 
            obj.position.x <= land.x + size.x && 
            obj.position.y >= land.y - size.y && 
            obj.position.y <= land.y + size.y
                    select obj;

 foreach(var item in lObjsToUpdate)
 {
     //do what I need to do with records I've found
 }

有人能建议我如何有效地解决这个问题吗?

Could anyone be so kind as to suggest how I might approach this efficiently?

推荐答案

理想情况下,您需要一种对元素进行分区的方法,这样您就不必测试每个元素就可以找到适合的元素和应该抛出的元素.出去.分区的方式将取决于项目的密度-例如,可以简单地在 X 坐标的整数部分上进行分区,或者根据该坐标的某个合适的缩放值进行分区.

Ideally you need a way to partition elements so that you don't have to test every single one to find the ones that fit and the ones that should be thrown out. How you partition will depend on how dense the items are - it might be as simple as partitioning on the integer portion of the X coordinate for instance, or by some suitable scaled value of that coordinate.

给出一个采用 X 坐标并为其返回分区值的方法(现在将其称为 Partition ),您可以在 X上进行过滤作为第一遍,可以相当迅速地进行协调,以减少您需要检查的节点总数.您可能需要稍微使用一下分区功能才能获得正确的分配.

Given a method (let's call it Partition for now) that takes an X coordinate and returns a partition value for it, you can filter on the X coordinate fairly quickly as a first-pass to reduce the total number of nodes you need to check. You might need to play with the partition function a little to get the right distribution though.

例如,假设您的浮点坐标为 -100< -100.X< = 100 ,并且您的1,000,000+个对象在该范围内相当均匀地分布.如果按 X 的整数值将列表分成200个分区(平均)5000个条目.这意味着对于您搜索范围的 X 维度中的每个整数步,您只有大约5,000个要测试的条目.

For example, say that you have floating-point coordinates in the range -100 < X <= 100, with your 1,000,000+ objects distributed fairly uniformly across that range. That would divide the list into 200 partitions of (on average) 5000 entries if partitioned on integer values of X. That means that for every integer step in the X dimension of your search range you only have ~5,000 entries to test.

以下是一些代码:

public interface IPosition2F
{
    float X { get; }
    float Y { get; }
}

public class CoordMap<T> where T : IPosition2F
{
    SortedDictionary<int, List<T>> map = new SortedDictionary<int,List<T>>();
    readonly Func<float, int> xPartition = (x) => (int)Math.Floor(x);

    public void Add(T entry)
    {
        int xpart = xPartition(entry.X);
        List<T> col;
        if (!map.TryGetValue(xpart, out col))
        {
            col = new List<T>();
            map[xpart] = col;
        }
        col.Add(entry);
    }

    public T[] ExtractRange(float left, float top, float right, float bottom)
    {
        var rngLeft = xPartition(left) - 1;
        var rngRight = xPartition(right) + 1;

        var cols =
            from keyval in map
            where keyval.Key >= rngLeft && keyval.Key <= rngRight
            select keyval.Value;

        var cells =
            from cell in cols.SelectMany(c => c)
            where cell.X >= left && cell.X <= right &&
                cell.Y >= top && cell.Y <= bottom
            select cell;

        return cells.ToArray();
    }

    public CoordMap()
    { }

    // Create instance with custom partition function
    public CoordMap(Func<float, int> partfunc)
    {
        xPartition = partfunc;
    }
}

这将在 X 坐标上进行分区,从而减少了最终的搜索空间.如果您想更进一步,还可以在 Y 坐标上进行分区...我将其留给读者练习:)

That will partition on the X coordinate, reducing your final search space. If you wanted to take it a step further you could also partition on the Y coordinate... I'll leave that as an exercise for the reader :)

如果您的分区函数的粒度非常精细,并且可能导致大量分区,则添加类似于以下内容的 ColumnRange 函数可能会很有用:

If your parition function is very finely grained and could result in a large number of partitions, it might be useful to add a ColumnRange function similar to:

    public IEnumerable<List<T>> ColumnRange(int left, int right)
    {
        using (var mapenum = map.GetEnumerator())
        {
            bool finished = mapenum.MoveNext();
            while (!finished && mapenum.Current.Key < left)
                finished = mapenum.MoveNext();

            while (!finished && mapenum.Current.Key <= right)
            {
                yield return mapenum.Current.Value;
                finished = mapenum.MoveNext();
            }

        }
    }

然后, ExtractRange 方法可以像这样使用

The ExtractRange method can then use that like so:

    public T[] ExtractRange(float left, float top, float right, float bottom)
    {
        var rngLeft = xPartition(left) - 1;
        var rngRight = xPartition(right) + 1;

        var cells =
            from cell in ColumnRange(rngLeft, rngRight).SelectMany(c => c)
            where cell.X >= left && cell.X <= right &&
                cell.Y >= top && cell.Y <= bottom
            select cell;

        return cells.ToArray();
    }

为了方便起见,我使用了 SortedDictionary ,并且因为它可以执行相当快的 ExtractRange 方法.还有其他一些容器类型可能更适合该任务.

I used SortedDictionary for convenience, and because it makes it possible to do an ExtractRange method that is reasonably quick. There are other container types that are possibly better suited to the task.

这篇关于我有一个包含超过一百万个对象的列表,试图找到最快的搜索方式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆