根据功能结果查询项目清单 [英] Query list for items based on function result

查看:60
本文介绍了根据功能结果查询项目清单的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个Zip个项目(ZipCodes)的列表,其中包含每个ZCTA(邮政编码列表区域)的Location,目前该列表中约有33,000个项目.

我可以包含我的Zip类,但是我认为唯一需要注意的是它包含一个名为LocationLatLong项,其中包含纬度和经度坐标. Haversine()接受LatLong个项目,并执行一些魔术操作以返回双精度值.

我正在尝试将5个最接近的邮政编码(包括距离)拉到我提供的邮政编码中.这是我目前的解决方案(不要介意我手动添加了5个空KVP):

//don't judge me... I'm still working on a better solution here
private static readonly KeyValuePair<Zip, double> init = new KeyValuePair<Zip, double>(null, 9999);
private static readonly List<KeyValuePair<Zip, double>> workstack = new List<KeyValuePair<Zip, double>>
    {
       init, init, init, init, init
    };

private static KeyValuePair<Zip, double>[] FindClosest(Zip myZip)
{
    var closestList = workstack.ToArray(); //I said don't judge me :(
    //fwiw ^ is actually faster than initializing a new array each cycle

    foreach (var zip in ZipCodes.Where(x => x != myZip))
    {
        //Haversine magic returns distance (double) in km
        var dist = Haversine(myZip.Location, zip.Location);
        //If everything else is smaller, just skip it
        if (closestList.All(x => x.Value < dist)) continue;
        closestList = closestList.OrderByDescending(x => x.Value).ToArray();
        closestList[0] = new KeyValuePair<Zip, double>(zip, dist);
    }

    return closestList;
}

但是,我想编写出尽可能高效的代码(实际上我还不能100%地确定应用程序将是什么),所以(我相信)我想使用LINQ./p>

我更改了代码以仅获取最接近的邮政编码,ReSharper建议使用我可以使用的LINQ查询.我对LINQ并没有非常的经验,但是我能够对其进行重组以适合我所需的任务:

//the Skip(1) is to skip the first element, which would be the distance between the zipcode and itself
var closest = ZipCodes.Select(x => new KeyValuePair<Zip, double>
          (x, Haversine(myZip.Location, x.Location))).OrderBy
          (x => x.Value).Skip(1).Take(5).ToArray();

然后,我使用Stopwatch对这两个函数进行计时,以处理500个Zip项目,发现使用LINQ方法平均需要大约 11.25s ,而我原来的foreach方法仅需要花费平均 8s (每500项产品的LINQ降低了 3.25s).

同样,我对LINQ知之甚少,但我总是被认为是它的速度更快.在这种情况下,我明白了为什么不是-我正在尝试对33,000个商品的完整列表进行排序.

如何编写查询以提高效率?或者,通常,我如何编写一个更有效的查询,以根据列表与指定项目和列表其余部分之间的关​​系从列表中提取指定数量的项目?

解决方案

对于您尝试做的事情,LINQ可能不是最佳解决方案.但是,我认为通过使用SortedList而不是数组,您的foreach可以得到一些改善:

private static SortedList<double, Zip> FindClosest(Zip myZip)
{
    var closestZips = new SortedList<double, Zip>();
    List<Zip> ZipCodes = new List<Zip>();
    foreach (var zip in ZipCodes.Where(x => x != myZip))
    {
        //Haversine magic returns distance (double) in km
        double dist = Haversine(myZip.Location, zip.Location);
        //If everything else is smaller, just skip it
        if (closestZips.Count < 5)
        {
            closestZips.Add(dist, zip);
        }
        else if (dist < closestZips.Keys[4])
        {
            closestZips.RemoveAt(4);
            closestZips.Add(dist, zip);
        }
    }

    return closestZips;
}

意识到存在一个错误.修复它,但必须反转键,值.所以现在每个距离都是关键.

LINQ并不会真正导致短路.由于只需要总数的一小部分,LINQ通常效率很低,因为它必须先创建整个集合,然后对其进行排序,然后选择所需的数量.我认为,LINQ在这里的主要优点是简洁明了且可读性强.而且,我认为,有了更有效的foreach循环,foreach仍然会非常有利地出现.

进一步优化

您可以尝试使用Parallel库.结果不是100%一致,但是在10-30%左右有明显的速度增益.

using System.Threading;
using System.Threading.Tasks;

private static Object thisLock = new Object();
private static SortedList<double, Zip> FindClosest2(Zip myZip)
{


    var closestZips = new SortedList<double, Zip>();
    Parallel.ForEach(ZipCodes, (zip) =>
     {
         //Haversine magic returns distance (double) in km
         double dist = Haversine(myZip.Location, zip.Location);
         if (closestZips.Count() < 6)
         {
             lock(thisLock)
             {
                 closestZips.Add(dist, zip);
             }

         }
         else if (dist < closestZips.Keys[4])
         {
             lock(thisLock)
             {
                 closestZips.RemoveAt(4);
                 closestZips.Add(dist, zip);
             }

         }
     });

    return closestZips;
}

这是我用过的Haversine:

public static class Haversine
{
    public static double calculate(double lat1, double lon1, double lat2, double lon2)
    {
        var R = 6372.8; // In kilometers
        var dLat = toRadians(lat2 - lat1);
        var dLon = toRadians(lon2 - lon1);
        lat1 = toRadians(lat1);
        lat2 = toRadians(lat2);

        var a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) + Math.Sin(dLon / 2) * Math.Sin(dLon / 2) * Math.Cos(lat1) * Math.Cos(lat2);
        var c = 2 * Math.Asin(Math.Sqrt(a));
        return R * 2 * Math.Asin(Math.Sqrt(a));
    }
    public static double calculate(Coords a, Coords b)
    {
        return calculate(a.lat, a.lng, b.lat, b.lng);
    }
    public static double toRadians(double angle)
    {
        return Math.PI * angle / 180.0;
    }
}

I have a list of Zip items (ZipCodes) which contain the Location of each ZCTA (Zip Code Tabulated Area), and there are ~33,000 items in said list at present.

I could include my Zip class, but I think the only thing of note is that it contains a LatLong item named Location which holds the latitude and longitude coordinates. Haversine() accepts LatLong items and does some magic stuff to return a double.

I'm trying to pull the 5 closest zip codes (with distances) to the one that I provide. This is my current solution (don't mind that I manually added the 5 null KVPs):

//don't judge me... I'm still working on a better solution here
private static readonly KeyValuePair<Zip, double> init = new KeyValuePair<Zip, double>(null, 9999);
private static readonly List<KeyValuePair<Zip, double>> workstack = new List<KeyValuePair<Zip, double>>
    {
       init, init, init, init, init
    };

private static KeyValuePair<Zip, double>[] FindClosest(Zip myZip)
{
    var closestList = workstack.ToArray(); //I said don't judge me :(
    //fwiw ^ is actually faster than initializing a new array each cycle

    foreach (var zip in ZipCodes.Where(x => x != myZip))
    {
        //Haversine magic returns distance (double) in km
        var dist = Haversine(myZip.Location, zip.Location);
        //If everything else is smaller, just skip it
        if (closestList.All(x => x.Value < dist)) continue;
        closestList = closestList.OrderByDescending(x => x.Value).ToArray();
        closestList[0] = new KeyValuePair<Zip, double>(zip, dist);
    }

    return closestList;
}

However, I'd like to write this to be as efficient as possible (I'm not actually 100% sure what the application will be just yet), so (I believe) I'd like to use LINQ.

I altered my code to just grab just the closest zip code, and ReSharper suggested a LINQ query that I was able to work with. I'm not terribly experienced with LINQ, but I was able to restructure it to fit my desired task:

//the Skip(1) is to skip the first element, which would be the distance between the zipcode and itself
var closest = ZipCodes.Select(x => new KeyValuePair<Zip, double>
          (x, Haversine(myZip.Location, x.Location))).OrderBy
          (x => x.Value).Skip(1).Take(5).ToArray();

I then used the Stopwatch to time both functions to process 500 Zip items, and found that using the LINQ method took around 11.25s on average, while my original foreach method took only 8s on average (LINQ was slower by 3.25s per 500 items).

Again, I don't know much about LINQ, but I was always led to believe it was faster. In this case, I can see why it isn't - I'm trying to sort a full list of 33,000 items.

How could I write my query to be more efficient? Or, in general, how could I write a more efficient query to pull a specified number of items from a list, based on their relation to a given item and the rest of the list?

解决方案

For what you're trying to do LINQ might not be the best solution. However, I think your foreach could be improved a fair bit, by using a SortedList instead of an array:

private static SortedList<double, Zip> FindClosest(Zip myZip)
{
    var closestZips = new SortedList<double, Zip>();
    List<Zip> ZipCodes = new List<Zip>();
    foreach (var zip in ZipCodes.Where(x => x != myZip))
    {
        //Haversine magic returns distance (double) in km
        double dist = Haversine(myZip.Location, zip.Location);
        //If everything else is smaller, just skip it
        if (closestZips.Count < 5)
        {
            closestZips.Add(dist, zip);
        }
        else if (dist < closestZips.Keys[4])
        {
            closestZips.RemoveAt(4);
            closestZips.Add(dist, zip);
        }
    }

    return closestZips;
}

Realized there was a bug. got it fixed, but had to reverse keys,values. So now each distance is the key.

LINQ doesn't really lend itself to short circuiting. Since you only want a small percentage of the total, LINQ will usually be quite inefficient, since it has to create the whole collection first, then sort it, then select the amount you want. The main advantage LINQ would have here, I think, is it will be concise and more readable. Also, I think, with a more efficient foreach loop, the foreach will still come out very favourably.

Edit: Further optimization

You could try using the Parallel library. The result weren't 100% consistent but there was a definite speed gain at around 10 - 30%.

using System.Threading;
using System.Threading.Tasks;

private static Object thisLock = new Object();
private static SortedList<double, Zip> FindClosest2(Zip myZip)
{


    var closestZips = new SortedList<double, Zip>();
    Parallel.ForEach(ZipCodes, (zip) =>
     {
         //Haversine magic returns distance (double) in km
         double dist = Haversine(myZip.Location, zip.Location);
         if (closestZips.Count() < 6)
         {
             lock(thisLock)
             {
                 closestZips.Add(dist, zip);
             }

         }
         else if (dist < closestZips.Keys[4])
         {
             lock(thisLock)
             {
                 closestZips.RemoveAt(4);
                 closestZips.Add(dist, zip);
             }

         }
     });

    return closestZips;
}

Here's the Haversine I used:

public static class Haversine
{
    public static double calculate(double lat1, double lon1, double lat2, double lon2)
    {
        var R = 6372.8; // In kilometers
        var dLat = toRadians(lat2 - lat1);
        var dLon = toRadians(lon2 - lon1);
        lat1 = toRadians(lat1);
        lat2 = toRadians(lat2);

        var a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) + Math.Sin(dLon / 2) * Math.Sin(dLon / 2) * Math.Cos(lat1) * Math.Cos(lat2);
        var c = 2 * Math.Asin(Math.Sqrt(a));
        return R * 2 * Math.Asin(Math.Sqrt(a));
    }
    public static double calculate(Coords a, Coords b)
    {
        return calculate(a.lat, a.lng, b.lat, b.lng);
    }
    public static double toRadians(double angle)
    {
        return Math.PI * angle / 180.0;
    }
}

这篇关于根据功能结果查询项目清单的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆