Sort two dimensional list python


Problem description

I have a two-dimensional list like this:

a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205], [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]

These are actually coordinates in 2D Euclidean space. I want to sort the list so that points close to each other appear consecutively, so that it looks like the following:

sorted_a = [[45,205],[42,206],[46,205],[47,202],[48,202],[45,40],[46,41],[47,40],[48,40],[49,38]]

I also tried this approach:

sorted_a = sorted(a, key=lambda x: (x[0],x[1]))

but it does not return the required result. Your help is appreciated. Thanks.

Recommended answer

I'm not sure this is a sorting problem; it's more of a grouping (or optimization?) problem.

Sorting requires some criterion for putting [45,205] before [42,206]. key works if you can come up with one number per point that represents the desired order.

For example, calculate the distance from the origin.

A = np.array(a) creates a numpy array:

In [346]: A
Out[346]: 
array([[ 42, 206],
       [ 45,  40],
       [ 45, 205],
       [ 46,  41],
       [ 46, 205],
       [ 47,  40],
       [ 47, 202],
       [ 48,  40],
       [ 48, 202],
       [ 49,  38]])

The distance from the origin (the radius in polar coordinates) is the square root of the sum of squares; the sqrt isn't needed here because it doesn't change the ranking. Applying argsort to the sum of squares ranks the points by distance from the origin.

In [347]: np.sum(A**2,axis=1)
Out[347]: array([44200,  3625, 44050,  3797, 44141,  3809, 43013,  3904, 43108,  3845])
In [348]: r = np.sum(A**2,axis=1)
In [349]: idx = np.argsort(r)
In [350]: idx
Out[350]: array([1, 3, 5, 9, 7, 6, 8, 2, 4, 0], dtype=int32)
In [351]: A[idx,:]
Out[351]: 
array([[ 45,  40],
       [ 46,  41],
       [ 47,  40],
       [ 49,  38],
       [ 48,  40],
       [ 47, 202],
       [ 48, 202],
       [ 45, 205],
       [ 46, 205],
       [ 42, 206]])

The equivalent operation on the list uses a key function like:

def foo(xy):
    x,y=xy
    return x**2+y**2
In [356]: sorted(a, key=foo)
Out[356]: 
[[45, 40],
 [46, 41],
 [47, 40],
 [49, 38],
 [48, 40],
 [47, 202],
 [48, 202],
 [45, 205],
 [46, 205],
 [42, 206]]

Pairwise distances

In numpy it's fairly easy to compute pairwise distances (even easier with one of the scipy tools). But what would you do with them? What defines an order based on such distances?

For example, using the kind of iteration that we are often asked to 'vectorize':

In [369]: D = np.zeros((10,10))
In [370]: for i in range(10):
     ...:     for j in range(i,10):
     ...:         D[i,j] = np.sqrt(sum((A[i,:]-A[j,:])**2))
                  # D[i,j] = np.linalg.norm(A[i,:]-A[j,:])

In [372]: D.astype(int)
Out[372]: 
array([[  0, 166,   3, 165,   4, 166,   6, 166,   7, 168],
       [  0,   0, 165,   1, 165,   2, 162,   3, 162,   4],
       [  0,   0,   0, 164,   1, 165,   3, 165,   4, 167],
       [  0,   0,   0,   0, 164,   1, 161,   2, 161,   4],
       [  0,   0,   0,   0,   0, 165,   3, 165,   3, 167],
       [  0,   0,   0,   0,   0,   0, 162,   1, 162,   2],
       [  0,   0,   0,   0,   0,   0,   0, 162,   1, 164],
       [  0,   0,   0,   0,   0,   0,   0,   0, 162,   2],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0, 164],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0]])

D is a matrix of distances (only the upper triangle is filled in), truncated to integers for ease of display.
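As a rough sketch of what 'vectorizing' the double loop could look like, the same distances can be computed with numpy broadcasting. The names diff, D_full and D_upper are illustrative and not part of the original answer.

```python
import numpy as np

a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]
A = np.array(a)

# diff[i, j, :] = A[i] - A[j]; broadcasting produces shape (10, 10, 2)
diff = A[:, None, :] - A[None, :, :]

# Euclidean norm over the last axis gives the full, symmetric distance matrix
D_full = np.sqrt((diff ** 2).sum(axis=-1))

# Zero out everything below the diagonal to match the loop-based D above
D_upper = np.triu(D_full)

print(D_upper.astype(int))
```

The full matrix is symmetric with a zero diagonal, so the upper triangle carries all the information; np.triu is applied here only to make the result comparable with the loop version.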

numpy has a lexical sort, np.lexsort. We could use it to sort on the 2nd coordinate first, and then on the 1st coordinate. That groups all the points with y near 200 together:

In [375]: np.lexsort(A.T)
Out[375]: array([9, 1, 5, 7, 3, 6, 8, 2, 4, 0], dtype=int32)
In [376]: A[_,:]
Out[376]: 
array([[ 49,  38],
       [ 45,  40],
       [ 47,  40],
       [ 48,  40],
       [ 46,  41],
       [ 47, 202],
       [ 48, 202],
       [ 45, 205],
       [ 46, 205],
       [ 42, 206]])

Pairwise distances for that sorted array look like:

array([[  0,   4,   2,   2,   4, 164, 164, 167, 167, 168],
       [  0,   0,   2,   3,   1, 162, 162, 165, 165, 166],
       [  0,   0,   0,   1,   1, 162, 162, 165, 165, 166],
       [  0,   0,   0,   0,   2, 162, 162, 165, 165, 166],
       [  0,   0,   0,   0,   0, 161, 161, 164, 164, 165],
       [  0,   0,   0,   0,   0,   0,   1,   3,   3,   6],
       [  0,   0,   0,   0,   0,   0,   0,   4,   3,   7],
       [  0,   0,   0,   0,   0,   0,   0,   0,   1,   3],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   4],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0]])
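For the plain list, the lexsort ordering used above can be reproduced with a tuple key that puts the second coordinate first. This is a minimal sketch; the name by_y_then_x is illustrative.

```python
a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]

# Sort on the 2nd coordinate first, breaking ties with the 1st coordinate
by_y_then_x = sorted(a, key=lambda p: (p[1], p[0]))

print(by_y_then_x)
```

This differs from the key in the question, (x[0], x[1]), only in the order of the two components, which is what lets the large gap in the second coordinate separate the two groups.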

Search over permutations

Another way of thinking about this is as a search problem: for example, find the order of points that minimizes the 'travel' distance, i.e. the sum of distances between successive points.

With the original a (A), the distances between successive points (using the default np.linalg.norm, i.e. Euclidean) are:

In [407]: np.linalg.norm(A[1:]-A[:-1],axis=1)
Out[407]: 
array([ 166.02710622,  165.        ,  164.00304875,  164.        ,
        165.00303028,  162.        ,  162.00308639,  162.        ,
        164.00304875])

and their sum:

In [408]: _.sum()
Out[408]: 1474.0393203904973

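With the lexsort order (A1 is not defined in the snippets above; judging by the values, it is the lexsorted array, A[np.lexsort(A.T), :]):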

In [410]: np.linalg.norm(A1[1:]-A1[:-1],axis=1)
Out[410]: 
array([   4.47213595,    2.        ,    1.        ,    2.23606798,
        161.00310556,    1.        ,    4.24264069,    1.        ,
          4.12310563])
In [411]: _.sum()
Out[411]: 181.07705580534656

Clearly this has better clustering, mainly driven by the 2nd column values.

Your sorted_a improves this sum a bit:

In [414]: sortedA = np.array(sorted_a)
In [415]: np.linalg.norm(sortedA[1:]-sortedA[:-1],axis=1)
Out[415]: 
array([   3.16227766,    4.12310563,    3.16227766,    1.        ,
        162.0277754 ,    1.41421356,    1.41421356,    1.        ,
          2.23606798])
In [416]: _.sum()
Out[416]: 179.53993144488973
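The three totals can also be reproduced without numpy. The sketch below assumes a helper named travel_distance (a hypothetical name, not from the original answer) and rebuilds the lexsort order with a tuple key.

```python
import math

def travel_distance(points):
    """Sum of Euclidean distances between successive points."""
    return sum(math.hypot(q[0] - p[0], q[1] - p[1])
               for p, q in zip(points, points[1:]))

a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]
sorted_a = [[45, 205], [42, 206], [46, 205], [47, 202], [48, 202],
            [45, 40], [46, 41], [47, 40], [48, 40], [49, 38]]
# Same order as the lexsort result: 2nd coordinate first, then the 1st
lexsorted_a = sorted(a, key=lambda p: (p[1], p[0]))

for name, order in [("original", a), ("lexsort", lexsorted_a), ("sorted_a", sorted_a)]:
    print(f"{name:9s} {travel_distance(order):.3f}")
```

A single number like this gives a way to compare candidate orderings, which is what the original sorted() call with a per-point key cannot express.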

A brute-force solution is to try all the permutations and pick the one that minimizes this sum.
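A minimal sketch of that brute-force search, using itertools.permutations, might look like the following. The function names and the 6-point subset are assumptions made for illustration; the full 10-point list has 10! = 3,628,800 permutations, which pure Python can still enumerate, though slowly.

```python
import itertools
import math

def travel_distance(points):
    """Sum of Euclidean distances between successive points."""
    return sum(math.hypot(q[0] - p[0], q[1] - p[1])
               for p, q in zip(points, points[1:]))

def brute_force_order(points):
    """Return an ordering with the smallest travel distance (O(n!) time)."""
    best = min(itertools.permutations(points), key=travel_distance)
    return [list(p) for p in best]

# Hypothetical 6-point subset of a, so the demo only checks 720 permutations
subset = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205], [47, 40]]
best = brute_force_order(subset)

print(best)
print(travel_distance(best))
```

Every ordering and its reverse have the same travel distance, so the minimum is never unique; min simply returns the first minimal permutation it encounters.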
