Sort two dimensional list python
Question
I have a two-dimensional list like this:
a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205], [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]
These are actually coordinates in 2D Euclidean space. I want to sort the list so that points close to each other come in sequence. The result should look like the following:
sorted_a = [[45,205],[42,206],[46,205],[47,202],[48,202],[45,40],[46,41],[47,40],[48,40],[49,38]]
I have also tried this approach:
sorted_a = sorted(a, key=lambda x: (x[0],x[1]))
but it does not return the result I need. Your help is appreciated. Thanks
Recommended answer
I'm not sure this is a sorting problem; it's more of a grouping one (or optimization?)
Sorting requires some criterion for putting [45,205] before [42,206]. key works if you can come up with one number that represents the desired order.
For example, calculate the distance from the origin
A = np.array(a)
creates a numpy array (assuming import numpy as np):
In [346]: A
Out[346]:
array([[ 42, 206],
[ 45, 40],
[ 45, 205],
[ 46, 41],
[ 46, 205],
[ 47, 40],
[ 47, 202],
[ 48, 40],
[ 48, 202],
[ 49, 38]])
The distance, or radius in polar coordinates, is the square root of the sum of squares (sqrt isn't needed for this purpose, since it doesn't change the ordering). Applying argsort to the sum of squares ranks the points by distance from the origin.
In [347]: np.sum(A**2,axis=1)
Out[347]: array([44200, 3625, 44050, 3797, 44141, 3809, 43013, 3904, 43108, 3845])
In [348]: r = np.sum(A**2,axis=1)
In [349]: idx = np.argsort(r)
In [350]: idx
Out[350]: array([1, 3, 5, 9, 7, 6, 8, 2, 4, 0], dtype=int32)
In [351]: A[idx,:]
Out[351]:
array([[ 45, 40],
[ 46, 41],
[ 47, 40],
[ 49, 38],
[ 48, 40],
[ 47, 202],
[ 48, 202],
[ 45, 205],
[ 46, 205],
[ 42, 206]])
The equivalent list operation uses a key function like
def foo(xy):
    x, y = xy
    return x**2 + y**2
In [356]: sorted(a, key=foo)
Out[356]:
[[45, 40],
[46, 41],
[47, 40],
[49, 38],
[48, 40],
[47, 202],
[48, 202],
[45, 205],
[46, 205],
[42, 206]]
Pairwise distances
In numpy it's fairly easy to compute pairwise distances (even easier with one of the scipy tools). But what would you do with those? What defines an order based on such distances?
For example, here is the kind of iteration that we are often asked to 'vectorize':
In [369]: D = np.zeros((10,10))
In [370]: for i in range(10):
     ...:     for j in range(i,10):
     ...:         D[i,j] = np.sqrt(sum((A[i,:]-A[j,:])**2))
     ...:         # D[i,j] = np.linalg.norm(A[i,:]-A[j,:])
In [372]: D.astype(int)
Out[372]:
array([[ 0, 166, 3, 165, 4, 166, 6, 166, 7, 168],
[ 0, 0, 165, 1, 165, 2, 162, 3, 162, 4],
[ 0, 0, 0, 164, 1, 165, 3, 165, 4, 167],
[ 0, 0, 0, 0, 164, 1, 161, 2, 161, 4],
[ 0, 0, 0, 0, 0, 165, 3, 165, 3, 167],
[ 0, 0, 0, 0, 0, 0, 162, 1, 162, 2],
[ 0, 0, 0, 0, 0, 0, 0, 162, 1, 164],
[ 0, 0, 0, 0, 0, 0, 0, 0, 162, 2],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 164],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
is a matrix of distances, truncated to integers for ease of display. Only the upper triangle is filled, because the inner loop starts at j = i.
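As a rough sketch of what 'vectorizing' the double loop above could look like, the version below uses numpy broadcasting instead of explicit iteration. The names diff, D_full and D_upper are illustrative and do not appear in the original answer; np.triu is used only to reproduce the upper-triangular layout of the loop result.

```python
import numpy as np

a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]
A = np.array(a)

# diff[i, j] is the coordinate difference between point i and point j;
# broadcasting (10, 1, 2) against (1, 10, 2) gives shape (10, 10, 2).
diff = A[:, None, :] - A[None, :, :]

# Euclidean norm over the last axis: the full symmetric distance matrix.
D_full = np.sqrt((diff ** 2).sum(axis=-1))

# Zero out the lower triangle to match the layout produced by the loop.
D_upper = np.triu(D_full)

print(D_upper.astype(int))
```

The full matrix is symmetric with a zero diagonal, so the upper triangle carries all the information; whether to keep D_full or D_upper depends on what is done with the distances afterwards.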
numpy has a lexical sort. We could use that to sort on the 2nd coordinate first, and then on the 1st coordinate. That would group all those 200's together:
In [375]: np.lexsort(A.T)
Out[375]: array([9, 1, 5, 7, 3, 6, 8, 2, 4, 0], dtype=int32)
In [376]: A[_,:]
Out[376]:
array([[ 49, 38],
[ 45, 40],
[ 47, 40],
[ 48, 40],
[ 46, 41],
[ 47, 202],
[ 48, 202],
[ 45, 205],
[ 46, 205],
[ 42, 206]])
Pairwise distances with that sorted array look like:
array([[ 0, 4, 2, 2, 4, 164, 164, 167, 167, 168],
[ 0, 0, 2, 3, 1, 162, 162, 165, 165, 166],
[ 0, 0, 0, 1, 1, 162, 162, 165, 165, 166],
[ 0, 0, 0, 0, 2, 162, 162, 165, 165, 166],
[ 0, 0, 0, 0, 0, 161, 161, 164, 164, 165],
[ 0, 0, 0, 0, 0, 0, 1, 3, 3, 6],
[ 0, 0, 0, 0, 0, 0, 0, 4, 3, 7],
[ 0, 0, 0, 0, 0, 0, 0, 0, 1, 3],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
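For plain lists, the same lexical ordering can be sketched without numpy by giving sorted a tuple key with the 2nd coordinate first. The name by_y_then_x is illustrative only.

```python
a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]

# Primary key: 2nd coordinate; tie-breaker: 1st coordinate.
# This mirrors np.lexsort(A.T), where the last key is the primary one.
by_y_then_x = sorted(a, key=lambda p: (p[1], p[0]))

print(by_y_then_x)
```

This differs from the key in the question, (x[0], x[1]), only in the order of the two coordinates, which is what makes the points near 200 and the points near 40 form two contiguous runs.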
Search over permutations
Another way of thinking about this problem is as a search problem: for example, finding the order of points that minimizes the 'travel' distance, i.e. the sum of distances between successive points.
With the original a (A), the distance (with the default np.linalg.norm method) between successive points is
In [407]: np.linalg.norm(A[1:]-A[:-1],axis=1)
Out[407]:
array([ 166.02710622, 165. , 164.00304875, 164. ,
165.00303028, 162. , 162.00308639, 162. ,
164.00304875])
and its sum:
In [408]: _.sum()
Out[408]: 1474.0393203904973
In lexsort order (taking A1 to be the lexsorted array A[np.lexsort(A.T)] from above):
In [410]: np.linalg.norm(A1[1:]-A1[:-1],axis=1)
Out[410]:
array([ 4.47213595, 2. , 1. , 2.23606798,
161.00310556, 1. , 4.24264069, 1. ,
4.12310563])
In [411]: _.sum()
Out[411]: 181.07705580534656
Clearly this has better clustering, mainly based on the 2nd column values.
Your sorted_a improves this sum a bit:
In [414]: sortedA = np.array(sorted_a)
In [415]: np.linalg.norm(sortedA[1:]-sortedA[:-1],axis=1)
Out[415]:
array([ 3.16227766, 4.12310563, 3.16227766, 1. ,
162.0277754 , 1.41421356, 1.41421356, 1. ,
2.23606798])
In [416]: _.sum()
Out[416]: 179.53993144488973
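The three sums above can also be reproduced on plain lists. The helper below is a minimal sketch; path_length and lexsorted are names introduced here for illustration, and the lexsort order is recreated with a tuple key rather than np.lexsort.

```python
import math

def path_length(points):
    """Sum of Euclidean distances between successive points."""
    return sum(math.hypot(p[0] - q[0], p[1] - q[1])
               for p, q in zip(points, points[1:]))

a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]
sorted_a = [[45, 205], [42, 206], [46, 205], [47, 202], [48, 202],
            [45, 40], [46, 41], [47, 40], [48, 40], [49, 38]]
# 2nd coordinate first, then the 1st, as in the lexsort example.
lexsorted = sorted(a, key=lambda p: (p[1], p[0]))

for name, pts in [("original", a), ("lexsort", lexsorted), ("sorted_a", sorted_a)]:
    print(name, path_length(pts))
```

A single number like this gives a way to compare candidate orderings, which is what the search formulation needs.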
A brute force solution is to try all the permutations and pick the one that minimizes this sum. With 10 points that means 10! = 3,628,800 orderings.
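A minimal sketch of that brute-force search, using itertools.permutations, might look like the following. The function names path_length and brute_force_order are hypothetical, and the example runs on only the first six points to keep the run time short, since the number of permutations grows factorially.

```python
import itertools
import math

def path_length(points):
    """Sum of Euclidean distances between successive points."""
    return sum(math.hypot(p[0] - q[0], p[1] - q[1])
               for p, q in zip(points, points[1:]))

def brute_force_order(points):
    """Try every permutation and return one with the smallest path length."""
    return min(itertools.permutations(points), key=path_length)

# Subset of the original list: 6 points means 6! = 720 permutations.
subset = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205], [47, 40]]

best = brute_force_order(subset)
print(best)
print(path_length(best))
```

The minimum is not unique (reversing an ordering gives the same total), so min simply returns the first best permutation it encounters. For the full 10 points this approach is slow in pure Python, which is why it is only practical for small inputs.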