numpy sort weird behavior

Question

I'm looking at the answers to an earlier question I asked, "numpy.unique with order preserved". They work great, but with one example I run into problems.

b
['Aug-09' 'Aug-09' 'Aug-09' ..., 'Jan-13' 'Jan-13' 'Jan-13']
b.shape
(83761,)
b.dtype
|S6
bi, idxb = np.unique(b, return_index=True)
months = bi[np.argsort(idxb)]
months
['Feb-10' 'Aug-10' 'Nov-10' 'Oct-12' 'Oct-11' 'Jul-10' 'Feb-12' 'Sep-11'
 'Jan-10' 'Apr-10' 'May-10' 'Sep-09' 'Mar-11' 'Jun-12' 'Jul-12' 'Dec-09'
 'Aug-09' 'Nov-12' 'Dec-12' 'Apr-12' 'Jun-11' 'Jan-11' 'Jul-11' 'Sep-10'
 'Jan-12' 'Dec-10' 'Oct-09' 'Nov-11' 'Oct-10' 'Mar-12' 'Jan-13' 'Nov-09'
 'May-11' 'Mar-10' 'Jun-10' 'Dec-11' 'May-12' 'Feb-11' 'Aug-11' 'Sep-12'
 'Apr-11' 'Aug-12']

Why does months start with Feb-10 instead of Aug-09? With smaller datasets I get the expected behavior, i.e. months starts with Aug-09. With this dataset, every answer to the previous question gives me Feb-10 first.
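On a small array the approach from the question does give the expected order. The made-up six-element example below (not taken from the real dataset) shows the intermediate values: np.unique returns the distinct values sorted lexicographically together with the index of each value's first occurrence, and argsort of those indices reorders the values by where they first appeared.

```python
import numpy as np

b = np.array(['Aug-09', 'Aug-09', 'Sep-09', 'Oct-09', 'Sep-09', 'Oct-09'])

# Unique values come back sorted lexicographically ('Aug' < 'Oct' < 'Sep'),
# together with the index of each value's first occurrence in b.
bi, idxb = np.unique(b, return_index=True)
print(bi.tolist())     # ['Aug-09', 'Oct-09', 'Sep-09']
print(idxb.tolist())   # [0, 3, 2]

# argsort(idxb) lists the positions in bi ordered by first appearance in b.
order = np.argsort(idxb)
print(order.tolist())  # [0, 2, 1]

months = bi[order]
print(months.tolist())  # ['Aug-09', 'Sep-09', 'Oct-09']
```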

This works:

months = []
for bi in b:
    if bi not in months:
        months.append(bi) 
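
The loop above is correct because it walks the data in order. Its "bi not in months" test scans the months list every time, which is cheap here (only 42 distinct months) but grows with the number of distinct values. A variant that remembers already-seen values in a set keeps the same first-appearance order; ordered_unique_list is a hypothetical helper name used only for illustration.

```python
def ordered_unique_list(items):
    """Return the distinct items of an iterable in order of first appearance."""
    seen = set()   # values already emitted; average O(1) membership test
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


b = ['Aug-09', 'Aug-09', 'Sep-09', 'Oct-09', 'Sep-09', 'Oct-09']
print(ordered_unique_list(b))  # ['Aug-09', 'Sep-09', 'Oct-09']
```

The same function accepts a NumPy string array, since its elements are hashable.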


http://www.uploadmb.com/dw.php?id=1364341573 Here is my dataset. Try it yourself.

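import numpy as np

# Read one month label per line, stripping the trailing newline
with open('test.txt', 'r') as f:
    res = [line.strip() for line in f]

a = np.array(res)
# idx should hold the first-occurrence index of each unique value;
# sorting it restores the original order of appearance
_, idx = np.unique(a, return_index=True)
print(a[np.sort(idx)])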
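
A self-contained variant of the same test that does not need test.txt can help when comparing machines. The month labels and their random order below are synthetic (an assumption, not the contents of the real dataset); the script compares the np.unique approach against the plain-Python loop from the question.

```python
import numpy as np

# Hypothetical stand-in for test.txt: 42 month labels from Aug-09 to Jan-13.
names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
labels = []
year, month = 2009, 8  # month is 1-based
while (year, month) <= (2013, 1):
    labels.append('%s-%02d' % (names[month - 1], year % 100))
    month += 1
    if month > 12:
        year, month = year + 1, 1

# Roughly the size of the real data set; the seed makes the run repeatable.
rng = np.random.RandomState(0)
a = np.array(labels)[rng.randint(0, len(labels), size=83761)]

# Reference answer: plain-Python order-preserving de-duplication.
expected = []
for x in a.tolist():
    if x not in expected:
        expected.append(x)

_, idx = np.unique(a, return_index=True)
result = a[np.sort(idx)].tolist()
print(result == expected)  # True when return_index gives first occurrences
```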

Answer

Update:

I believe the problem is actually the NumPy bug in the ticket below. What version of NumPy are you running?

http://projects.scipy.org/numpy/ticket/2063

I was able to reproduce your problem because the NumPy installed on the Ubuntu machine I tested on was 1.6.1; the bug is fixed in 1.6.2 and later.
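
If upgrading is not an option, one version-independent workaround is to compute the first-occurrence indices yourself instead of relying on return_index. This is a sketch that assumes the 1.6.1 problem lies in the indices np.unique returns rather than in the unique values; the helper names first_occurrence_indices and unique_in_order_stable are hypothetical. A stable sort (kind='mergesort') keeps equal values in their original order, so the first element of each run of equal values is that value's first occurrence.

```python
import numpy as np


def first_occurrence_indices(a):
    """Index of the first occurrence of each distinct value in a 1-D array."""
    a = np.asarray(a)
    order = np.argsort(a, kind='mergesort')  # stable: ties keep original order
    sorted_a = a[order]
    # True at the start of each run of equal values in the sorted array
    is_first = np.ones(len(a), dtype=bool)
    is_first[1:] = sorted_a[1:] != sorted_a[:-1]
    return order[is_first]


def unique_in_order_stable(a):
    """Distinct values of a, in order of first appearance."""
    a = np.asarray(a)
    return a[np.sort(first_occurrence_indices(a))]


print(unique_in_order_stable(np.array(['b', 'a', 'b', 'c', 'a'])).tolist())
# ['b', 'a', 'c']
```

This costs one extra sort compared with np.unique, but the result no longer depends on which sorting algorithm a given NumPy release uses internally for return_index.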

Upgrade NumPy and try again; after upgrading, it worked for me on my Ubuntu machine.
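
After upgrading, a quick behavioral check can complement reading np.__version__. The sketch below compares the indices from np.unique(..., return_index=True), which the NumPy documentation describes as first occurrences, against first occurrences found in plain Python on random integer data. It is a heuristic (the function name and default sizes are hypothetical choices): a pass on random data does not prove the bug is absent, but a failure shows it is present.

```python
import numpy as np


def unique_returns_first_index(n=100000, k=50, seed=0):
    """Heuristic check that np.unique's return_index gives first occurrences."""
    rng = np.random.RandomState(seed)
    a = rng.randint(0, k, size=n)
    _, idx = np.unique(a, return_index=True)
    # First occurrence of every value, found by a plain left-to-right scan
    first = {}
    for i, v in enumerate(a.tolist()):
        first.setdefault(v, i)
    # np.unique orders its output by value, so compare in sorted-key order
    expected = [first[v] for v in sorted(first)]
    return idx.tolist() == expected


print(np.__version__, unique_returns_first_index())
```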

In these lines:

bi, idxb = np.unique(b, return_index=True)
months = bi[np.argsort(idxb)]

there are two errors:

  1. You want to use the sorted indices on the original array, b[...].
  2. You want the sorted indices, not the indices that sort the indices, so use np.sort, not np.argsort.

This should work:

bi, idxb = np.unique(b, return_index=True)
months = b[np.sort(idxb)]
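
For what it's worth, bi[np.argsort(idxb)] and b[np.sort(idxb)] select the same elements: bi[j] equals b[idxb[j]], so reordering bi by argsort(idxb) is the same as indexing b with idxb in sorted order. That is consistent with the update above, where the unexpected output came from the NumPy version rather than from the indexing. The sketch below checks the equivalence on random integer data (an arbitrary choice of input).

```python
import numpy as np

rng = np.random.RandomState(1)
b = rng.randint(0, 20, size=1000)

bi, idxb = np.unique(b, return_index=True)
via_unique_values = bi[np.argsort(idxb)]  # form used in the question
via_original = b[np.sort(idxb)]           # form suggested in this answer
print(np.array_equal(via_unique_values, via_original))  # True
```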


Yes, it does, using your data set with Python 2.7 and NumPy 1.7 on Mac OS X 10.6 (64-bit):

Python 2.7.3 (default, Oct 23 2012, 13:06:50) 

IPython 0.13.1 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.7.0'

In [3]: from platform import architecture

In [4]: architecture()
Out[4]: ('64bit', '')

In [5]: f = open('test.txt','r')

In [6]: lines = np.array([line.strip() for line in f.readlines()])

In [7]: _, ilines = np.unique(lines, return_index = True)

In [8]: months = lines[np.sort(ilines)]

In [9]: months
Out[9]: 
array(['Aug-09', 'Sep-09', 'Oct-09', 'Nov-09', 'Dec-09', 'Jan-10',
       'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10', 'Jul-10',
       'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10', 'Jan-11',
       'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11', 'Jul-11',
       'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11', 'Jan-12',
       'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12', 'Jul-12',
       'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12', 'Jan-13'], 
      dtype='|S6')


OK, I can finally reproduce your problem on Ubuntu 64 bit too:

Python 2.7.3 (default, Aug  1 2012, 05:14:39) 

IPython 0.12.1 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.6.1'

In [3]: from platform import architecture

In [4]: architecture()
Out[4]: ('64bit', 'ELF')

In [5]: f = open('test.txt','r')

In [6]: lines = np.array([line.strip() for line in f.readlines()])

In [7]: _, ilines = np.unique(lines, return_index=True)

In [8]: months = lines[np.sort(ilines)]

In [9]: months
Out[9]: 
array(['Feb-10', 'Aug-10', 'Nov-10', 'Oct-12', 'Oct-11', 'Jul-10',
       'Feb-12', 'Sep-11', 'Jan-10', 'Apr-10', 'May-10', 'Sep-09',
       'Mar-11', 'Jun-12', 'Jul-12', 'Dec-09', 'Aug-09', 'Nov-12',
       'Dec-12', 'Apr-12', 'Jun-11', 'Jan-11', 'Jul-11', 'Sep-10',
       'Jan-12', 'Dec-10', 'Oct-09', 'Nov-11', 'Oct-10', 'Mar-12',
       'Jan-13', 'Nov-09', 'May-11', 'Mar-10', 'Jun-10', 'Dec-11',
       'May-12', 'Feb-11', 'Aug-11', 'Sep-12', 'Apr-11', 'Aug-12'], 
      dtype='|S6')


Works on Ubuntu after Numpy upgrade:

Python 2.7.3 (default, Aug  1 2012, 05:14:39) 

IPython 0.12.1 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.7.0'

In [3]: f = open('test.txt','r')

In [4]: lines = np.array([line.strip() for line in f.readlines()])

In [5]: _, ilines = np.unique(lines, return_index=True)

In [6]: months = lines[np.sort(ilines)]

In [7]: months
Out[7]: 
array(['Aug-09', 'Sep-09', 'Oct-09', 'Nov-09', 'Dec-09', 'Jan-10',
       'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10', 'Jul-10',
       'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10', 'Jan-11',
       'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11', 'Jul-11',
       'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11', 'Jan-12',
       'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12', 'Jul-12',
       'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12', 'Jan-13'], 
      dtype='|S6')
