高效的数据筛选以获取唯一值(Python) [英] Efficient data sifting for unique values (Python)

查看:331
本文介绍了高效的数据筛选以获取唯一值(Python)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个由(X,Y,Z,A)值组成的2D Numpy数组,其中(X,Y,Z)是3D空间中的笛卡尔坐标,而A是该位置的某个值。

I have a 2D Numpy array that consists of (X,Y,Z,A) values, where (X,Y,Z) are Cartesian coordinates in 3D space, and A is some value at that location. As an example..

__X__|__Y__|__Z__|__A_
  13 |  7  |  21 | 1.5
  9  |  2  |  7  | 0.5
  15 |  3  |  9  | 1.1
  13 |  7  |  21 | 0.9
  13 |  7  |  21 | 1.7
  15 |  3  |  9  | 1.1

是否有一种有效的方法来查找(X,Y)的所有唯一组合并添加他们的价值观?例如,(13,7)的总数为(1.5 + 0.9 + 1.7)或4.1。

Is there an efficient way to find all the unique combinations of (X,Y), and add their values? For example, the total for (13,7) would be (1.5+0.9+1.7), or 4.1.

推荐答案

方法#1

获取每一行作为视图,从而将每一行转换为标量,然后使用 np.unique 将每行标记为从(0 ...... n)开始的最小标量, n 为没有。基于唯一性的唯一标量,最后使用 np.bincount`根据先前获得的唯一标量对最后一列进行求和。

Get each row as a view, thus converting each into a scalar each and then use np.unique to tag each row as a minimum scalar starting from (0......n), withnas no. of unique scalars based on the uniqueness among others and finally usenp.bincount` to perform the summing of the last column based on the unique scalars obtained earlier.

这是实现-

def get_row_view(a):
    void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
    a = np.ascontiguousarray(a)
    return a.reshape(a.shape[0], -1).view(void_dt).ravel()

def groupby_cols_view(x):
    a = x[:,:2].astype(int)   
    a1D = get_row_view(a)     
    _, indx, IDs = np.unique(a1D, return_index=1, return_inverse=1)
    return np.c_[x[indx,:2],np.bincount(IDs, x[:,-1])]

方法#2

与方法1相同,但不是使用视图,我们将为每一行生成等效的线性索引从而将每一行减少为一个标量。其余工作流程与第一种方法相同。

Same as approach #1, but instead of working with the view, we will generate equivalent linear index equivalent for each row and thus reducing each row to a scalar. Rest of the workflow is same as with the first approach.

实现-

def groupby_cols_linearindex(x):
    a = x[:,:2].astype(int)   
    a1D = a[:,0] + a[:,1]*(a[:,0].max() - a[:,1].min() + 1)    
    _, indx, IDs = np.unique(a1D, return_index=1, return_inverse=1)
    return np.c_[x[indx,:2],np.bincount(IDs, x[:,-1])]

示例运行

In [80]: data
Out[80]: 
array([[ 2.        ,  5.        ,  1.        ,  0.40756048],
       [ 3.        ,  4.        ,  6.        ,  0.78945661],
       [ 1.        ,  3.        ,  0.        ,  0.03943097],
       [ 2.        ,  5.        ,  7.        ,  0.43663582],
       [ 4.        ,  5.        ,  0.        ,  0.14919507],
       [ 1.        ,  3.        ,  3.        ,  0.03680583],
       [ 1.        ,  4.        ,  8.        ,  0.36504428],
       [ 3.        ,  4.        ,  2.        ,  0.8598825 ]])

In [81]: groupby_cols_view(data)
Out[81]: 
array([[ 1.        ,  3.        ,  0.0762368 ],
       [ 1.        ,  4.        ,  0.36504428],
       [ 2.        ,  5.        ,  0.8441963 ],
       [ 3.        ,  4.        ,  1.64933911],
       [ 4.        ,  5.        ,  0.14919507]])

In [82]: groupby_cols_linearindex(data)
Out[82]: 
array([[ 1.        ,  3.        ,  0.0762368 ],
       [ 1.        ,  4.        ,  0.36504428],
       [ 3.        ,  4.        ,  1.64933911],
       [ 2.        ,  5.        ,  0.8441963 ],
       [ 4.        ,  5.        ,  0.14919507]])

这篇关于高效的数据筛选以获取唯一值(Python)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆