Efficient data sifting for unique values (Python)

Views: 331 | Published: 2020/9/25 0:18:45 | Tags: arrays, python-2.7, numpy, unique


Question

I have a 2D NumPy array that consists of (X, Y, Z, A) values, where (X, Y, Z) are Cartesian coordinates in 3D space and A is some value at that location. For example:

__X__|__Y__|__Z__|__A_
  13 |  7  |  21 | 1.5
  9  |  2  |  7  | 0.5
  15 |  3  |  9  | 1.1
  13 |  7  |  21 | 0.9
  13 |  7  |  21 | 1.7
  15 |  3  |  9  | 1.1

Is there an efficient way to find all the unique combinations of (X, Y) and sum their A values? For example, the total for (13, 7) would be 1.5 + 0.9 + 1.7 = 4.1.

Recommended answer

Approach #1

View each row as a single scalar (one void-dtype element covering the row's raw bytes), then use np.unique to tag each row with an integer ID in the range 0..n-1, where n is the number of unique rows. Finally, use np.bincount with those IDs to sum the last column within each group.

Here's the implementation:

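import numpy as np

def get_row_view(a):
    # View each row of a 2D array as one opaque (void) scalar so rows compare as units.
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * int(np.prod(a.shape[1:]))))
    return a.reshape(a.shape[0], -1).view(void_dt).ravel()

def groupby_cols_view(x):
    # Group by the first two columns (cast to int) and sum the last column.
    a = x[:,:2].astype(int)
    a1D = get_row_view(a)
    # indx: first occurrence of each unique row; IDs: group label of every row
    _, indx, IDs = np.unique(a1D, return_index=True, return_inverse=True)
    return np.c_[x[indx,:2], np.bincount(IDs, x[:,-1])]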
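
To make the tagging and summing steps easier to follow, here is a minimal sketch on plain scalar keys. The `keys` and `values` arrays are hypothetical stand-ins for the per-row scalars and the A column; they are not part of the original answer.

```python
import numpy as np

# Hypothetical scalar keys (one per row) and the values to be summed per key
keys = np.array([30, 10, 20, 30, 30, 20])
values = np.array([1.5, 0.5, 1.1, 0.9, 1.7, 1.1])

# uniq: sorted unique keys
# first_idx: index of the first occurrence of each unique key
# ids: for every row, the position of its key within uniq (the group label)
uniq, first_idx, ids = np.unique(keys, return_index=True, return_inverse=True)

# bincount with weights adds up the values that share the same group label
sums = np.bincount(ids, weights=values)

print(uniq)       # [10 20 30]
print(first_idx)  # [1 2 0]
print(ids)        # [2 0 1 2 2 1]
print(sums)       # approximately [0.5 2.2 4.1]
```

Because `ids` holds small non-negative integers, `np.bincount` can do the per-group sum in one pass without a Python loop.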

Approach #2

Same as approach #1, but instead of working with a view, we compute an equivalent linear index for each row, which also reduces each row to a scalar. The rest of the workflow is the same as in the first approach.

Implementation:

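def groupby_cols_linearindex(x):
    # Requires: import numpy as np
    a = x[:,:2].astype(int)
    # Scale column 1 by the extent of column 0 (max - min + 1 of the same column)
    # so that distinct (X, Y) pairs always map to distinct scalars.
    a1D = a[:,0] + a[:,1]*(a[:,0].max() - a[:,0].min() + 1)
    _, indx, IDs = np.unique(a1D, return_index=True, return_inverse=True)
    return np.c_[x[indx,:2], np.bincount(IDs, x[:,-1])]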
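
As a rough illustration of the linear-index idea, the sketch below applies it to the (X, Y) pairs from the question's table. The names `xy`, `width`, and `lin` are illustrative only. It assumes the multiplier is the extent of the first column, which is what keeps distinct pairs from colliding.

```python
import numpy as np

# (X, Y) pairs taken from the example table in the question
xy = np.array([[13, 7],
               [ 9, 2],
               [15, 3],
               [13, 7],
               [13, 7],
               [15, 3]])

# Extent of the X column: 15 - 9 + 1 = 7
width = xy[:, 0].max() - xy[:, 0].min() + 1

# One scalar per row; rows with equal (X, Y) get equal scalars
lin = xy[:, 0] + xy[:, 1] * width
print(lin)  # [62 23 36 62 62 36]

# Sanity check: as many unique scalars as unique (X, Y) rows
print(np.unique(lin).size == np.unique(xy, axis=0).shape[0])  # True
```

For very large coordinate ranges the product can exceed the integer type's range, so this sketch is only appropriate when the coordinates are modest in size.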

Sample run

In [80]: data
Out[80]: 
array([[ 2.        ,  5.        ,  1.        ,  0.40756048],
       [ 3.        ,  4.        ,  6.        ,  0.78945661],
       [ 1.        ,  3.        ,  0.        ,  0.03943097],
       [ 2.        ,  5.        ,  7.        ,  0.43663582],
       [ 4.        ,  5.        ,  0.        ,  0.14919507],
       [ 1.        ,  3.        ,  3.        ,  0.03680583],
       [ 1.        ,  4.        ,  8.        ,  0.36504428],
       [ 3.        ,  4.        ,  2.        ,  0.8598825 ]])

In [81]: groupby_cols_view(data)
Out[81]: 
array([[ 1.        ,  3.        ,  0.0762368 ],
       [ 1.        ,  4.        ,  0.36504428],
       [ 2.        ,  5.        ,  0.8441963 ],
       [ 3.        ,  4.        ,  1.64933911],
       [ 4.        ,  5.        ,  0.14919507]])

In [82]: groupby_cols_linearindex(data)
Out[82]: 
array([[ 1.        ,  3.        ,  0.0762368 ],
       [ 1.        ,  4.        ,  0.36504428],
       [ 3.        ,  4.        ,  1.64933911],
       [ 2.        ,  5.        ,  0.8441963 ],
       [ 4.        ,  5.        ,  0.14919507]])
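
For comparison, here is a shorter sketch run on the table from the question. It is not one of the two approaches above: it assumes NumPy 1.13 or later, where `np.unique` accepts an `axis` argument, and lets NumPy find the unique (X, Y) rows directly. The variable names are illustrative.

```python
import numpy as np

# The (X, Y, Z, A) table from the question
data = np.array([[13, 7, 21, 1.5],
                 [ 9, 2,  7, 0.5],
                 [15, 3,  9, 1.1],
                 [13, 7, 21, 0.9],
                 [13, 7, 21, 1.7],
                 [15, 3,  9, 1.1]])

# Unique (X, Y) rows plus a group label for every input row
keys, ids = np.unique(data[:, :2], axis=0, return_inverse=True)

# ravel() keeps the labels 1D regardless of the NumPy version's inverse shape
sums = np.bincount(ids.ravel(), weights=data[:, -1])

result = np.c_[keys, sums]
print(result)
# Rows are (9, 2), (13, 7), (15, 3) with sums of about 0.5, 4.1, 2.2
```

The (13, 7) group comes out at about 4.1, matching the total worked out by hand in the question.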
