Efficient data sifting for unique values (Python)
Question
I have a 2D NumPy array of (X,Y,Z,A) values, where (X,Y,Z) are Cartesian coordinates in 3D space and A is some value at that location. As an example:
  X  |  Y  |  Z  |  A
-----|-----|-----|-----
 13  |  7  | 21  | 1.5
  9  |  2  |  7  | 0.5
 15  |  3  |  9  | 1.1
 13  |  7  | 21  | 0.9
 13  |  7  | 21  | 1.7
 15  |  3  |  9  | 1.1
Is there an efficient way to find all the unique combinations of (X,Y) and sum their A values? For example, the total for (13,7) would be 1.5 + 0.9 + 1.7 = 4.1.
Recommended answer
Approach #1
View each row as a single scalar (one void-dtype element per row), then use np.unique to tag each row with an integer ID from 0 to n-1, where n is the number of unique rows. Finally, use np.bincount to sum the last column according to those IDs.
Here's the implementation:
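import numpy as np

def get_row_view(a):
    # View each row of a 2D array as one opaque (void) scalar.
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * int(np.prod(a.shape[1:]))))
    return a.reshape(a.shape[0], -1).view(void_dt).ravel()

def groupby_cols_view(x):
    # Group on the first two columns, cast to int so equal coordinates share the same bytes.
    a = x[:, :2].astype(int)
    a1D = get_row_view(a)
    # indx: first occurrence of each unique row; IDs: group label for every row.
    _, indx, IDs = np.unique(a1D, return_index=True, return_inverse=True)
    # Sum the last column within each group.
    return np.c_[x[indx, :2], np.bincount(IDs, x[:, -1])]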
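To make the tagging and summing steps concrete, here is a minimal sketch on hand-made 1D keys. The `keys` and `vals` arrays are illustrative stand-ins (not part of the answer) for the per-row scalars and the A column of the question's table, with 30 standing for (13,7), 10 for (9,2) and 20 for (15,3).

```python
import numpy as np

# Hypothetical scalar keys, one per row of the question's table.
keys = np.array([30, 10, 20, 30, 30, 20])
vals = np.array([1.5, 0.5, 1.1, 0.9, 1.7, 1.1])

# uniq: sorted unique keys
# first: index of the first occurrence of each unique key
# ids: for every row, the position of its key in uniq (0 .. n-1)
uniq, first, ids = np.unique(keys, return_index=True, return_inverse=True)

# bincount with weights adds up vals within each id group.
sums = np.bincount(ids, weights=vals)

print(uniq, first, ids, sums)
```

Here `ids` plays the role of `IDs` in the function above, and `first` plays the role of `indx`, which is what lets the function recover one representative (X,Y) pair per group.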
Approach #2
Same as approach #1, but instead of working with a view, we generate an equivalent linear index for each row, which again reduces each row to a scalar. The rest of the workflow is the same as in the first approach.
Implementation:
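def groupby_cols_linearindex(x):
    a = x[:, :2].astype(int)
    # Collapse each (col0, col1) pair to one integer. The multiplier must exceed
    # the span of column 0, otherwise distinct pairs can map to the same index.
    a1D = a[:, 0] + a[:, 1] * (a[:, 0].max() - a[:, 0].min() + 1)
    _, indx, IDs = np.unique(a1D, return_index=True, return_inverse=True)
    return np.c_[x[indx, :2], np.bincount(IDs, x[:, -1])]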
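As a possible third variant (not part of the original answer), NumPy 1.13 and later let `np.unique` take an `axis` argument, which removes the manual row-to-scalar step. The sketch below assumes integer-valued coordinates in the first two columns; the helper name `groupby_cols_unique_axis` is made up for illustration, and the data is the table from the question.

```python
import numpy as np

def groupby_cols_unique_axis(x):
    # Hypothetical helper; assumes columns 0 and 1 hold integer-valued coordinates.
    a = x[:, :2].astype(int)
    # axis=0 treats each row as one item, so unique rows are found directly.
    uniq_rows, ids = np.unique(a, axis=0, return_inverse=True)
    # ravel() keeps ids 1D even on NumPy versions that return a 2D inverse here.
    sums = np.bincount(ids.ravel(), weights=x[:, -1])
    return np.c_[uniq_rows, sums]

data = np.array([[13, 7, 21, 1.5],
                 [ 9, 2,  7, 0.5],
                 [15, 3,  9, 1.1],
                 [13, 7, 21, 0.9],
                 [13, 7, 21, 1.7],
                 [15, 3,  9, 1.1]])

result = groupby_cols_unique_axis(data)
print(result)
```

The rows come back sorted lexicographically by (X,Y). Whether this is faster than the view or linear-index versions depends on the NumPy version and the data, so it is worth timing on real input.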
Sample run
In [80]: data
Out[80]:
array([[ 2.        ,  5.        ,  1.        ,  0.40756048],
       [ 3.        ,  4.        ,  6.        ,  0.78945661],
       [ 1.        ,  3.        ,  0.        ,  0.03943097],
       [ 2.        ,  5.        ,  7.        ,  0.43663582],
       [ 4.        ,  5.        ,  0.        ,  0.14919507],
       [ 1.        ,  3.        ,  3.        ,  0.03680583],
       [ 1.        ,  4.        ,  8.        ,  0.36504428],
       [ 3.        ,  4.        ,  2.        ,  0.8598825 ]])

In [81]: groupby_cols_view(data)
Out[81]:
array([[ 1.        ,  3.        ,  0.0762368 ],
       [ 1.        ,  4.        ,  0.36504428],
       [ 2.        ,  5.        ,  0.8441963 ],
       [ 3.        ,  4.        ,  1.64933911],
       [ 4.        ,  5.        ,  0.14919507]])

In [82]: groupby_cols_linearindex(data)
Out[82]:
array([[ 1.        ,  3.        ,  0.0762368 ],
       [ 1.        ,  4.        ,  0.36504428],
       [ 3.        ,  4.        ,  1.64933911],
       [ 2.        ,  5.        ,  0.8441963 ],
       [ 4.        ,  5.        ,  0.14919507]])
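For small inputs, a plain-Python reference can help sanity-check totals like the ones above. This is only a sketch: `groupby_cols_reference` is a hypothetical helper, it assumes integer-valued X and Y, and it is meant for checking results rather than for speed. The rows are copied from the sample data.

```python
from collections import defaultdict

def groupby_cols_reference(rows):
    # Hypothetical reference helper: sums the last field per (X, Y) pair.
    totals = defaultdict(float)
    for row in rows:
        totals[(int(row[0]), int(row[1]))] += row[-1]
    # Sort by (X, Y) so the result is easy to compare by eye.
    return sorted((x, y, total) for (x, y), total in totals.items())

rows = [
    [2., 5., 1., 0.40756048],
    [3., 4., 6., 0.78945661],
    [1., 3., 0., 0.03943097],
    [2., 5., 7., 0.43663582],
    [4., 5., 0., 0.14919507],
    [1., 3., 3., 0.03680583],
    [1., 4., 8., 0.36504428],
    [3., 4., 2., 0.8598825],
]

reference = groupby_cols_reference(rows)
for x, y, total in reference:
    print(x, y, round(total, 8))
```

Note that the two NumPy functions return groups in different orders (sorted by row bytes versus sorted by linear index), so compare them as sets of (X, Y, total) rows rather than line by line.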