Delete columns based on repeat value in one row in numpy array

Problem description

I'm hoping to delete columns in my arrays that have repeat entries in row 1, as shown below (row 1 has repeats of the values 1 and 2.5, so one of each of those values has been deleted, together with the column each deleted value lies in).

initial_array =

row 0   [[  1,    1,    1,    1,    1,    1,    1,    1,]
row 1    [0.5,    1,  2.5,    4,  2.5,    2,    1,  3.5,]
row 2    [  1,  1.5,    3,  4.5,    3,  2.5,  1.5,    4,]
row 3    [228,  314,  173,  452,  168,  351,  300,  396]]

final_array =
row 0   [[  1,    1,    1,    1,    1,    1,]
row 1    [0.5,    1,  2.5,    4,    2,  3.5,]
row 2    [  1,  1.5,    3,  4.5,  2.5,    4,]
row 3    [228,  314,  173,  452,  351,  396]]

Ways I was thinking of included using some function that checked for repeats, giving a True response for the second (or later) time a value turned up in the dataset, and then using that response to delete the column. That, or possibly using the return_index option of numpy.unique. I just can't quite find a way through it or find the right function, though.
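The first idea above can be sketched by combining both suggestions: numpy.unique with return_index=True gives the column of each value's first occurrence, so every other column is a second (or later) repeat and can be flagged True and deleted. This is a minimal sketch, assuming exact float equality is the right notion of "repeat" for row 1; the names first_idx and is_repeat are illustrative, not from the question.

```python
import numpy as np

# The question's initial_array (mixed ints/floats, so NumPy stores it as float64).
initial_array = np.array([
    [1,   1,   1,   1,   1,   1,   1,   1],
    [0.5, 1,   2.5, 4,   2.5, 2,   1,   3.5],
    [1,   1.5, 3,   4.5, 3,   2.5, 1.5, 4],
    [228, 314, 173, 452, 168, 351, 300, 396],
])

# return_index gives the column of the FIRST occurrence of each distinct row-1 value.
_, first_idx = np.unique(initial_array[1], return_index=True)

# Flag every column that is not a first occurrence, i.e. a second (or later) repeat.
is_repeat = np.ones(initial_array.shape[1], dtype=bool)
is_repeat[first_idx] = False

# Drop the flagged columns; np.delete keeps the remaining columns in their original order.
final_array = np.delete(initial_array, np.flatnonzero(is_repeat), axis=1)
print(final_array)
```

Because np.delete preserves column order, this reproduces final_array directly, but it does not average row 3.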

If I could find a way to set row 3 of the retained column to the mean of the retained repeat and the deleted one, that would be even better (see below).

final_array_averaged =
row 0   [[  1,    1,      1,    1,    1,    1,]
row 1    [0.5,    1,    2.5,    4,    2,  3.5,]
row 2    [  1,  1.5,      3,  4.5,  2.5,    4,]
row 3    [228,  307,  170.5,  452,  351,  396]]

Thanks in advance for any help you can give to a beginner who is stumped!

Recommended answer

You can use the optional arguments of np.unique, then use np.bincount with the last row as weights to get the final averaged output, like so -

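import numpy as np
# unqID: first column of each unique row-1 value; tag: group label per column; C: group sizes
_,unqID,tag,C = np.unique(arr[1],return_index=1,return_inverse=1,return_counts=1)
out = arr[:,unqID]                   # keep one column per unique value (a copy of arr)
out[-1] = np.bincount(tag,arr[3])/C  # row 3: per-group sum / group size = mean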
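To see what each returned array holds, here is the same np.unique / np.bincount call run on the question's data, with the values each array should contain noted in comments. The names vals, sums and means are added for illustration; the rest mirrors the answer's code above.

```python
import numpy as np

# The question's data (same as arr in the sample run).
arr = np.array([
    [1,   1,   1,   1,   1,   1,   1,   1],
    [0.5, 1,   2.5, 4,   2.5, 2,   1,   3.5],
    [1,   1.5, 3,   4.5, 3,   2.5, 1.5, 4],
    [228, 314, 173, 452, 168, 351, 300, 396],
])

vals, unqID, tag, C = np.unique(
    arr[1], return_index=True, return_inverse=True, return_counts=True
)
# vals  == [0.5, 1, 2, 2.5, 3.5, 4]   distinct row-1 values, sorted
# unqID == [0, 1, 5, 2, 7, 3]         column of each value's first occurrence
# tag   == [0, 1, 3, 5, 3, 2, 1, 4]   group label (index into vals) for every column
# C     == [1, 2, 1, 2, 1, 1]         number of columns in each group

# With weights, bincount adds up arr[3] for each group label ...
sums = np.bincount(tag, weights=arr[3])
# sums  == [228, 614, 351, 341, 396, 452]

# ... and dividing by the group sizes turns the sums into means.
means = sums / C
# means == [228, 307, 351, 170.5, 396, 452]
```

The means come out in sorted-value order, which is why the answer's output row 3 reads 228, 307, 351, 170.5, 396, 452.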

Sample run -

In [212]: arr
Out[212]: 
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2.5,    2. ,    1. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    3. ,    2.5,    1.5,    4. ],
       [ 228. ,  314. ,  173. ,  452. ,  168. ,  351. ,  300. ,  396. ]])

In [213]: out
Out[213]: 
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2. ,    2.5,    3.5,    4. ],
       [   1. ,    1.5,    2.5,    3. ,    4. ,    4.5],
       [ 228. ,  307. ,  351. ,  170.5,  396. ,  452. ]])

As can be seen, the columns of the output are now ordered so that the second row is sorted. If you want to keep the original column order, index the columns with np.argsort of unqID, like so -

In [221]: out[:,unqID.argsort()]
Out[221]: 
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    2.5,    4. ],
       [ 228. ,  307. ,  170.5,  452. ,  351. ,  396. ]])
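Putting the answer's steps together, the sketch below wraps them in one function that returns the averaged result in the original column order. The function name dedupe_columns_mean and its key_row / value_row parameters are hypothetical, and the cast to float is an added assumption so that means such as 170.5 are not truncated when the input is an integer array.

```python
import numpy as np


def dedupe_columns_mean(arr, key_row=1, value_row=-1):
    """Hypothetical helper: keep one column per distinct value of arr[key_row]
    (first occurrence, original order) and replace arr[value_row] with the
    mean over each group of repeated columns."""
    # Assumption: cast to float so averaged values are not truncated for int input.
    arr = np.asarray(arr, dtype=float)
    _, first_idx, tag, counts = np.unique(
        arr[key_row], return_index=True, return_inverse=True, return_counts=True
    )
    # Fancy indexing returns a copy, so the input array is left untouched.
    out = arr[:, first_idx]
    # Weighted bincount sums value_row per group; dividing by counts gives the mean.
    out[value_row] = np.bincount(tag, weights=arr[value_row]) / counts
    # Columns are ordered by sorted key value; argsort of first_idx restores
    # the original left-to-right order.
    return out[:, first_idx.argsort()]


initial_array = [
    [1,   1,   1,   1,   1,   1,   1,   1],
    [0.5, 1,   2.5, 4,   2.5, 2,   1,   3.5],
    [1,   1.5, 3,   4.5, 3,   2.5, 1.5, 4],
    [228, 314, 173, 452, 168, 351, 300, 396],
]

result = dedupe_columns_mean(initial_array)
print(result)
```

Called on the question's initial_array, this should reproduce final_array_averaged, matching the argsort output above.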
