Count of the number of identical values in two arrays for all the unique values in an array

Question

I have two arrays, A and B. A holds multiple values (strings, integers or floats) and B holds only 0s and 1s. For each unique value in A, I need the number of positions where that value coincides with a 1 in B, and the number where it coincides with a 0 in B. Both counts need to be stored as separate variables. For example:

A = [1, 1, 3, 2, 2, 1, 1, 3, 3] # input multivalue array; it has three unique values – 1,2,3
B = [0, 0, 0, 1, 1, 1, 0, 1, 0] # input binary array
#Desired result: 
countA1_B1 = 1 #for unique value of '1' in A the count of places where there is '1' in B
countA1_B0 = 3 #for unique value of '1' in A the count of places where there is '0' in B
countAno1_B1 = 3 #for unique value of '1' in A the count of places where there is no '1' in A but there is '1' in B 
countAno1_B0 = 2 #for unique value of '1' in A the count of places where there is no '1' in A and there is '0' in B 

I need this for all the unique values in A. The A array/list would be a raster, so the unique values will not be known in advance. The code therefore has to extract the unique values in A first and then do the remaining calculations. My approach to solving this (see my previous question):

import numpy as np
A = [1, 1, 3, 2, 2, 1, 1, 3, 3] # input array
B = [0, 0, 0, 1, 1, 1, 0, 1, 0] # input binary array
A_arr = np.array(A)
A_unq = np.unique(A_arr)
# code 1: one 0/1 mask row per unique value, built by broadcasting
A_masked_arrays = (A_arr[None, :] == A_unq[:, None]).astype(int)
# code 2: the same masks as a list of arrays
# A_masked_arrays = [(A_arr == unique_val).astype(int) for unique_val in A_unq]
print(A_masked_arrays)
out = {val: arr for val, arr in zip(list(A_unq), list(A_arr))}
# In my session the zip() call above throws an error:
# TypeError: 'zip' object is not callable.
for i in A_unq:
    for j in A_masked_arrays:
        pair = i, j  # pairs every unique value with every mask
        print(pair)

Results obtained:

# from code 1
[[1 1 0 0 0 1 1 0 0]
 [0 0 0 1 1 0 0 0 0]
 [0 0 1 0 0 0 0 1 1]]
# from code 2
[array([1, 1, 0, 0, 0, 1, 1, 0, 0]), array([0, 0, 0, 1, 1, 0, 0, 0, 0]), 
array([0, 0, 1, 0, 0, 0, 0, 1, 1])]

Using the dictionary-creation loop, I get this result:

(1, array([1, 1, 0, 0, 0, 1, 1, 0, 0]))
(1, array([0, 0, 0, 1, 1, 0, 0, 0, 0]))
(1, array([0, 0, 1, 0, 0, 0, 0, 1, 1]))
(2, array([1, 1, 0, 0, 0, 1, 1, 0, 0]))
(2, array([0, 0, 0, 1, 1, 0, 0, 0, 0]))
(2, array([0, 0, 1, 0, 0, 0, 0, 1, 1]))
(3, array([1, 1, 0, 0, 0, 1, 1, 0, 0]))
(3, array([0, 0, 0, 1, 1, 0, 0, 0, 0]))
(3, array([0, 0, 1, 0, 0, 0, 0, 1, 1]))

This is where I am stuck. From here, how do I get the final counts for each unique value in A, such as countA1_B1, countA1_B0, countAno1_B1, countAno1_B0 and so on? Thanks in advance.

Recommended answer

It's much easier to use pandas for this kind of groupby operation:

In [11]: import pandas as pd

In [12]: df = pd.DataFrame({"A": A, "B": B})

In [13]: df
Out[13]:
   A  B
0  1  0
1  1  0
2  3  0
3  2  1
4  2  1
5  1  1
6  1  0
7  3  1
8  3  0

Now you can use groupby:

In [14]: gb = df.groupby("A")["B"]

In [15]: gb.count()  # number of As
Out[15]:
A
1    4
2    2
3    3
Name: B, dtype: int64

In [16]: gb.sum()  # number of As where B == 1
Out[16]:
A
1    1
2    2
3    1
Name: B, dtype: int64

In [17]: gb.count() - gb.sum()  # number of As where B == 0
Out[17]:
A
1    3
2    0
3    2
Name: B, dtype: int64
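
The question also asks for the complementary counts (countAno1_B1 and countAno1_B0), which the groupby results do not give directly. A minimal sketch, assuming they can be obtained by subtracting each group's counts from the totals over B; the column names in_B1, in_B0, out_B1 and out_B0 are illustrative and not part of the original question or answer:

```python
import pandas as pd

A = [1, 1, 3, 2, 2, 1, 1, 3, 3]  # input multivalue array
B = [0, 0, 0, 1, 1, 1, 0, 1, 0]  # input binary array

df = pd.DataFrame({"A": A, "B": B})
gb = df.groupby("A")["B"]

# Counts at positions where A equals the unique value.
counts = pd.DataFrame({
    "in_B1": gb.sum(),               # B == 1
    "in_B0": gb.count() - gb.sum(),  # B == 0
})

# Counts at positions where A differs from the unique value:
# total over all of B minus the in-group count.
total_b1 = df["B"].sum()
total_b0 = len(df) - total_b1
counts["out_B1"] = total_b1 - counts["in_B1"]
counts["out_B0"] = total_b0 - counts["in_B0"]

print(counts)
```

Each row of counts then holds the four numbers for one unique value of A, so counts.loc[1, "out_B1"] plays the role of countAno1_B1. This relies on B containing only 0 and 1, as stated in the question.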

You can also do this more explicitly and more generally (e.g. if the values are not just 0 and 1) with an apply:

In [18]: gb.apply(lambda x: (x == 1).sum())
Out[18]:
A
1    1
2    2
3    1
Name: B, dtype: int64
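
When B can take more than two values, another option is a contingency table. This is a swapped-in technique, not what the answer above uses: a minimal sketch with pd.crosstab, which tabulates every (A value, B value) pair at once. The variable name table is illustrative:

```python
import pandas as pd

A = [1, 1, 3, 2, 2, 1, 1, 3, 3]  # input multivalue array
B = [0, 0, 0, 1, 1, 1, 0, 1, 0]  # input binary array

# Rows are the unique values of A, columns the unique values of B;
# each cell is the number of positions with that (A, B) combination.
table = pd.crosstab(pd.Series(A, name="A"), pd.Series(B, name="B"))
print(table)

# Look up a single count by (A value, B value).
count_a1_b1 = table.loc[1, 1]
count_a1_b0 = table.loc[1, 0]
```

Combinations that never occur, such as A == 2 with B == 0 here, appear as 0 in the table. If A and B are two-dimensional rasters held as NumPy arrays, they would presumably need flattening first (for example with ravel()) before building the Series.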
