Check one-to-one relationship between two columns
Question
I have two columns, A and B, in a pandas DataFrame, and the values in both are repeated many times. Each unique value in A is expected to correspond to exactly one unique value in B (see the example below, given as two lists). Because each value is repeated many times, I would like to check whether a one-to-one relationship exists between the two columns. Is there a built-in function in pandas for this? If not, is there an efficient way to do it?
Example:
A = [1, 3, 3, 2, 1, 2, 1, 1]
B = [5, 12, 12, 10, 5, 10, 5, 5]
Here, for each 1 in A, the corresponding value in B is always 5 and nothing else. Similarly, 2 always maps to 10 and 3 always maps to 12. Hence, each number in A has exactly one corresponding number in B. I call this a one-to-one relationship. I want to check whether such a relationship exists between two columns of a pandas DataFrame.
An example where this relationship is not satisfied:
A = [1, 3, 3, 2, 1, 2, 1, 1]
B = [5, 12, 12, 10, 5, 10, 7, 5]
Here, 1 in A does not have a unique corresponding value in B. It has two corresponding values, 5 and 7. Hence, the relationship is not satisfied.
Recommended answer
Suppose you have a DataFrame:
import pandas as pd

d = pd.DataFrame({'A': [1, 3, 1, 2, 1, 3, 2], 'B': [4, 6, 4, 5, 4, 6, 5]})
d has a groupby method, which returns a GroupBy object. This is the interface for grouping rows, for example by equal values in a column.
gb = d.groupby('A')
grouped_b_column = gb['B']
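To see what the grouping actually contains, it can help to list the B values that fall into each A group. The following is only a small illustrative sketch using the same sample data; the name groups is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

d = pd.DataFrame({'A': [1, 3, 1, 2, 1, 3, 2], 'B': [4, 6, 4, 5, 4, 6, 5]})

# Iterating over the grouped column yields (key, sub-Series) pairs,
# one pair per distinct value of A.
groups = {key: values.tolist() for key, values in d.groupby('A')['B']}

for key, values in groups.items():
    print(key, values)
```

Each A value collects all of its B values, so a group holding more than one distinct B value is exactly what breaks the relationship.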
You can perform an aggregation on the grouped rows. Let's find the min and max value in every group.
# Named aggregation (pandas 0.25+) fixes the output column names as amin and amax.
res = grouped_b_column.agg(amin='min', amax='max')
>>> print(res)
   amin  amax
A
1     4     4
2     5     5
3     6     6
Now we just need to check that amin and amax are equal in every group, which means every group consists of identical B values:
res['amin'].equals(res['amax'])
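The min/max comparison assumes the B values can be ordered. As a possible alternative (not the method from the answer above), the same one-direction check can be sketched by counting distinct B values per A group with nunique. The variable names below are illustrative, and note that nunique ignores NaN by default.

```python
import pandas as pd

d = pd.DataFrame({'A': [1, 3, 1, 2, 1, 3, 2], 'B': [4, 6, 4, 5, 4, 6, 5]})

# Number of distinct B values observed for each value of A.
distinct_b_per_a = d.groupby('A')['B'].nunique()

# Every A maps to a single B exactly when every count equals 1.
a_determines_b = bool((distinct_b_per_a == 1).all())
print(a_determines_b)
```

This covers only the A-to-B direction, the same as the amin/amax comparison.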
If this check passes, then every A has a unique B. Now run the same check with the A and B columns swapped: the relationship is one-to-one only if both checks pass.
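Both directions can also be checked in one step. The sketch below uses a different technique from the groupby approach: it drops duplicate (A, B) pairs and then verifies that no value repeats in either column of what remains. The helper name is_one_to_one is hypothetical, and the data are the two examples from the question.

```python
import pandas as pd


def is_one_to_one(frame, left, right):
    """Hypothetical helper: True if `left` and `right` map one-to-one."""
    # Keep each distinct (left, right) pair once.
    pairs = frame[[left, right]].drop_duplicates()
    # A repeated value on either side means it is paired with
    # more than one partner, so the mapping is not one-to-one.
    left_repeats = bool(pairs[left].duplicated().any())
    right_repeats = bool(pairs[right].duplicated().any())
    return not (left_repeats or right_repeats)


good = pd.DataFrame({'A': [1, 3, 3, 2, 1, 2, 1, 1],
                     'B': [5, 12, 12, 10, 5, 10, 5, 5]})
bad = pd.DataFrame({'A': [1, 3, 3, 2, 1, 2, 1, 1],
                    'B': [5, 12, 12, 10, 5, 10, 7, 5]})

print(is_one_to_one(good, 'A', 'B'))  # True
print(is_one_to_one(bad, 'A', 'B'))   # False: 1 maps to both 5 and 7
```

Because the check is symmetric, the column order passed to the helper does not matter.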