Pandas - Conditional probability of a given specific b
Question
I have a DataFrame with two columns, "a" and "b". How can I find the conditional probability of "a" given a specific "b"?
df.groupby('a').groupby('b')
does not work. Let's assume I have 3 categories in column a, and for each of them I have 5 categories of b. What I need to do is find the total number of each class of b for each class of a. I tried the apply command, but I don't think I know how to use it properly.
df.groupby('a').apply(lambda x: x[x['b']] == '...').count()
Recommended answer
To find the total number of each class of b for each class of a, you would do:
df.groupby('a').b.value_counts()
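An equivalent way to get the same pair counts, sketched here with a small made-up DataFrame (the columns a and b follow the question, but the values x, y, z, p, q, r are hypothetical), is to group by both columns in one call and take the group sizes. Chaining a second .groupby onto a GroupBy object, as in the question, is not how pandas combines keys; the keys go together in one list.

```python
import pandas as pd

# Hypothetical data: three classes in column "a", a few classes in column "b"
df = pd.DataFrame({
    "a": ["x", "x", "x", "y", "y", "z"],
    "b": ["p", "q", "p", "p", "r", "q"],
})

# Grouping by both columns at once counts every (a, b) pair that occurs
pair_counts = df.groupby(["a", "b"]).size()
print(pair_counts)
```

Pairs that never occur (for example x with r) are simply absent from the result rather than listed with a count of 0.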
For example, create a DataFrame as below:
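import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'C': np.random.randn(8), 'D': np.random.randn(8)})  # C and D are random, so their values change on every run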
     A      B         C         D
0  foo    one -1.565185 -0.465763
1  bar    one  2.499516 -0.941229
2  foo    two -0.091160  0.689009
3  bar  three  1.358780 -0.062026
4  foo    two -0.800881 -0.341930
5  bar    two -0.236498  0.198686
6  foo    one -0.590498  0.281307
7  foo  three -1.423079  0.424715
Then:
df.groupby('A')['B'].value_counts()
A
bar  one      1
     two      1
     three    1
foo  one      2
     two      2
     three    1
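If a grid is easier to read than the stacked Series, pd.crosstab lays out the same counts with one row per class of A and one column per class of B. This is a swapped-in alternative, not part of the original answer; the sketch keeps only columns A and B from the example above, because the random columns C and D play no role here.

```python
import pandas as pd

# Columns A and B from the example above (random columns C and D omitted)
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
})

# Rows: classes of A; columns: classes of B; cells: number of rows with that pair
counts = pd.crosstab(df["A"], df["B"])
print(counts)
```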
To convert this to a conditional probability, you need to divide by the total size of each group.
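Written out, the quantity being computed is the share of rows inside one class of A that carry a given class of B. With the example data, foo occurs in 5 rows, 2 of which have B = one:

$$
P(B = b \mid A = a) = \frac{\#\{\text{rows with } A = a \text{ and } B = b\}}{\#\{\text{rows with } A = a\}},
\qquad
P(B = \text{one} \mid A = \text{foo}) = \frac{2}{5} = 0.4
$$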
You can either do it with another groupby:
df.groupby('A')['B'].value_counts() / df.groupby('A')['B'].count()
A
bar  one      0.333333
     two      0.333333
     three    0.333333
foo  one      0.400000
     two      0.400000
     three    0.200000
dtype: float64
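The division above relies on pandas aligning the per-group totals (indexed by A) with the level A of the counts' MultiIndex. A sketch of two more explicit variants, which are alternatives rather than the answer's own code: Series.div with level="A" states the broadcast level outright, and SeriesGroupBy.value_counts accepts normalize=True, which divides by each group's size for you. Only columns A and B of the example are used.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
})

counts = df.groupby("A")["B"].value_counts()  # MultiIndex (A, B)
totals = df.groupby("A")["B"].count()         # Index A

# Broadcast the totals across level "A" explicitly instead of relying on implicit alignment
probs_div = counts.div(totals, level="A")

# value_counts can normalise within each group directly
probs_norm = df.groupby("A")["B"].value_counts(normalize=True)
print(probs_norm)
```

Both give the same numbers; normalize=True is the shorter spelling when counts are not needed separately.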
Or you can apply a lambda function to the groups:
df.groupby('A')['B'].apply(lambda g: g.value_counts() / len(g))
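A sketch that runs the lambda approach on columns A and B of the example and cross-checks it with pd.crosstab, which is a swapped-in alternative rather than the answer's method. All the code in the answer conditions on A, giving P(B | A). If the question's "probability of a given b" is meant literally, normalising over columns instead of rows conditions on B and gives P(A | B).

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
})

# The apply/lambda version: counts within each A group divided by that group's size
probs_apply = df.groupby("A")["B"].apply(lambda g: g.value_counts() / len(g))

# Same numbers as a table: each row (class of A) sums to 1, i.e. P(B | A)
p_b_given_a = pd.crosstab(df["A"], df["B"], normalize="index")

# Normalising over columns instead: each column (class of B) sums to 1, i.e. P(A | B)
p_a_given_b = pd.crosstab(df["A"], df["B"], normalize="columns")
print(p_a_given_b)
```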