How to make a pandas crosstab with percentages?
Problem description
Given a DataFrame with several categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D': np.random.randn(24),
                   'E': np.random.randn(24)})

pd.crosstab(df.A, df.B)
B      A  B  C
A
one    4  4  4
three  2  2  2
two    2  2  2
Using the margins option in crosstab to compute row and column totals gets us close enough to think that it should be possible with an aggfunc or groupby, but my meager brain can't think it through.
Desired output:
B        A    B    C
A
one    .33  .33  .33
three  .33  .33  .33
two    .33  .33  .33
Recommended answer
pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=1)
Basically, you just need a function that computes row/row.sum(), and you use apply with axis=1 to apply it to each row.
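To make the row-wise step concrete, here is a minimal sketch on the question's two categorical columns (the numeric columns are left out, and the names counts, by_apply and by_div are only illustrative). It also shows DataFrame.div with axis=0, which is a vectorized alternative that is not part of the original answer but should give the same result.

```python
import pandas as pd

# Same categorical columns as in the question (numeric columns omitted).
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

counts = pd.crosstab(df.A, df.B)

# Row-wise: each row r is divided by its own total.
by_apply = counts.apply(lambda r: r / r.sum(), axis=1)

# Vectorized alternative: divide every row by the matching row total.
by_div = counts.div(counts.sum(axis=1), axis=0)

print(by_apply.round(2))
```

Every row of the result sums to 1. To normalize by column instead, the same idea should work with axis=0 in apply, or with counts.div(counts.sum(axis=0), axis=1).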
(If doing this in Python 2, you should use from __future__ import division to make sure division always returns a float.)
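As an alternative to apply, pandas.crosstab accepts a normalize argument in newer pandas releases, which performs the division for you. The sketch below assumes a pandas version that supports it; the names row_pct, col_pct and all_pct are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

# Fractions within each row (each row sums to 1).
row_pct = pd.crosstab(df.A, df.B, normalize='index')

# Fractions within each column (each column sums to 1).
col_pct = pd.crosstab(df.A, df.B, normalize='columns')

# Fractions of the grand total (the whole table sums to 1).
all_pct = pd.crosstab(df.A, df.B, normalize='all')

# Multiply by 100 for percentages rather than fractions.
print((row_pct * 100).round(1))
```

normalize='index' matches the row-wise apply shown in the answer; the other two modes cover the cases where the percentages should be relative to column totals or to the whole table.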