pandas count over multiple columns
Question
I have a dataframe that looks like this:
Measure1 Measure2 Measure3 ...
0 1 3
1 3 2
3 0
I'd like to count how often each value occurs across all of the columns, to produce:
Measure Count Percentage
0 2 0.25
1 2 0.25
2 1 0.125
3 3 0.375
Using
outcome_measure_count = cdss_data.groupby(key_columns=['Measure1'],operations={'count': agg.COUNT()}).sort('count', ascending=True)
I only get the counts for the first column. (I'm actually using the graphlab package, but I'd prefer pandas.)
Can anyone help me?
Recommended answer
You can generate the counts by flattening the df with ravel and calling value_counts on the result; from this you can construct the final df:
In [230]:
import io
import pandas as pd
t="""Measure1 Measure2 Measure3
0 1 3
1 3 2
3 0 0"""
df = pd.read_csv(io.StringIO(t), sep=r'\s+')  # raw string avoids an invalid-escape warning
df
Out[230]:
Measure1 Measure2 Measure3
0 0 1 3
1 1 3 2
2 3 0 0
In [240]:
count = pd.Series(df.values.ravel()).value_counts()
pd.DataFrame({'Measure': count.index, 'Count':count.values, 'Percentage':(count/count.sum()).values})
Out[240]:
Count Measure Percentage
0 3 3 0.333333
1 3 0 0.333333
2 2 1 0.222222
3 1 2 0.111111
I inserted a 0 just to make the df shape correct, but you should get the point.
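If the padded 0 should not be counted, one option is to keep the empty cell as NaN and drop it before counting. The following is a minimal sketch under that assumption, not part of the original answer: the frame is rebuilt by hand to mirror the question's data, and the names `values`, `counts` and `summary` are chosen here purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question; the empty cell is kept as NaN
df = pd.DataFrame({
    "Measure1": [0, 1, 3],
    "Measure2": [1, 3, 0],
    "Measure3": [3, 2, np.nan],
})

# Flatten all columns into one 1-D Series and drop the missing cell
values = pd.Series(df.to_numpy().ravel()).dropna().astype(int)

# Count each distinct value, ordered by the value itself
counts = values.value_counts().sort_index()

summary = pd.DataFrame({
    "Measure": counts.index,
    "Count": counts.to_numpy(),
    "Percentage": (counts / counts.sum()).to_numpy(),
})
print(summary)
```

With the missing cell excluded, the counts are taken over eight values instead of nine, which matches the table requested in the question.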