Python: how to do average among different pandas data frame columns?

Views: 121  Posted: 2018/5/30 14:17:41  Tags: python pandas group-by

Problem description

I have the following dataset:

    import pandas as pd
    df = pd.DataFrame({'ID1': [0, 1, 0, 2, 2, 4],
                       'ID2': [1, 0, 3, 4, 4, 2],
                       'Title': ['a', 'b', 'c', 'd', 'e', 'f'],
                       'Weight': [3, 5, 1, 1, 5, 1]})

    df

    ID1 ID2 Title Weight
    0   1   a     3
    1   0   b     5
    0   3   c     1
    2   4   d     1
    2   4   e     5
    4   2   f     1

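I want to check how many times two IDs collaborate, counting the total frequency and the weighted average for each pair. The weighted average is the number of collaborations divided by the sum of Weight. The results should be: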

    df1

    ID1 ID2 Total Weighted Av.
    1   0   2     0.25
    0   3   1     1
    2   4   3     0.5
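
To make the definition concrete, the metric for a single pair can be worked out by hand. The following is only a sketch for the pair of IDs 0 and 1; the names mask, pair_rows, total, weight_sum and weighted_av are illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({'ID1': [0, 1, 0, 2, 2, 4],
                   'ID2': [1, 0, 3, 4, 4, 2],
                   'Title': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Weight': [3, 5, 1, 1, 5, 1]})

# Rows where the two IDs are 0 and 1, in either order (rows a and b).
mask = (((df['ID1'] == 0) & (df['ID2'] == 1)) |
        ((df['ID1'] == 1) & (df['ID2'] == 0)))
pair_rows = df[mask]

total = len(pair_rows)                   # number of collaborations
weight_sum = pair_rows['Weight'].sum()   # 3 + 5
weighted_av = total / weight_sum         # collaborations / sum of Weight
print(total, weight_sum, weighted_av)
```

For this pair there are 2 collaborations with a weight sum of 8, which gives the 0.25 shown in the expected result.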

I am counting the collaborations between ID1 and ID2 incorrectly, like this:

    df.groupby(['ID1', 'ID2']).size().reset_index()
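
Running that attempt on the sample data suggests why the count is off: groupby treats (0, 1) and (1, 0) as different keys, so a pair is split whenever its IDs appear in a different order. A minimal sketch (the variable name counts and the column name Total are chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'ID1': [0, 1, 0, 2, 2, 4],
                   'ID2': [1, 0, 3, 4, 4, 2],
                   'Title': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Weight': [3, 5, 1, 1, 5, 1]})

# Group on the columns exactly as stored; the order of the IDs matters here.
counts = df.groupby(['ID1', 'ID2']).size().reset_index(name='Total')
print(counts)
```

This yields five groups instead of three, because (0, 1) and (1, 0) are counted separately, and so are (2, 4) and (4, 2).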

Solution

You can first sort the values of columns ID1 and ID2 within each row using numpy.ndarray.sort, and then use groupby with apply and a custom function f:

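    print(df)
       ID1  ID2 Title  Weight
    0    0    1     a       3
    1    1    0     b       5
    2    0    3     c       1
    3    2    4     d       1
    4    2    4     e       5
    5    4    2     f       1

    # Copy the two ID columns into an array and sort each row in place,
    # so that (1, 0) and (0, 1) become the same pair.
    id1id2 = df[['ID1', 'ID2']].values.copy()
    id1id2.sort(axis=1)
    print(id1id2)
    [[0 1]
     [0 1]
     [0 3]
     [2 4]
     [2 4]
     [2 4]]

    # Write the sorted pairs back into the data frame.
    df[['ID1', 'ID2']] = id1id2
    print(df)
       ID1  ID2 Title  Weight
    0    0    1     a       3
    1    0    1     b       5
    2    0    3     c       1
    3    2    4     d       1
    4    2    4     e       5
    5    2    4     f       1

    def f(x):
        # Total: number of rows for the pair.
        # Weighted Av.: Total divided by the sum of Weight.
        return pd.Series({'Total': len(x),
                          'Weighted Av.': len(x) / float(x['Weight'].sum())})

    print(df.groupby(['ID1', 'ID2']).apply(f).reset_index())
       ID1  ID2  Total  Weighted Av.
    0    0    1    2.0      0.250000
    1    0    3    1.0      1.000000
    2    2    4    3.0      0.428571

Note that the pair (2, 4) has three rows with weights 1, 5 and 1, so its weighted average is 3/7 ≈ 0.4286, not the 0.5 listed in the question.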
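
As a possible alternative to apply, the same numbers can be obtained with named aggregation, which also keeps Total as an integer. This is only a sketch and not the method from the answer: it uses numpy.sort (which returns a sorted copy) instead of the in-place numpy.ndarray.sort, and the names pairs, normalized, result and weight_sum are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID1': [0, 1, 0, 2, 2, 4],
                   'ID2': [1, 0, 3, 4, 4, 2],
                   'Title': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Weight': [3, 5, 1, 1, 5, 1]})

# Sort each (ID1, ID2) pair so that (1, 0) and (0, 1) share one key.
pairs = np.sort(df[['ID1', 'ID2']].to_numpy(), axis=1)
normalized = df.assign(ID1=pairs[:, 0], ID2=pairs[:, 1])

# Count the rows and sum the weights per normalized pair.
result = (normalized.groupby(['ID1', 'ID2'])
          .agg(Total=('Weight', 'size'), weight_sum=('Weight', 'sum'))
          .reset_index())

# Weighted average as defined in the question: Total / sum of Weight.
result['Weighted Av.'] = result['Total'] / result['weight_sum']
result = result.drop(columns='weight_sum')
print(result)
```

Because df.assign returns a new data frame, the original df is left unchanged, which may be preferable when the unsorted ID columns are still needed later.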
