Sum a range of cells in a single column in pandas dataframe
I have three columns in a DataFrame. I want to take the number in the STREAK_COUNT column and sum up that number of cells from the returns in MON TOTAL. The result is displayed in the WANTED RESULT column, as shown below. What I can't figure out is how to sum a variable number of cells, which in this example can be anywhere between 1 and 4.
           MON TOTAL  STREAK_COUNT  WANTED RESULT
1/2/1992    1.123077             1       1.123077  (only 1 so 1.12)
2/3/1992   -1.296718             0
3/2/1992   -6.355612             2       -7.65233  (sum of -1.29 and -6.35)
4/1/1992    5.634692             0
5/1/1992    4.180605             2       9.815297  (sum of 5.63 and 4.18)
7/1/1992   -0.101016             0
8/3/1992   -0.706125             2      -0.807141  (sum of -.10 and -.706)
10/1/1992   0.368579             0
11/2/1992   3.822277             0
1/4/1993    2.233359             0
2/1/1993   15.219644             4      21.643859
3/1/1993   -2.647693             1      -2.647693
4/1/1993    1.599094             1       1.599094
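It's all about finding the right key to group by. In this case, a reversed cumulative sum of STREAK_COUNT gives you what you want: every nonzero STREAK_COUNT closes a streak, so accumulating the column from the bottom up gives each row in a streak the same value and gives each streak a different one.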
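To see why the reversed cumulative sum works as a grouping key, it can help to look at the key on its own. The following is a minimal sketch using only the STREAK_COUNT values from the question; the standalone `streak_count` Series is introduced here purely for illustration.

```python
import pandas as pd

# STREAK_COUNT values from the question, in their original row order.
streak_count = pd.Series([1, 0, 2, 0, 2, 0, 2, 0, 0, 0, 4, 1, 1])

# Reverse, accumulate, reverse back: each row receives the total of its own
# STREAK_COUNT plus every STREAK_COUNT below it.
groups = streak_count[::-1].cumsum()[::-1]

print(groups.tolist())
# [13, 12, 12, 10, 10, 8, 8, 6, 6, 6, 6, 2, 1]
```

Rows that share a key (for example the four rows with key 6) form one streak, and the key changes only on the row after a nonzero STREAK_COUNT.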
First we create the dataframe:
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {'MON TOTAL': [1.123077, -1.296178, -6.355612, 5.634692, 4.180605, -0.101016, -0.706125,
...                    0.368579, 3.822277, 2.233359, 15.219644, -2.647693, 1.599094],
...      'STREAK_COUNT': [1, 0, 2, 0, 2, 0, 2, 0, 0, 0, 4, 1, 1]},
...     index=['1/2/1992', '2/3/1992', '3/2/1992', '4/1/1992', '5/1/1992', '7/1/1992', '8/3/1992',
...            '10/1/1992', '11/2/1992', '1/4/1993', '2/1/1993', '3/1/1993', '4/1/1993'])
>>> df
           MON TOTAL  STREAK_COUNT
1/2/1992    1.123077             1
2/3/1992   -1.296178             0
3/2/1992   -6.355612             2
4/1/1992    5.634692             0
5/1/1992    4.180605             2
7/1/1992   -0.101016             0
8/3/1992   -0.706125             2
10/1/1992   0.368579             0
11/2/1992   3.822277             0
1/4/1993    2.233359             0
2/1/1993   15.219644             4
3/1/1993   -2.647693             1
4/1/1993    1.599094             1
Next find the groups, compute the sum of each group, and join the results to the original dataframe:
>>> groups = df['STREAK_COUNT'][::-1].cumsum()[::-1]
>>> df['RESULT'] = df.groupby(groups)['MON TOTAL'].transform('sum')
>>> df
           MON TOTAL  STREAK_COUNT     RESULT
1/2/1992    1.123077             1   1.123077
2/3/1992   -1.296178             0  -7.651790
3/2/1992   -6.355612             2  -7.651790
4/1/1992    5.634692             0   9.815297
5/1/1992    4.180605             2   9.815297
7/1/1992   -0.101016             0  -0.807141
8/3/1992   -0.706125             2  -0.807141
10/1/1992   0.368579             0  21.643859
11/2/1992   3.822277             0  21.643859
1/4/1993    2.233359             0  21.643859
2/1/1993   15.219644             4  21.643859
3/1/1993   -2.647693             1  -2.647693
4/1/1993    1.599094             1   1.599094
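Note that this grouping relies on each nonzero STREAK_COUNT of n being preceded by exactly n - 1 zero rows, which holds for the sample data. If your own data might not follow that pattern, a quick sanity check along these lines may be useful. This is only a sketch: the names `group_size`, `ends` and `mismatched` are illustrative, and the default integer index is used instead of the dates for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    'MON TOTAL': [1.123077, -1.296178, -6.355612, 5.634692, 4.180605,
                  -0.101016, -0.706125, 0.368579, 3.822277, 2.233359,
                  15.219644, -2.647693, 1.599094],
    'STREAK_COUNT': [1, 0, 2, 0, 2, 0, 2, 0, 0, 0, 4, 1, 1],
})

groups = df['STREAK_COUNT'][::-1].cumsum()[::-1]

# Number of (non-missing) rows that landed in each row's group.
group_size = df.groupby(groups)['MON TOTAL'].transform('count')

# On the rows that end a streak, the group size should equal STREAK_COUNT.
ends = df['STREAK_COUNT'] > 0
mismatched = df[ends & (group_size != df['STREAK_COUNT'])]

print(mismatched.empty)  # True for the sample data
```

Any rows that show up in `mismatched` are streak ends whose group does not contain the number of cells that STREAK_COUNT asks for.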
If you just want results for the end of each streak, then use a mask to filter it:
>>> df[df['STREAK_COUNT'] > 0]
          MON TOTAL  STREAK_COUNT     RESULT
1/2/1992   1.123077             1   1.123077
3/2/1992  -6.355612             2  -7.651790
5/1/1992   4.180605             2   9.815297
8/3/1992  -0.706125             2  -0.807141
2/1/1993  15.219644             4  21.643859
3/1/1993  -2.647693             1  -2.647693
4/1/1993   1.599094             1   1.599094
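If you would rather keep every row and leave the result blank on rows that do not end a streak, as in the WANTED RESULT column of the question, one option is `Series.where`. This is a sketch of that variation and not part of the approach above; the column name `WANTED RESULT` is borrowed from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {'MON TOTAL': [1.123077, -1.296178, -6.355612, 5.634692, 4.180605,
                   -0.101016, -0.706125, 0.368579, 3.822277, 2.233359,
                   15.219644, -2.647693, 1.599094],
     'STREAK_COUNT': [1, 0, 2, 0, 2, 0, 2, 0, 0, 0, 4, 1, 1]},
    index=['1/2/1992', '2/3/1992', '3/2/1992', '4/1/1992', '5/1/1992',
           '7/1/1992', '8/3/1992', '10/1/1992', '11/2/1992', '1/4/1993',
           '2/1/1993', '3/1/1993', '4/1/1993'])

groups = df['STREAK_COUNT'][::-1].cumsum()[::-1]
streak_sum = df.groupby(groups)['MON TOTAL'].transform('sum')

# Keep the sum only where a streak ends; every other row becomes NaN.
df['WANTED RESULT'] = streak_sum.where(df['STREAK_COUNT'] > 0)

print(df)
```

Rows with a STREAK_COUNT of 0 then show NaN instead of repeating the streak total.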