Sum a range of cells in a single column in pandas dataframe

Problem description

I have three columns in a DataFrame. I want to take the number in the STREAK_COUNT column and sum that many cells from the returns in MON TOTAL, ending at the current row. The desired output is shown in the WANTED RESULT column below. The part I can't figure out is summing a variable number of cells, which in this example can be anywhere from 1 to 4.

           MON TOTAL  STREAK_COUNT  WANTED RESULT
1/2/1992    1.123077             1       1.123077  (only 1, so 1.12)
2/3/1992   -1.296718             0
3/2/1992   -6.355612             2       -7.65233  (sum of -1.29 and -6.35)
4/1/1992    5.634692             0
5/1/1992    4.180605             2       9.815297  (sum of 5.63 and 4.18)
7/1/1992   -0.101016             0
8/3/1992   -0.706125             2      -0.807141  (sum of -0.10 and -0.706)
10/1/1992   0.368579             0
11/2/1992   3.822277             0
1/4/1993    2.233359             0
2/1/1993   15.219644             4      21.643859
3/1/1993   -2.647693             1      -2.647693
4/1/1993    1.599094             1       1.599094

Solution

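It's all about finding the right thing to group by. In this case, a reversed cumulative sum of STREAK_COUNT (a running total taken from the bottom of the column upward) gives each nonzero count and the zero rows directly above it the same value, so it works as a group key. This matches the layout of the question's data, where each count of n is preceded by exactly n-1 zero rows.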
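To see what the group key looks like, here is a minimal sketch that uses only the STREAK_COUNT values from the question. Reversing the column, taking cumsum, and reversing back gives each row the total of all counts from that row to the bottom, so every zero row ends up with the same key as the next nonzero row below it.

```python
import pandas as pd

# STREAK_COUNT values copied from the question
counts = pd.Series([1, 0, 2, 0, 2, 0, 2, 0, 0, 0, 4, 1, 1], name="STREAK_COUNT")

# Reverse, take a running total, then reverse back:
# each row's key is the sum of all counts from that row to the end.
groups = counts[::-1].cumsum()[::-1]

print(pd.DataFrame({"STREAK_COUNT": counts, "key": groups}))
```

On this data the keys come out as 13, 12, 12, 10, 10, 8, 8, 6, 6, 6, 6, 2, 1, so the four rows from 10/1/1992 to 2/1/1993 share key 6 and are summed together. Only the positions of the nonzero counts decide where groups start and end; their sizes just keep the keys distinct (assuming counts are never negative).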

First we create the dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame({'MON TOTAL':[1.123077, -1.296178, -6.355612, 5.634692, 4.180605, -0.101016, -0.706125,
...                                 0.368579, 3.822277, 2.233359, 15.219644, -2.647693, 1.599094],
...                    'STREAK_COUNT':[1, 0, 2, 0, 2, 0, 2, 0, 0, 0, 4, 1, 1]},
...                   index=['1/2/1992', '2/3/1992', '3/2/1992', '4/1/1992', '5/1/1992', '7/1/1992', '8/3/1992',
...                          '10/1/1992', '11/2/1992', '1/4/1993', '2/1/1993', '3/1/1993', '4/1/1993'])
>>> df
           MON TOTAL  STREAK_COUNT
1/2/1992    1.123077             1
2/3/1992   -1.296178             0
3/2/1992   -6.355612             2
4/1/1992    5.634692             0
5/1/1992    4.180605             2
7/1/1992   -0.101016             0
8/3/1992   -0.706125             2
10/1/1992   0.368579             0
11/2/1992   3.822277             0
1/4/1993    2.233359             0
2/1/1993   15.219644             4
3/1/1993   -2.647693             1
4/1/1993    1.599094             1

Next, find the groups, sum MON TOTAL within each group, and use transform to write each group's sum back onto every row of that group:

>>> groups = df['STREAK_COUNT'][::-1].cumsum()[::-1]
>>> df['RESULT'] = df.groupby(groups)['MON TOTAL'].transform('sum')
>>> df
           MON TOTAL  STREAK_COUNT     RESULT
1/2/1992    1.123077             1   1.123077
2/3/1992   -1.296178             0  -7.651790
3/2/1992   -6.355612             2  -7.651790
4/1/1992    5.634692             0   9.815297
5/1/1992    4.180605             2   9.815297
7/1/1992   -0.101016             0  -0.807141
8/3/1992   -0.706125             2  -0.807141
10/1/1992   0.368579             0  21.643859
11/2/1992   3.822277             0  21.643859
1/4/1993    2.233359             0  21.643859
2/1/1993   15.219644             4  21.643859
3/1/1993   -2.647693             1  -2.647693
4/1/1993    1.599094             1   1.599094
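The reason transform('sum') is used here, rather than a plain sum, is that it returns one value per original row, aligned to the index of df, so the result can be assigned straight back as a column. A plain groupby(...).sum() instead collapses each group to a single row keyed by the group label. A small sketch comparing the two on the answer's data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "MON TOTAL": [1.123077, -1.296178, -6.355612, 5.634692, 4.180605, -0.101016, -0.706125,
                      0.368579, 3.822277, 2.233359, 15.219644, -2.647693, 1.599094],
        "STREAK_COUNT": [1, 0, 2, 0, 2, 0, 2, 0, 0, 0, 4, 1, 1],
    },
    index=["1/2/1992", "2/3/1992", "3/2/1992", "4/1/1992", "5/1/1992", "7/1/1992", "8/3/1992",
           "10/1/1992", "11/2/1992", "1/4/1993", "2/1/1993", "3/1/1993", "4/1/1993"],
)

# Group key: reversed cumulative sum of STREAK_COUNT (same index as df)
groups = df["STREAK_COUNT"][::-1].cumsum()[::-1]

# Aggregation: one row per group, indexed by the group key (sorted ascending by default)
per_group = df.groupby(groups)["MON TOTAL"].sum()

# Transform: same length and index as df, each row holding its group's sum
per_row = df.groupby(groups)["MON TOTAL"].transform("sum")

print(len(per_group), len(per_row))  # 7 groups vs 13 rows
```

per_group holds just the streak totals; per_row is what the RESULT column above contains.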

If you only want the result on the row where each streak ends, filter with a boolean mask:

>>> df[df['STREAK_COUNT'] > 0]
          MON TOTAL  STREAK_COUNT     RESULT
1/2/1992   1.123077             1   1.123077
3/2/1992  -6.355612             2  -7.651790
5/1/1992   4.180605             2   9.815297
8/3/1992  -0.706125             2  -0.807141
2/1/1993  15.219644             4  21.643859
3/1/1993  -2.647693             1  -2.647693
4/1/1993   1.599094             1   1.599094
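The grouping trick depends on the layout of the question's data, where each count of n sits below exactly n-1 zero rows. If that is not guaranteed, a more literal reading of the question is to sum the n values ending at each row whose count is n. The sketch below does that with Series.rolling, one window size per distinct count; trailing_sum is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def trailing_sum(values: pd.Series, counts: pd.Series) -> pd.Series:
    """Hypothetical helper: for each row with count n > 0, sum the n values
    ending at (and including) that row. Rows with count 0 get NaN."""
    result = pd.Series(np.nan, index=values.index)
    for n in counts[counts > 0].unique():
        # Rolling window of size n; the first n-1 positions are NaN
        window_sum = values.rolling(int(n)).sum()
        mask = counts == n
        result.loc[mask] = window_sum.loc[mask]
    return result


# Data from the answer above
values = pd.Series([1.123077, -1.296178, -6.355612, 5.634692, 4.180605, -0.101016, -0.706125,
                    0.368579, 3.822277, 2.233359, 15.219644, -2.647693, 1.599094])
counts = pd.Series([1, 0, 2, 0, 2, 0, 2, 0, 0, 0, 4, 1, 1])

print(trailing_sum(values, counts))
```

On the question's data this gives the same values as the RESULT column on the nonzero rows, with NaN on the zero rows. A window that would reach above the first row also yields NaN, because rolling requires a full window by default.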
