How to combine consecutive rows in a dataframe and add up their values

Problem description

I have a dataframe:

 Type:  Volume: Date:
 Q     10      2016.6.1
 Q     20      2016.6.1 
 T     10      2016.6.2 
 Q     10      2016.6.3
 T     20      2016.6.4
 T     20      2016.6.5
 Q     10      2016.6.6

Note that the dates of the two consecutive T rows are different, and I want to keep the first date.

I also want to combine the T rows into a single row and add up their volume, but only when two (or more) T rows are consecutive,

i.e. to get:

 Q     10      2016.6.1
 Q     20      2016.6.1 
 T     10      2016.6.2 
 Q     10      2016.6.3
 T     20+20=40 2016.6.4
 Q     10      2016.6.6

The code I am using now is:

df.groupby(by = [df.Type.ne('T').cumsum(),'Price', 'Time', 'Type'], as_index = False)['Volume'].sum()

However, this code only works when the consecutive T rows have the same date. Do you know how to combine consecutive T rows with different dates and keep only the first date?

Recommended answer

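import pandas as pd

df = pd.DataFrame({"Type":   ["Q", "Q", "T", "Q", "T", "T", "Q"],
                   "Volume": [10,   20,  10,  10,  20,  20,  10],
                   "Date":   ["2016-06-01", "2016-06-01", "2016-06-02", "2016-06-03",
                              "2016-06-04", "2016-06-05", "2016-06-06"]})
df["Date"] = pd.to_datetime(df["Date"])

# Counter that increases on every non-T row, so a run of consecutive T rows shares one value.
block = df["Type"].ne("T").cumsum().rename("block")

# Group on (block, Type) but not on Date: sum the volume and keep the first date of each run.
res = (df.groupby([block, "Type"], sort=False)
         .agg({"Volume": "sum", "Date": "first"})
         .droplevel("block")
         .reset_index()[["Type", "Date", "Volume"]])
print(res)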

Output:

  Type       Date  Volume
0    Q 2016-06-01      10
1    Q 2016-06-01      20
2    T 2016-06-02      10
3    Q 2016-06-03      10
4    T 2016-06-04      40
5    Q 2016-06-06      10
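
To see why grouping on `df.Type.ne('T').cumsum()` together with `Type` gives this result, it can help to look at the intermediate key. The sketch below rebuilds the sample frame from the answer (without the dates) and shows the counter next to each row. The column name `block` and the variables `keyed` and `pairs` are illustrative labels only, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Type":   ["Q", "Q", "T", "Q", "T", "T", "Q"],
                   "Volume": [10, 20, 10, 10, 20, 20, 10]})

# True for every non-T row; the running sum therefore increases on each
# non-T row and stays flat across a run of consecutive T rows.
block = df["Type"].ne("T").cumsum()

# "block" is a hypothetical column name, added here only for display.
keyed = df.assign(block=block)
print(keyed)

# Rows that share the same (block, Type) pair collapse into one output row.
pairs = list(zip(keyed["block"], keyed["Type"]))
print(pairs)
```

Because `Type` is part of the key, a Q row and the T row that follows it stay separate even though they share a counter value, while the two adjacent T rows end up with the same pair and are merged. Leaving the date out of the key is what allows T rows with different dates to fall into the same group.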
