pandas 条件累积总和 [英] Pandas conditional cumulative sum

查看:77
本文介绍了 pandas 条件累积总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理之间具有空值的数据.我打算从特定列sales的累加总和中构建时间序列图. sales上的累积总和的条件:(1.)如果第一行为空,则为fillna(0),然后为cumsum(),因此图始终可以从原点开始. (2.)如果空行紧随其后,请保留为空,否则fillna(0):

I'm working with data that has null values in between. I intend to construct a time-series plot out of cumulative sum of a particular column sales. Conditions for cumulative sum on sales: (1.) if first row is null, fillna(0), then cumsum() so plot can always start from origin. (2.) if null rows follow each other to the end, leave as null else fillna(0):

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016,2017, 2018, 2019],
        'quantity': [10, 21, 20, 10, 39, 30, 31,45, 23, 56],
        'sales': [None, 41, None, None, 32, 0, 31,None, None, None]}

df = pd.DataFrame(data)
df = df.set_index('year')

df['cum_sales'] = df[['sales']].cumsum()

print df
df.plot()

如何应用条件以使结果变为:

How to apply conditions such that result becomes:

推荐答案

我们将使用bfillcombine_first的组合.当我们回填销售列时,我们会填写所有缺失值,后跟非缺失数据,从而满足条件(2).但是,我仅用它来确定要做的条件(2),因为我将这些乘以零,然后用它来填写缺失的值.我实际上并不想回填数据. combine_first首先从第一个数据帧中获取值,如果缺少该值,它将尝试从第二个数据帧中获取它.因此,不满足条件2的缺失值将填充为零,否则将保持缺失.

We're going to use a combination of bfill and combine_first. As we backfill the sales column, we fill in any missing values that are followed by non-missing data, thus satisfying condition (2). However, I use this only to identify those positions that do satisfy condition (2) because I'm then going to multiply these by zero and use it to fill in missing values. I didn't actually want to backfill the data. combine_first takes the value from the first dataframe first, if its missing it will attempt to get it from the second dataframe. Therefore the missing values that don't satisfy condition 2 will get filled with zero and otherwise will remain missing.

很多话题:

df.sales = df.sales.combine_first(df.sales.bfill() * 0)

df.cum_sales = df.sales.cumsum()

print df

      quantity  sales  cum_sales
year                            
2010        10    0.0        0.0
2011        21   41.0       41.0
2012        20    0.0       41.0
2013        10    0.0       41.0
2014        39   32.0       73.0
2015        30    0.0       73.0
2016        31   31.0      104.0
2017        45    NaN        NaN
2018        23    NaN        NaN
2019        56    NaN        NaN

剧情:

df.plot()

这篇关于 pandas 条件累积总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆