pandas 条件累积总和 [英] Pandas conditional cumulative sum
问题描述
我正在处理之间具有空值的数据.我打算从特定列sales
的累加总和中构建时间序列图. sales
上的累积总和的条件:(1.)如果第一行为空,则为fillna(0)
,然后为cumsum()
,因此图始终可以从原点开始. (2.)如果空行紧随其后,请保留为空,否则fillna(0)
:
I'm working with data that has null values in between. I intend to construct a time-series plot out of cumulative sum of a particular column sales
. Conditions for cumulative sum on sales
: (1.) if first row is null, fillna(0)
, then cumsum()
so plot can always start from origin. (2.) if null rows follow each other to the end, leave as null else fillna(0)
:
data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016,2017, 2018, 2019],
'quantity': [10, 21, 20, 10, 39, 30, 31,45, 23, 56],
'sales': [None, 41, None, None, 32, 0, 31,None, None, None]}
df = pd.DataFrame(data)
df = df.set_index('year')
df['cum_sales'] = df[['sales']].cumsum()
print df
df.plot()
如何应用条件以使结果变为:
How to apply conditions such that result becomes:
推荐答案
我们将使用bfill
和combine_first
的组合.当我们回填销售列时,我们会填写所有缺失值,后跟非缺失数据,从而满足条件(2).但是,我仅用它来确定要做的条件(2),因为我将这些乘以零,然后用它来填写缺失的值.我实际上并不想回填数据. combine_first
首先从第一个数据帧中获取值,如果缺少该值,它将尝试从第二个数据帧中获取它.因此,不满足条件2的缺失值将填充为零,否则将保持缺失.
We're going to use a combination of bfill
and combine_first
. As we backfill the sales column, we fill in any missing values that are followed by non-missing data, thus satisfying condition (2). However, I use this only to identify those positions that do satisfy condition (2) because I'm then going to multiply these by zero and use it to fill in missing values. I didn't actually want to backfill the data. combine_first
takes the value from the first dataframe first, if its missing it will attempt to get it from the second dataframe. Therefore the missing values that don't satisfy condition 2 will get filled with zero and otherwise will remain missing.
很多话题:
df.sales = df.sales.combine_first(df.sales.bfill() * 0)
df.cum_sales = df.sales.cumsum()
print df
quantity sales cum_sales
year
2010 10 0.0 0.0
2011 21 41.0 41.0
2012 20 0.0 41.0
2013 10 0.0 41.0
2014 39 32.0 73.0
2015 30 0.0 73.0
2016 31 31.0 104.0
2017 45 NaN NaN
2018 23 NaN NaN
2019 56 NaN NaN
剧情:
df.plot()
这篇关于 pandas 条件累积总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!