Actual and Percentage Difference on consecutive columns in a Pandas or Pyspark Dataframe
Problem description
I would like to perform two different calculations across consecutive columns in a pandas or pyspark dataframe.
Columns are weeks and the metrics are displayed as rows. I want to calculate the actual and percentage differences across the columns.
The input/output tables incl. the calculations used in Excel are displayed in the following image. I want to replicate these calculations on a pandas or pyspark dataframe.
Raw data attached:
Metrics Week20 Week21 Week22 Week23 Week24 Week25 Week26 Week27
Sales 20301 21132 20059 23062 19610 22734 22140 20699
TRXs 739 729 690 779 701 736 762 655
Attachment Rate 4.47 4.44 4.28 4.56 4.41 4.58 4.55 4.96
AOV 27.47 28.99 29.07 29.6 27.97 30.89 29.06 31.6
Profit 5177 5389 5115 5881 5001 5797 5646 5278
Profit per TRX 7.01 7.39 7.41 7.55 7.13 7.88 7.41 8.06
Recommended answer
In pandas you could use the pct_change(axis=1) and diff(axis=1) methods:
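import pandas as pd

# df is assumed to already hold the raw data above, one row per metric
df = df.set_index('Metrics')
# metrics reported as an actual difference; all other rows get a percentage change
actual = ['AOV', 'Attachment Rate']
is_actual = df.index.isin(actual)
# multiply before rounding so the string conversion does not expose
# floating-point noise such as 7.000000000000001
pct = df[~is_actual].pct_change(axis=1).mul(100).round().fillna(0).astype(str).add('%')
diff = df[is_actual].diff(axis=1).round(2).fillna(0)
rep = pd.concat([pct, diff])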
In [131]: rep
Out[131]:
Week20 Week21 Week22 Week23 Week24 Week25 Week26 Week27
Metrics
Sales 0.0% 4.0% -5.0% 15.0% -15.0% 16.0% -3.0% -7.0%
TRXs 0.0% -1.0% -5.0% 13.0% -10.0% 5.0% 4.0% -14.0%
Profit 0.0% 4.0% -5.0% 15.0% -15.0% 16.0% -3.0% -7.0%
Profit per TRX 0.0% 5.0% 0.0% 2.0% -6.0% 11.0% -6.0% 9.0%
Attachment Rate 0 -0.03 -0.16 0.28 -0.15 0.17 -0.03 0.41
AOV 0 1.52 0.08 0.53 -1.63 2.92 -1.83 2.54
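For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The way the frame is built (a hand-typed dict passed to DataFrame.from_dict) is an assumption, since the question does not show how the data was loaded. Unlike the answer above, it keeps the results numeric rather than converting them to strings, and it leaves the first week as NaN because there is no earlier week to compare against.

```python
import pandas as pd

# Raw data from the question, typed in by hand (the loading step is an assumption).
weeks = [f"Week{n}" for n in range(20, 28)]
data = {
    "Sales": [20301, 21132, 20059, 23062, 19610, 22734, 22140, 20699],
    "TRXs": [739, 729, 690, 779, 701, 736, 762, 655],
    "Attachment Rate": [4.47, 4.44, 4.28, 4.56, 4.41, 4.58, 4.55, 4.96],
    "AOV": [27.47, 28.99, 29.07, 29.6, 27.97, 30.89, 29.06, 31.6],
    "Profit": [5177, 5389, 5115, 5881, 5001, 5797, 5646, 5278],
    "Profit per TRX": [7.01, 7.39, 7.41, 7.55, 7.13, 7.88, 7.41, 8.06],
}
df = pd.DataFrame.from_dict(data, orient="index", columns=weeks)
df.index.name = "Metrics"

# Same split as in the answer: these two metrics get an actual difference.
actual = ["AOV", "Attachment Rate"]
is_actual = df.index.isin(actual)

# Percentage change versus the previous week, in percent, kept as floats.
pct = df[~is_actual].pct_change(axis=1).mul(100).round(1)
# Actual difference versus the previous week.
diff = df[is_actual].diff(axis=1).round(2)

print(pct)
print(diff)
```

Keeping the values numeric makes later filtering or plotting easier; the '%' suffix can be added at display time if needed.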