Actual and Percentage Difference on consecutive columns in a Pandas or Pyspark Dataframe

Question

I would like to perform two different calculations across consecutive columns in a pandas or pyspark dataframe.

Columns are weeks and the metrics are displayed as rows. I want to calculate the actual and percentage differences across the columns.

The input and output tables, including the calculations used in Excel, are displayed in the following image. I want to replicate these calculations on a pandas or pyspark dataframe.

Raw data attached:

Metrics         Week20  Week21  Week22  Week23  Week24  Week25  Week26  Week27
Sales           20301   21132   20059   23062   19610   22734   22140   20699
TRXs            739     729     690     779     701     736     762     655
Attachment Rate 4.47    4.44    4.28    4.56    4.41    4.58    4.55    4.96
AOV             27.47   28.99   29.07   29.6    27.97   30.89   29.06   31.6
Profit          5177    5389    5115    5881    5001    5797    5646    5278
Profit per TRX  7.01    7.39    7.41    7.55    7.13    7.88    7.41    8.06
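
For a reproducible starting point, the table above can be typed into a DataFrame by hand. This is only a minimal sketch: the names `weeks` and `rows` are illustrative, and `df` with a `Metrics` column is an assumption about how the data was loaded, since the question does not show that step.

```python
import pandas as pd

# Column labels Week20 ... Week27, as in the raw data above.
weeks = [f"Week{n}" for n in range(20, 28)]

# One entry per metric, values copied from the raw data table.
rows = {
    "Sales": [20301, 21132, 20059, 23062, 19610, 22734, 22140, 20699],
    "TRXs": [739, 729, 690, 779, 701, 736, 762, 655],
    "Attachment Rate": [4.47, 4.44, 4.28, 4.56, 4.41, 4.58, 4.55, 4.96],
    "AOV": [27.47, 28.99, 29.07, 29.6, 27.97, 30.89, 29.06, 31.6],
    "Profit": [5177, 5389, 5115, 5881, 5001, 5797, 5646, 5278],
    "Profit per TRX": [7.01, 7.39, 7.41, 7.55, 7.13, 7.88, 7.41, 8.06],
}

# Metrics become rows, weeks become columns; then expose the metric
# names as an ordinary "Metrics" column (assumed layout).
df = pd.DataFrame.from_dict(rows, orient="index", columns=weeks)
df = df.rename_axis("Metrics").reset_index()

print(df)
```

Typing the values in avoids parsing the whitespace-aligned text, where metric names such as "Attachment Rate" themselves contain spaces.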

Recommended Answer

In pandas you could use the pct_change(axis=1) and diff(axis=1) methods:

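import pandas as pd

df = df.set_index('Metrics')

# metrics reported as an "actual diff"; all other metrics are reported as a percentage
actual = ['AOV', 'Attachment Rate']

# percentage change across consecutive week columns, formatted as strings such as '4.0%'
# (scaling by 100 before rounding avoids float artifacts such as 7.000000000000001)
rep = df[~df.index.isin(actual)].pct_change(axis=1).mul(100).round().fillna(0).astype(str).add('%')
# append the actual differences for the remaining metrics
rep = pd.concat([rep,
                 df[df.index.isin(actual)].diff(axis=1).round(2).fillna(0)
                ])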


In [131]: rep
Out[131]:
                Week20 Week21 Week22 Week23  Week24 Week25 Week26  Week27
Metrics
Sales             0.0%   4.0%  -5.0%  15.0%  -15.0%  16.0%  -3.0%   -7.0%
TRXs              0.0%  -1.0%  -5.0%  13.0%  -10.0%   5.0%   4.0%  -14.0%
Profit            0.0%   4.0%  -5.0%  15.0%  -15.0%  16.0%  -3.0%   -7.0%
Profit per TRX    0.0%   5.0%   0.0%   2.0%   -6.0%  11.0%  -6.0%    9.0%
Attachment Rate      0  -0.03  -0.16   0.28   -0.15   0.17  -0.03    0.41
AOV                  0   1.52   0.08   0.53   -1.63   2.92  -1.83    2.54
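
The report above stores the percentages as strings, which suits display but not further arithmetic. As a sanity check, the sketch below keeps the values numeric on a two-metric, three-week subset of the data and compares one cell of each kind with the formula written out by hand. It assumes the Excel percentage formula is (current - previous) / previous, which is what `pct_change` computes; the names `sample`, `pct`, `delta`, `manual_pct` and `manual_delta` are illustrative.

```python
import pandas as pd

# Small subset of the raw data: one percentage metric, one "actual diff" metric.
sample = pd.DataFrame(
    {
        "Week20": [20301, 27.47],
        "Week21": [21132, 28.99],
        "Week22": [20059, 29.07],
    },
    index=pd.Index(["Sales", "AOV"], name="Metrics"),
)

# Percentage change between consecutive columns, kept as floats (in percent).
pct = sample.loc[["Sales"]].pct_change(axis=1).mul(100).round(1)

# Actual difference between consecutive columns.
delta = sample.loc[["AOV"]].diff(axis=1).round(2)

# The same quantities written out by hand for the Week20 -> Week21 step.
manual_pct = round((21132 - 20301) / 20301 * 100, 1)
manual_delta = round(28.99 - 27.47, 2)

print(pct)
print(delta)
print(manual_pct, manual_delta)
```

The first column of both results is NaN because Week20 has no preceding week; the answer's `fillna(0)` replaces it with zero for the report.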
