How to perform a single operation on multiple columns of a DataFrame

Problem description

I have the following DataFrame:

df
                           TSLA  MSFT
2017-05-15 00:00:00+00:00   320    68
2017-05-16 00:00:00+00:00   319    69
2017-05-17 00:00:00+00:00   314    61
2017-05-18 00:00:00+00:00   313    66
2017-05-19 00:00:00+00:00   316    62
2017-05-22 00:00:00+00:00   314    65
2017-05-23 00:00:00+00:00   310    63


max_idx = df.idxmax()  # index label of each column's maximum
# TSLA   2017-05-15 00:00:00+00:00
# MSFT   2017-05-16 00:00:00+00:00

max_value = df.max()  # maximum value of each column
# TSLA    320
# MSFT     69

def pct_change(first, second):  # percent change from first to second
    return (second - first) / first * 100.00

For both columns, I want the percent change between max_value and each subsequent value, starting from max_idx (df.loc[max_idx:]). I just want to make sure the percent change is not below 5%.

Example (drop from the maximum, rounded):
for TSLA: 320 vs 319 = 0.31%       for MSFT: 69 vs 61 = 11.59%
          320 vs 314 = 1.88%                 69 vs 66 = 4.35%
          320 vs 313 = 2.19%                 69 vs 62 = 10.14%
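
To make the example concrete, here is a minimal sketch that rebuilds the sample frame and applies the question's pct_change to each column from that column's own max_idx onward. The values are re-typed from the printout above, and the timestamps are assumed to be UTC (matching the +00:00 offset); the dict name changes is only for illustration.

```python
import pandas as pd

# Sample data re-typed from the printout in the question (UTC timestamps assumed).
idx = pd.to_datetime(
    ["2017-05-15", "2017-05-16", "2017-05-17", "2017-05-18",
     "2017-05-19", "2017-05-22", "2017-05-23"],
    utc=True,
)
df = pd.DataFrame(
    {"TSLA": [320, 319, 314, 313, 316, 314, 310],
     "MSFT": [68, 69, 61, 66, 62, 65, 63]},
    index=idx,
)


def pct_change(first, second):
    """Percent change from `first` to `second` (the question's formula)."""
    return (second - first) / first * 100.00


max_idx = df.idxmax()  # index label of each column's maximum
max_value = df.max()   # each column's maximum

# For each column, compare its maximum with every value from its own max_idx onward.
changes = {
    col: pct_change(max_value[col], df.loc[max_idx[col]:, col])
    for col in df.columns
}

for col, series in changes.items():
    print(col, series.round(2).tolist())
```

pct_change returns negative numbers here because every later value is below the maximum; the absolute values are the drops listed in the example. On this data, TSLA's later values sit between 0.3125% and 3.125% below 320, while MSFT's sit between about 4.35% and 11.59% below 69.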

Edit: If this is difficult to answer, I'd be happy with just a pointer to which function or method to use for operations like this.

Note: I just want to make sure the percent change isn't below 5%.

Recommended answer

I'm not sure about your exact true/false condition, but I think you need something like this (credit to @JohnGalt):

df.apply(lambda x: ((1 - x/x.max()) > 0.05).all())

Or, using your logic:

df.apply(lambda x: ((x[x.idxmax()]-x)/x[x.idxmax()]*100>5).all())

Output:

TSLA    False
MSFT    False
dtype: bool
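
One thing to watch: in both expressions above, each column's maximum is itself one of the values tested, and its drop is exactly 0, so `(... > 0.05).all()` can never be True; the first expression also looks at rows before the maximum. Below is a minimal sketch that follows the question's df.loc[max_idx:] idea but skips the maximum's own row. Because the intended condition is unclear, it shows both readings of "not below 5%". The helper name drops_after_max is hypothetical, the data is re-typed from the question with UTC timestamps assumed, and the index is assumed to be sorted so label slicing works.

```python
import pandas as pd

# Sample data re-typed from the question (UTC timestamps assumed).
idx = pd.to_datetime(
    ["2017-05-15", "2017-05-16", "2017-05-17", "2017-05-18",
     "2017-05-19", "2017-05-22", "2017-05-23"],
    utc=True,
)
df = pd.DataFrame(
    {"TSLA": [320, 319, 314, 313, 316, 314, 310],
     "MSFT": [68, 69, 61, 66, 62, 65, 63]},
    index=idx,
)


def drops_after_max(x):
    """Hypothetical helper: fractional drop from the column's maximum for
    every row strictly after the row where that maximum occurs."""
    after = x.loc[x.idxmax():].iloc[1:]  # like df.loc[max_idx:], minus the max row
    return 1 - after / x.max()


# Reading 1 (the answer's condition): every later value is more than 5% below the max.
# Note: if the maximum is the last row, there are no later values and .all() is True.
all_drop_over_5 = df.apply(lambda x: (drops_after_max(x) > 0.05).all())

# Reading 2: no later value is more than 5% below the max.
never_drop_over_5 = df.apply(lambda x: (drops_after_max(x) <= 0.05).all())

print(all_drop_over_5)
print(never_drop_over_5)
```

On the sample data, reading 1 is False for both columns, while reading 2 is True for TSLA (its deepest later drop is 3.125%) and False for MSFT.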

Let's look at one column.

John's formula:

1 - df.TSLA/df.TSLA.max()

returns:

2017-05-15 00:00:00+00:00    0.000000
2017-05-16 00:00:00+00:00    0.003125
2017-05-17 00:00:00+00:00    0.018750
2017-05-18 00:00:00+00:00    0.021875
2017-05-19 00:00:00+00:00    0.012500
2017-05-22 00:00:00+00:00    0.018750
2017-05-23 00:00:00+00:00    0.031250
Name: TSLA, dtype: float64
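
The same calculation can also be done for every column at once, without apply: dividing a DataFrame by the Series returned by df.max() aligns that Series' labels with the DataFrame's columns, so each column is divided by its own maximum. A minimal sketch on the question's sample data (re-typed, UTC timestamps assumed):

```python
import pandas as pd

# Sample data re-typed from the question (UTC timestamps assumed).
idx = pd.to_datetime(
    ["2017-05-15", "2017-05-16", "2017-05-17", "2017-05-18",
     "2017-05-19", "2017-05-22", "2017-05-23"],
    utc=True,
)
df = pd.DataFrame(
    {"TSLA": [320, 319, 314, 313, 316, 314, 310],
     "MSFT": [68, 69, 61, 66, 62, 65, 63]},
    index=idx,
)

# df.max() is a Series indexed by column name, so df / df.max() divides
# each column by its own maximum (alignment on column labels).
drops = 1 - df / df.max()

print(drops["TSLA"])         # John's formula for the TSLA column
print((drops > 0.05).all())  # per-column check, like the df.apply(...) version
```

This is the vectorized counterpart of the apply version: one expression operates on all columns, and the per-column reduction comes from .all(), which works column-wise by default.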

If all of those values are greater than 0.05 (i.e. 5%), the column's result is True; otherwise it is False.

My original formula also works; it just does a little more arithmetic to compute the same quantity as John's formula, expressed as a percentage instead of a fraction. Finally, df.apply with a lambda applies the formula to each column independently.
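
A quick check that the two formulas agree: for a column maximum m, (m - x) / m * 100 equals (1 - x / m) * 100, so the original formula is John's formula scaled by 100. The sketch below computes both per column with df.apply on the re-typed sample data (UTC timestamps assumed) and compares them with numpy.allclose.

```python
import numpy as np
import pandas as pd

# Sample data re-typed from the question (UTC timestamps assumed).
idx = pd.to_datetime(
    ["2017-05-15", "2017-05-16", "2017-05-17", "2017-05-18",
     "2017-05-19", "2017-05-22", "2017-05-23"],
    utc=True,
)
df = pd.DataFrame(
    {"TSLA": [320, 319, 314, 313, 316, 314, 310],
     "MSFT": [68, 69, 61, 66, 62, 65, 63]},
    index=idx,
)

# John's formula: fractional drop from each column's maximum.
john = df.apply(lambda x: 1 - x / x.max())

# The answer's original formula: the same drop, expressed in percent.
original = df.apply(lambda x: (x[x.idxmax()] - x) / x[x.idxmax()] * 100)

# The results differ only by a factor of 100, so "> 5" (percent)
# tests the same thing as "> 0.05" (fraction).
print(np.allclose(original.to_numpy(), (john * 100).to_numpy()))
```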
