需要根据数据框中的行号应用不同的公式 [英] Need to apply different formulas based on the row number in the dataframe

查看:105
本文介绍了需要根据数据框中的行号应用不同的公式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在努力寻找数据帧中的某种移动平均值.该公式将根据要为其计算的行数而变化.实际情况是我需要计算Z列的地方.

I am working on finding some sort of moving average in a dataframe. The formula will change based on the number of the row it is being computed for. The actual scenario is where I need to compute column Z.

以下是我正在使用的实际数据

Below is the actual data I am working with

          Date     Open     High      Low    Close
0   01-01-2018  1763.95  1763.95  1725.00  1731.35
1   02-01-2018  1736.20  1745.80  1725.00  1743.20
2   03-01-2018  1741.10  1780.00  1740.10  1774.60
3   04-01-2018  1779.95  1808.00  1770.00  1801.35
4   05-01-2018  1801.10  1820.40  1795.60  1809.95
5   08-01-2018  1816.00  1827.95  1800.00  1825.00
6   09-01-2018  1823.00  1835.00  1793.90  1812.05
7   10-01-2018  1812.05  1823.00  1801.40  1816.55
8   11-01-2018  1825.00  1825.05  1798.55  1802.10
9   12-01-2018  1805.00  1820.00  1794.00  1804.95
10  15-01-2018  1809.90  1834.45  1792.45  1830.00
11  16-01-2018  1835.00  1857.45  1826.10  1850.25
12  17-01-2018  1850.00  1852.45  1826.20  1840.50
13  18-01-2018  1840.50  1852.00  1823.50  1839.00
14  19-01-2018  1828.25  1836.35  1811.00  1829.50
15  22-01-2018  1816.50  1832.55  1805.50  1827.20
16  23-01-2018  1825.00  1825.00  1782.25  1790.15
17  24-01-2018  1787.80  1792.70  1732.15  1737.50
18  25-01-2018  1739.90  1753.40  1720.00  1726.40
19  29-01-2018  1735.15  1754.95  1729.80  1738.70

我正在使用的代码段如下:

The code snippet I am using is as below:

from datetime import date
from nsepy import get_history
import csv
import pandas as pd
import numpy as np
import requests
from datetime import timedelta
import datetime as dt
import pandas_datareader.data as web
import io

df = pd.read_csv('ACC.CSV')

idx = df.reset_index().index

df['Change'] = df['Close'].diff()
df['Advance'] = np.where(df.Change > 0, df.Change,0)
df['Decline'] = np.where(df.Change < 0, df.Change*-1, 0)
conditions = [idx < 14, idx == 14, idx > 14]
values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)
df['Avg_Loss'] = (df.Decline.rolling(14).sum())/14
df['RS'] = df.Avg_Gain / df.Avg_Loss
df['RSI'] = np.where(df['Avg_Loss'] == 0, 100, 100-(100/(1+df.RS)))
df.drop(['Change', 'Advance', 'Decline', 'Avg_Gain', 'Avg_Loss', 'RS'],     axis=1)

print(df.head(20))

以下是我得到的错误:

Traceback (most recent call last):
  File "C:/Users/Lenovo/Desktop/Python/0.Chart Patterns/Z.Sample Code.py", line 20, in <module>
    values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Avg_Gain'

以下是预期的输出,然后我还将写下公式.原始DF由日期打开高低和关闭.

Edit 3: Below is the expected output and then I will also write down the formula. Original DF consists of columns Date Open High Low & Close.

Advance and Decline – 
•   If the difference between current & previous is +ve then Advance = difference and Decline = 0
•   If the difference between current & previous is –ve then Advance = 0 and     Decline = -1 * difference
Avg_Gain:
•   If index is < 13 then Avg_Gain = 0
•   If index = 13 then Avg_Gain = Average of 14 periods
•   If index > 13, then Avg_Gain = (Avg_Gain(previous-row) * 13 +     Advance(current-row) )/14
Avg_Loss:
•   If index is < 13 then Avg_Loss = 0
•   If index = 13 then Avg_Loss = Average of Advance of 14 periods
•   If index > 13, then Avg_Loss = (Avg_Loss(previous-row) * 13 + Decline(current-row) )/14
RS:
•   If index < 13 then RS = 0
•   If Index >= 13 then RS = Avg_Gain/Avg_Loss
RSI = 100-(100/(1 + RS))

我希望这会有所帮助.

推荐答案

您的代码中有一个错误,因为在创建df.Avg_Gain时使用了df.Avg_Gain.是

You have an error in your code because you use df.Avg_Gain in creating df.Avg_Gain. Y

values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)

我将代码的这一部分更改为以下内容:

I changed that part of the code to the following:

up = df.Advance.rolling(14).sum()/14
values = [0, up, (up.shift(1) * 13 + df.Advance)/14]

输出(idx> = 14):

Output (idx>=14):

    Date        Open    High    Low     Close   RSI
14  2018-01-19  1828.25 1836.35 1811.00 1829.50 75.237850
15  2018-01-22  1816.50 1832.55 1805.50 1827.20 72.920021
16  2018-01-23  1825.00 1825.00 1782.25 1790.15 58.793750
17  2018-01-24  1787.80 1792.70 1732.15 1737.50 40.573938
18  2018-01-25  1739.90 1753.40 1720.00 1726.40 31.900045
19  2018-01-29  1735.15 1754.95 1729.80 1738.70 33.197678

尽管如此,应该有更好的方法.如果我找到一个更好的解决方案,我将对其进行更新.让我知道此数据是否正确.

There should be a better way of doing this though. I'll update it with a better solution if i find one. Let me know if this data is correct.

更新: 您还需要更正'Avg_loss` ::

UPDATE: You also need to correct your calculation for 'Avg_loss`::

down = df.Decline.rolling(14).sum()/14
down_values = [0, down, (down.shift(1) * 13 + df.Decline)/14]
df['Avg_Loss'] = np.select(conditions, down_values)

http://stockcharts.com/school/doku .php?id = chart_school:technical_indicators:relative_strength_index_rsi#calculation

更新2 :在提供了预期数据之后. 因此,我唯一可以做到这一点的方法就是循环-不确定是否可以这样做-也许我不知道某些熊猫功能.

UPDATE 2: After Expected data was provided. So the only way i could do this is by looping - not sure if possible to do it otherwise - maybe some pandas functionality i'm unaware of.

因此,首先通过设置Avg_GainAvg_Loss进行与以前相同的操作: 您只需要稍微更改一下值即可:

So first do the same as before with setting Avg_Gain and Avg_Loss: you just need to change the values slightly:

conditions = [idx<13, idx==13, idx>13]
up = df.Advance.rolling(14).sum()/14
values = [0, up, 0]
df['Avg_Gain'] = np.select(conditions, values)

down = df.Decline.rolling(14).sum()/14
down_values = [0, down, 0]
df['Avg_Loss'] = np.select(conditions, d_values)

我已更改您的条件以对索引13进行分割-因为这是基于预期输出的结果.

I have changed your conditions to split on index 13 - since this is what i see based on the expected output.

运行此代码后,将使用以前的Agv_GainAvg_Loss值从索引14填充Avg_GainAvg_Loss的值.

Once you run this code you will populate the values for Avg_Gain and Avg_Loss from index 14 using the previous value for Agv_Gain and Avg_Loss.

p=14
for i in range(p, len(df)):
    df.at[i, 'Avg_Gain'] = ((df.loc[i-1, 'Avg_Gain'] * (p-1)) + df.loc[i, 'Advance']) / p
    df.at[i, 'Avg_Loss'] = ((df.loc[i-1, 'Avg_Loss'] * (p-1)) + df.loc[i, 'Decline']) / p

输出:

df[13:][['Date','Avg_Gain', 'Avg_Loss', 'RS', 'RSI']]

    Date        Avg_Gain    Avg_Loss    RS          RSI
13  2018-01-18  10.450000   2.760714    3.785252    79.102460
14  2018-01-19  9.703571    3.242092    2.992997    74.956155
15  2018-01-22  9.010459    3.174800    2.838119    73.945571
16  2018-01-23  8.366855    5.594457    1.495562    59.928860
17  2018-01-24  7.769222    8.955567    0.867530    46.453335
18  2018-01-25  7.214278    9.108741    0.792017    44.196960
19  2018-01-29  7.577544    8.458116    0.895890    47.254330

这篇关于需要根据数据框中的行号应用不同的公式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆