需要根据数据框中的行号应用不同的公式 [英] Need to apply different formulas based on the row number in the dataframe
问题描述
我正在努力寻找数据帧中的某种移动平均值.该公式将根据要为其计算的行数而变化.实际情况是我需要计算Z列的地方.
I am working on finding some sort of moving average in a dataframe. The formula will change based on the number of the row it is being computed for. The actual scenario is where I need to compute column Z.
以下是我正在使用的实际数据
Below is the actual data I am working with
Date Open High Low Close
0 01-01-2018 1763.95 1763.95 1725.00 1731.35
1 02-01-2018 1736.20 1745.80 1725.00 1743.20
2 03-01-2018 1741.10 1780.00 1740.10 1774.60
3 04-01-2018 1779.95 1808.00 1770.00 1801.35
4 05-01-2018 1801.10 1820.40 1795.60 1809.95
5 08-01-2018 1816.00 1827.95 1800.00 1825.00
6 09-01-2018 1823.00 1835.00 1793.90 1812.05
7 10-01-2018 1812.05 1823.00 1801.40 1816.55
8 11-01-2018 1825.00 1825.05 1798.55 1802.10
9 12-01-2018 1805.00 1820.00 1794.00 1804.95
10 15-01-2018 1809.90 1834.45 1792.45 1830.00
11 16-01-2018 1835.00 1857.45 1826.10 1850.25
12 17-01-2018 1850.00 1852.45 1826.20 1840.50
13 18-01-2018 1840.50 1852.00 1823.50 1839.00
14 19-01-2018 1828.25 1836.35 1811.00 1829.50
15 22-01-2018 1816.50 1832.55 1805.50 1827.20
16 23-01-2018 1825.00 1825.00 1782.25 1790.15
17 24-01-2018 1787.80 1792.70 1732.15 1737.50
18 25-01-2018 1739.90 1753.40 1720.00 1726.40
19 29-01-2018 1735.15 1754.95 1729.80 1738.70
我正在使用的代码段如下:
The code snippet I am using is as below:
from datetime import date
from nsepy import get_history
import csv
import pandas as pd
import numpy as np
import requests
from datetime import timedelta
import datetime as dt
import pandas_datareader.data as web
import io
df = pd.read_csv('ACC.CSV')
idx = df.reset_index().index
df['Change'] = df['Close'].diff()
df['Advance'] = np.where(df.Change > 0, df.Change,0)
df['Decline'] = np.where(df.Change < 0, df.Change*-1, 0)
conditions = [idx < 14, idx == 14, idx > 14]
values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)
df['Avg_Loss'] = (df.Decline.rolling(14).sum())/14
df['RS'] = df.Avg_Gain / df.Avg_Loss
df['RSI'] = np.where(df['Avg_Loss'] == 0, 100, 100-(100/(1+df.RS)))
df.drop(['Change', 'Advance', 'Decline', 'Avg_Gain', 'Avg_Loss', 'RS'], axis=1)
print(df.head(20))
以下是我得到的错误:
Traceback (most recent call last):
File "C:/Users/Lenovo/Desktop/Python/0.Chart Patterns/Z.Sample Code.py", line 20, in <module>
values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Avg_Gain'
以下是预期的输出,然后我还将写下公式.原始DF由日期打开高低和关闭.
Edit 3: Below is the expected output and then I will also write down the formula. Original DF consists of columns Date Open High Low & Close.
Advance and Decline –
• If the difference between current & previous is +ve then Advance = difference and Decline = 0
• If the difference between current & previous is –ve then Advance = 0 and Decline = -1 * difference
Avg_Gain:
• If index is < 13 then Avg_Gain = 0
• If index = 13 then Avg_Gain = Average of 14 periods
• If index > 13, then Avg_Gain = (Avg_Gain(previous-row) * 13 + Advance(current-row) )/14
Avg_Loss:
• If index is < 13 then Avg_Loss = 0
• If index = 13 then Avg_Loss = Average of Advance of 14 periods
• If index > 13, then Avg_Loss = (Avg_Loss(previous-row) * 13 + Decline(current-row) )/14
RS:
• If index < 13 then RS = 0
• If Index >= 13 then RS = Avg_Gain/Avg_Loss
RSI = 100-(100/(1 + RS))
我希望这会有所帮助.
推荐答案
您的代码中有一个错误,因为在创建df.Avg_Gain
时使用了df.Avg_Gain
.是
You have an error in your code because you use df.Avg_Gain
in creating df.Avg_Gain
. Y
values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)
我将代码的这一部分更改为以下内容:
I changed that part of the code to the following:
up = df.Advance.rolling(14).sum()/14
values = [0, up, (up.shift(1) * 13 + df.Advance)/14]
输出(idx> = 14):
Output (idx>=14):
Date Open High Low Close RSI
14 2018-01-19 1828.25 1836.35 1811.00 1829.50 75.237850
15 2018-01-22 1816.50 1832.55 1805.50 1827.20 72.920021
16 2018-01-23 1825.00 1825.00 1782.25 1790.15 58.793750
17 2018-01-24 1787.80 1792.70 1732.15 1737.50 40.573938
18 2018-01-25 1739.90 1753.40 1720.00 1726.40 31.900045
19 2018-01-29 1735.15 1754.95 1729.80 1738.70 33.197678
尽管如此,应该有更好的方法.如果我找到一个更好的解决方案,我将对其进行更新.让我知道此数据是否正确.
There should be a better way of doing this though. I'll update it with a better solution if i find one. Let me know if this data is correct.
更新: 您还需要更正'Avg_loss` ::
UPDATE: You also need to correct your calculation for 'Avg_loss`::
down = df.Decline.rolling(14).sum()/14
down_values = [0, down, (down.shift(1) * 13 + df.Decline)/14]
df['Avg_Loss'] = np.select(conditions, down_values)
更新2 :在提供了预期数据之后. 因此,我唯一可以做到这一点的方法就是循环-不确定是否可以这样做-也许我不知道某些熊猫功能.
UPDATE 2: After Expected data was provided. So the only way i could do this is by looping - not sure if possible to do it otherwise - maybe some pandas functionality i'm unaware of.
因此,首先通过设置Avg_Gain
和Avg_Loss
进行与以前相同的操作:
您只需要稍微更改一下值即可:
So first do the same as before with setting Avg_Gain
and Avg_Loss
:
you just need to change the values slightly:
conditions = [idx<13, idx==13, idx>13]
up = df.Advance.rolling(14).sum()/14
values = [0, up, 0]
df['Avg_Gain'] = np.select(conditions, values)
down = df.Decline.rolling(14).sum()/14
down_values = [0, down, 0]
df['Avg_Loss'] = np.select(conditions, d_values)
我已更改您的条件以对索引13进行分割-因为这是基于预期输出的结果.
I have changed your conditions to split on index 13 - since this is what i see based on the expected output.
运行此代码后,将使用以前的Agv_Gain
和Avg_Loss
值从索引14
填充Avg_Gain
和Avg_Loss
的值.
Once you run this code you will populate the values for Avg_Gain
and Avg_Loss
from index 14
using the previous value for Agv_Gain
and Avg_Loss
.
p=14
for i in range(p, len(df)):
df.at[i, 'Avg_Gain'] = ((df.loc[i-1, 'Avg_Gain'] * (p-1)) + df.loc[i, 'Advance']) / p
df.at[i, 'Avg_Loss'] = ((df.loc[i-1, 'Avg_Loss'] * (p-1)) + df.loc[i, 'Decline']) / p
输出:
df[13:][['Date','Avg_Gain', 'Avg_Loss', 'RS', 'RSI']]
Date Avg_Gain Avg_Loss RS RSI
13 2018-01-18 10.450000 2.760714 3.785252 79.102460
14 2018-01-19 9.703571 3.242092 2.992997 74.956155
15 2018-01-22 9.010459 3.174800 2.838119 73.945571
16 2018-01-23 8.366855 5.594457 1.495562 59.928860
17 2018-01-24 7.769222 8.955567 0.867530 46.453335
18 2018-01-25 7.214278 9.108741 0.792017 44.196960
19 2018-01-29 7.577544 8.458116 0.895890 47.254330
这篇关于需要根据数据框中的行号应用不同的公式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!