Appending to a new column the differences of current row and previous row, for multiple columns
Question
For each of the columns in my df, I want to subtract the previous row from the current row (row[n+1] - row[n]), but I am having difficulty.
My code is as follows:
#!/usr/bin/python3
from pandas_datareader import data
import pandas as pd
import fix_yahoo_finance as yf
yf.pdr_override()
import os

stock_list = ["BHP.AX", "CBA.AX", "RHC.AX", "TLS.AX", "WOW.AX", "^AORD"]


# Function to get the closing price of the individual stocks
# from the stock_list list
def get_closing_price(stock_name, specific_close):
    symbol = stock_name
    start_date = '2010-01-01'
    end_date = '2016-06-01'
    df = data.get_data_yahoo(symbol, start_date, end_date)
    sym = symbol + " "
    print(sym * 10)
    df = df.drop(['Open', 'High', 'Low', 'Adj Close', 'Volume'], axis=1)
    df = df.rename(columns={'Close': specific_close})
    # https://stackoverflow.com/questions/16729483/converting-strings-to-floats-in-a-dataframe
    # df[specific_close] = df[specific_close].astype('float64')
    print(type(df[specific_close]))
    return df


# Creates a big DataFrame with all the stock's Closing
# Price returns the DataFrame
def get_all_close_prices(directory):
    count = 0
    for stock_name in stock_list:
        specific_close = stock_name.replace(".AX", "") + "_Close"
        if not count:
            prev_df = get_closing_price(stock_name, specific_close)
        else:
            new_df = get_closing_price(stock_name, specific_close)
            # https://stackoverflow.com/questions/11637384/pandas-join-merge-concat-two-dataframes
            prev_df = prev_df.join(new_df)
        count += 1
    prev_df.to_csv(directory)
    return prev_df


# THIS IS THE FUNCTION I NEED HELP WITH
# AS DESCRIBED IN THE QUESTION
def calculate_return(df):
    count = 0
    # for index, row in df.iterrows():
    print(df.columns[0])
    for stock in stock_list:
        specific_close = stock.replace(".AX", "") + "_Close"
        print(specific_close)
        # https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas
        pd.to_numeric(specific_close, errors='ignore')
        df.columns[count].diff()
        count += 1
    return df


def main():
    # FINDS THE CURRENT DIRECTORY AND CREATES THE CSV TO DUMP THE DF
    csv_in_current_directory = os.getcwd() + "/stk_output.csv"
    # FUNCTION THAT GETS ALL THE CLOSING PRICES OF THE STOCKS
    # AND RETURNS IT AS ONE COMPLETE DATAFRAME
    df = get_all_close_prices(csv_in_current_directory)
    # THIS PRINTS OUT WHAT IS IN "OUTPUT 1"
    print(df)
    # THIS FUNCTION IS WHERE I HAVE THE PROBLEM
    df = calculate_return(df)
    # THIS SHOULD PRINT OUT WHAT IS IN "EXPECTED OUTPUT"
    print(df)


# Main line of code
if __name__ == "__main__":
    main()
Question:
For each of the columns, I would like to subtract the previous row from the current row (row[n+1] - row[n]) and assign this value to a new column at the end of the dataframe, named stock_name + "_Earning". My expected output (see: Expected Output) is that I still have the original df as seen in Output 1, but with 6 additional columns, each with an empty first row and the row differences (row[n+1] - row[n]) of the respective column.
Problem Faced:
With the current code, I get the following error, which I have tried to get rid of:
AttributeError: 'str' object has no attribute 'diff'
Things I have tried:
- Change data type of columns in Pandas
- numpy.diff
- Data-frame Object has no Attribute
- How do I subtract the previous row from the current row in a pandas dataframe and apply it to every row; without using a loop?
- pandas.DataFrame.diff
Expected Output:
            BHP_Close  CBA_Close  RHC_Close  TLS_Close  WOW_Close        ^AORD  BHP_Earning  CBA_Earning  RHC_Earning  TLS_Earning  WOW_Earning  ^AORD_Earning
Date
2010-01-03  40.255699  54.574299  11.240000       3.45  27.847300  4889.799805
2010-01-04  40.442600  55.399799  11.030000       3.44  27.679100  4939.500000     0.186901       0.8255        -0.21        -0.01      -0.1682    49.70020000
Output 1:
            BHP_Close  CBA_Close  RHC_Close  TLS_Close  WOW_Close  ^AORD_Close
Date
2010-01-03  40.255699  54.574299  11.240000       3.45  27.847300  4889.799805
2010-01-04  40.442600  55.399799  11.030000       3.44  27.679100  4939.500000
2010-01-05  40.947201  55.678299  11.180000       3.38  27.629601  4946.799805
...               ...        ...        ...        ...        ...          ...
2016-05-30  19.240000  78.180000  72.730003       5.67  22.389999  5473.600098
2016-05-31  19.080000  77.430000  72.750000       5.59  22.120001  5447.799805
2016-06-01  18.490000  76.500000  72.150002       5.52  21.799999  5395.200195
Recommended Answer
Here is an easy and quick way to do what you want:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  columns=['A', 'B', 'C', 'D', 'E'])
print(df)
Result:
    A   B   C   D   E
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24
We can use the shift method to move the entire dataframe down (or up) by a number of rows. Then we just have to subtract the shifted frame from the original and rename the columns.
df2 = df - df.shift(1, axis=0)
df2.columns = [col + '_earning' for col in df2.columns]
print(df2)
Result:
   A_earning  B_earning  C_earning  D_earning  E_earning
0        NaN        NaN        NaN        NaN        NaN
1        5.0        5.0        5.0        5.0        5.0
2        5.0        5.0        5.0        5.0        5.0
3        5.0        5.0        5.0        5.0        5.0
4        5.0        5.0        5.0        5.0        5.0
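As a side note, pandas also provides DataFrame.diff (one of the links listed in the question), which should give the same row-wise difference as the shift-and-subtract step. The following is a minimal sketch on the same toy frame; the names via_shift and via_diff are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the answer.
df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  columns=['A', 'B', 'C', 'D', 'E'])

# Each row minus the row before it, computed two ways.
via_shift = df - df.shift(1)   # the approach used in the answer
via_diff = df.diff()           # built-in equivalent (periods=1 by default)

# The first row has no predecessor, so it is NaN in both frames.
print(via_diff)
print(via_shift.equals(via_diff))
```

Either form works; diff() simply saves the explicit subtraction.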
Then just join the result with the original.
result = pd.concat([df, df2], axis=1)
print(result)
Result:
    A   B   C   D   E  A_earning  B_earning  C_earning  D_earning  E_earning
0   0   1   2   3   4        NaN        NaN        NaN        NaN        NaN
1   5   6   7   8   9        5.0        5.0        5.0        5.0        5.0
2  10  11  12  13  14        5.0        5.0        5.0        5.0        5.0
3  15  16  17  18  19        5.0        5.0        5.0        5.0        5.0
4  20  21  22  23  24        5.0        5.0        5.0        5.0        5.0
edit: after revisiting your post, it looks like you're trying to do this operation on some columns containing strings? Either filter them out or convert them to a datatype which supports the '-' operator.
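For the question's own column names, the same idea might look like the sketch below. It also suggests where the AttributeError likely comes from: df.columns[count] is the column label (a str), not the column itself, and pd.to_numeric(specific_close, ...) is handed the name string, not the data. The sketch assumes every column ends in "_Close", as the question's code produces. The sample rows are copied from Output 1; the parameter and variable names (prices, numeric, earnings) are illustrative, and this calculate_return is a possible rewrite, not the asker's final code.

```python
import pandas as pd

# Three rows copied from "Output 1" in the question (subset of columns).
df = pd.DataFrame(
    {
        "BHP_Close": [40.255699, 40.442600, 40.947201],
        "CBA_Close": [54.574299, 55.399799, 55.678299],
        "^AORD_Close": [4889.799805, 4939.500000, 4946.799805],
    },
    index=pd.to_datetime(["2010-01-03", "2010-01-04", "2010-01-05"]),
)
df.index.name = "Date"


def calculate_return(prices):
    """Return prices plus one <name>_Earning column per <name>_Close column."""
    # Convert each column to numeric; values that cannot be parsed become NaN.
    numeric = prices.apply(pd.to_numeric, errors="coerce")
    # diff() subtracts the previous row from the current row, per column.
    earnings = numeric.diff()
    # Assumption: every column name ends in "_Close".
    earnings.columns = [col.replace("_Close", "_Earning")
                        for col in earnings.columns]
    # Append the new columns to the right of the original ones.
    return pd.concat([prices, earnings], axis=1)


result = calculate_return(df)
print(result)
```

The first row of each _Earning column is NaN because there is no earlier row to subtract, which matches the empty first row in the expected output.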