How to deal with multi-level column names downloaded with yfinance?


Problem description

I have a list of tickers (tickerStrings) that I want to download all at once. When I try to use pandas' read_csv, it doesn't read the csv file into the same structure I get when I download the data from yfinance.

I usually access my data by ticker like this: data['AAPL'] or data['AAPL'].Close, but when I read the data from the csv file it does not let me do that.

if path.exists(data_file):
    data = pd.read_csv(data_file, low_memory=False)
    data = pd.DataFrame(data)
    print(data.head())
else:
    data = yf.download(tickerStrings, group_by="Ticker", period=prd, interval=intv)
    data.to_csv(data_file)

Here is the printed output:

                  Unnamed: 0                 OLN               OLN.1               OLN.2               OLN.3  ...                 W.1                 W.2                 W.3                 W.4     W.5
0                        NaN                Open                High                 Low               Close  ...                High                 Low               Close           Adj Close  Volume
1                   Datetime                 NaN                 NaN                 NaN                 NaN  ...                 NaN                 NaN                 NaN                 NaN     NaN
2  2020-06-25 09:30:00-04:00    11.1899995803833  11.220000267028809  11.010000228881836  11.079999923706055  ...   201.2899932861328   197.3000030517578  197.36000061035156  197.36000061035156  112156
3  2020-06-25 09:45:00-04:00  11.130000114440918  11.260000228881836  11.100000381469727   11.15999984741211  ...  200.48570251464844  196.47999572753906  199.74000549316406  199.74000549316406   83943
4  2020-06-25 10:00:00-04:00  11.170000076293945  11.220000267028809  11.119999885559082  11.170000076293945  ...  200.49000549316406  198.19000244140625   200.4149932861328   200.4149932861328   88771

The error I get when trying to access the data:

Traceback (most recent call last):
File "getdata.py", line 49, in processData
    avg = data[x].Close.mean()
AttributeError: 'Series' object has no attribute 'Close'


Recommended answer

Download all tickers into a single dataframe with single-level column headers

Option 1

  • When downloading data for a single stock ticker, the returned dataframe has single-level column names, but no ticker column.
  • This downloads the data for each ticker, adds a ticker column, and creates a single dataframe from all desired tickers.

    import yfinance as yf
    import pandas as pd

    tickerStrings = ['AAPL', 'MSFT']
    df_list = list()
    for ticker in tickerStrings:
        data = yf.download(ticker, group_by="Ticker", period='2d')
        data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
        df_list.append(data)

    # combine all dataframes into a single dataframe
    df = pd.concat(df_list)

    # save to csv
    df.to_csv('ticker.csv')
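To see what the combined frame looks like without touching the network, here is a minimal sketch that swaps yf.download for two small hand-made frames. The prices and the fake_downloads name are made up for illustration, and the sketch assumes each per-ticker frame has single-level columns as described above.

```python
import pandas as pd

# Made-up stand-ins for the single-ticker frames that a download would return
dates = pd.DatetimeIndex(['2020-06-25', '2020-06-26'], name='Date')
fake_downloads = {
    'AAPL': pd.DataFrame({'Open': [90.0, 91.0], 'Close': [91.0, 88.0]}, index=dates),
    'MSFT': pd.DataFrame({'Open': [197.0, 199.0], 'Close': [200.0, 196.0]}, index=dates),
}

df_list = []
for ticker, data in fake_downloads.items():
    data = data.copy()
    data['ticker'] = ticker  # tag every row with its ticker
    df_list.append(data)

# one long frame: single-level columns plus a ticker column
df = pd.concat(df_list)

# per-ticker access now goes through the ticker column
aapl_close_mean = df.loc[df['ticker'] == 'AAPL', 'Close'].mean()
close_means = df.groupby('ticker')['Close'].mean()
print(close_means)
```

With this layout, the data[x].Close.mean() pattern from the question becomes a filter or a groupby on the ticker column.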
      


Option 2

  • Download all the tickers and stack the ticker level of the columns into the rows
    • group_by='Ticker' puts the ticker at level=0 of the column names

    tickerStrings = ['AAPL', 'MSFT']
    df = yf.download(tickerStrings, group_by='Ticker', period='2d')
    df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
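The effect of the stack call is easier to see on a tiny frame. The sketch below builds a made-up two-level frame shaped like a group_by='Ticker' download (the numbers are placeholders, not real prices) and applies the same chain of calls.

```python
import pandas as pd

# Made-up wide frame: ticker at column level 0, price field at level 1
dates = pd.DatetimeIndex(['2020-06-25', '2020-06-26'], name='Date')
columns = pd.MultiIndex.from_product([['AAPL', 'MSFT'], ['Open', 'Close']])
wide = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]],
    index=dates,
    columns=columns,
)

# move the ticker level from the columns into the rows,
# name the two index levels, then turn the ticker level into a column
long_df = wide.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
print(long_df)

# rows for a single ticker
msft = long_df[long_df['Ticker'] == 'MSFT']
```

Depending on the installed pandas version, stack may emit a FutureWarning about its implementation; the resulting frame has one row per date and ticker either way.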
              




Read yfinance csv already stored with multi-level column names

  • If you wish to keep, and read in, a file with a multi-level column index, use the following code, which will return the dataframe to its original form.

    df = pd.read_csv('test.csv', header=[0, 1])
    df.drop([0], axis=0, inplace=True)  # drop this row because it only has one column with Date in it
    df[('Unnamed: 0_level_0', 'Unnamed: 0_level_1')] = pd.to_datetime(df[('Unnamed: 0_level_0', 'Unnamed: 0_level_1')], format='%Y-%m-%d')  # convert the first column to a datetime
    df.set_index(('Unnamed: 0_level_0', 'Unnamed: 0_level_1'), inplace=True)  # set the first column as the index
    df.index.name = None  # remove the index name
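As an alternative to the steps above (not the answer's own method), pandas can usually restore both the column levels and the date index in one call by passing index_col=0 and parse_dates=True together with header=[0, 1]. The sketch below checks this on an in-memory round trip with made-up numbers; it assumes the csv was written by to_csv from a frame with exactly two column levels.

```python
import io
import pandas as pd

# Made-up frame shaped like a group_by='Ticker' download
dates = pd.DatetimeIndex(['2020-06-25', '2020-06-26'], name='Date')
columns = pd.MultiIndex.from_product([['AAPL', 'MSFT'], ['Open', 'Close']])
original = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]],
    index=dates,
    columns=columns,
)

# write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] rebuilds the two column levels; index_col=0 uses the dates as index
restored = pd.read_csv(buffer, header=[0, 1], index_col=0, parse_dates=True)

# the access pattern from the question works again
aapl_close_mean = restored['AAPL'].Close.mean()
print(aapl_close_mean)
```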
                  



  • The issue is that tickerStrings is a list of tickers, which results in a final dataframe with multi-level column names

                    AAPL                                                    MSFT
                    Open      High       Low     Close Adj Close     Volume Open High Low Close Adj Close Volume
    Date
    1980-12-12  0.513393  0.515625  0.513393  0.513393  0.405683  117258400  NaN  NaN NaN   NaN       NaN    NaN
    1980-12-15  0.488839  0.488839  0.486607  0.486607  0.384517   43971200  NaN  NaN NaN   NaN       NaN    NaN
    1980-12-16  0.453125  0.453125  0.450893  0.450893  0.356296   26432000  NaN  NaN NaN   NaN       NaN    NaN
    1980-12-17  0.462054  0.464286  0.462054  0.462054  0.365115   21610400  NaN  NaN NaN   NaN       NaN    NaN
    1980-12-18  0.475446  0.477679  0.475446  0.475446  0.375698   18362400  NaN  NaN NaN   NaN       NaN    NaN
                      



  • When this is saved to csv, it looks like the following example, and reading it back with a plain read_csv results in a dataframe like the one you're having the issue with.

    ,AAPL,AAPL,AAPL,AAPL,AAPL,AAPL,MSFT,MSFT,MSFT,MSFT,MSFT,MSFT
    ,Open,High,Low,Close,Adj Close,Volume,Open,High,Low,Close,Adj Close,Volume
    Date,,,,,,,,,,,,
    1980-12-12,0.5133928656578064,0.515625,0.5133928656578064,0.5133928656578064,0.40568336844444275,117258400,,,,,,
    1980-12-15,0.4888392984867096,0.4888392984867096,0.4866071343421936,0.4866071343421936,0.3845173120498657,43971200,,,,,,
    1980-12-16,0.453125,0.453125,0.4508928656578064,0.4508928656578064,0.3562958240509033,26432000,,,,,,
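The failure from the question can be reproduced on a shortened csv of the same shape. In this sketch the csv text and its numbers are made up; it only illustrates that a plain read_csv treats the first row as the sole header, renames the repeated ticker names, and returns a Series for data['AAPL'].

```python
import io
import pandas as pd

# Shortened, made-up csv with the same three-row header layout as above
csv_text = (
    ",AAPL,AAPL,MSFT,MSFT\n"
    ",Open,Close,Open,Close\n"
    "Date,,,,\n"
    "2020-06-25,1.0,2.0,3.0,4.0\n"
)

# default read: only the first row is used as the header
flat = pd.read_csv(io.StringIO(csv_text))
print(list(flat.columns))

# a single column is a Series, so attribute access to Close fails
aapl = flat['AAPL']
error_message = None
try:
    aapl.Close
except AttributeError as err:
    error_message = str(err)
print(error_message)
```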
                        




Flatten multi-level columns into a single level and add a ticker column

  • If the ticker symbol is level=0 (top) of the column names
    • When group_by='Ticker' is used

    df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
                                



  • If the ticker symbol is level=1 (bottom) of the column names

    df.stack(level=1).rename_axis(['Date', 'Ticker']).reset_index(level=1)
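The two cases differ only in which column level is stacked. The sketch below, using made-up numbers and a hypothetical flatten helper, builds the same data in both layouts and flattens each with the matching level.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2020-06-25', '2020-06-26'], name='Date')
values = [[1.0, 2.0, 3.0, 4.0],
          [5.0, 6.0, 7.0, 8.0]]

# ticker on top (level=0), price field at the bottom
ticker_top = pd.DataFrame(
    values,
    index=dates,
    columns=pd.MultiIndex.from_product([['AAPL', 'MSFT'], ['Open', 'Close']]),
)

# same data with the two column levels swapped: ticker at the bottom (level=1)
ticker_bottom = ticker_top.swaplevel(axis=1)


def flatten(df, ticker_level):
    """Hypothetical helper: stack the given ticker level into a Ticker column."""
    stacked = df.stack(level=ticker_level)
    return stacked.rename_axis(['Date', 'Ticker']).reset_index(level=1)


from_top = flatten(ticker_top, ticker_level=0)
from_bottom = flatten(ticker_bottom, ticker_level=1)
print(from_top)
print(from_bottom)
```

Both results have a Ticker column next to single-level Open and Close columns; the column order may differ between the two.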
                                    




Download each ticker and save it to a separate file

  • I recommend downloading and saving each ticker separately, like this:

    import yfinance as yf
    import pandas as pd

    # prd and intv are the period and interval values defined in the question
    tickerStrings = ['AAPL', 'MSFT']
    for ticker in tickerStrings:
        data = yf.download(ticker, group_by="Ticker", period=prd, interval=intv)
        data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
        data.to_csv(f'ticker_{ticker}.csv')  # ticker_AAPL.csv for example
                                      



  • The data will look like

                    Open      High       Low     Close  Adj Close      Volume ticker
    Date
    1986-03-13  0.088542  0.101562  0.088542  0.097222   0.062205  1031788800   MSFT
    1986-03-14  0.097222  0.102431  0.097222  0.100694   0.064427   308160000   MSFT
    1986-03-17  0.100694  0.103299  0.100694  0.102431   0.065537   133171200   MSFT
    1986-03-18  0.102431  0.103299  0.098958  0.099826   0.063871    67766400   MSFT
    1986-03-19  0.099826  0.100694  0.097222  0.098090   0.062760    47894400   MSFT
                                          



  • The resulting csv will look like

    Date,Open,High,Low,Close,Adj Close,Volume,ticker
    1986-03-13,0.0885416641831398,0.1015625,0.0885416641831398,0.0972222238779068,0.0622050017118454,1031788800,MSFT
    1986-03-14,0.0972222238779068,0.1024305522441864,0.0972222238779068,0.1006944477558136,0.06442664563655853,308160000,MSFT
    1986-03-17,0.1006944477558136,0.1032986119389534,0.1006944477558136,0.1024305522441864,0.0655374601483345,133171200,MSFT
    1986-03-18,0.1024305522441864,0.1032986119389534,0.0989583358168602,0.0998263880610466,0.06387123465538025,67766400,MSFT
    1986-03-19,0.0998263880610466,0.1006944477558136,0.0972222238779068,0.0980902761220932,0.06276042759418488,47894400,MSFT
                                            


Read in the multiple files saved in the previous section and create a single dataframe

    import pandas as pd
    from pathlib import Path

    # set the path to the files
    p = Path('c:/path_to_files')

    # find the files
    files = list(p.glob('ticker_*.csv'))

    # read the files into a dataframe
    df_list = list()
    for file in files:
        df_list.append(pd.read_csv(file))

    # combine dataframes
    df = pd.concat(df_list)
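The save-and-reload cycle can be tried end to end in a temporary directory. In the sketch below the prices are made up, and the index_col='Date' and parse_dates=True arguments are an addition not present in the code above; they are assumed to be wanted so that the dates come back as a datetime index rather than a plain column.

```python
import tempfile
from pathlib import Path

import pandas as pd

dates = pd.DatetimeIndex(['2020-06-25', '2020-06-26'], name='Date')
made_up_closes = {'AAPL': [91.0, 88.0], 'MSFT': [200.0, 196.0]}

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # write one csv per ticker, each with a ticker column
    for ticker, closes in made_up_closes.items():
        frame = pd.DataFrame({'Close': closes, 'ticker': ticker}, index=dates)
        frame.to_csv(folder / f'ticker_{ticker}.csv')

    # find the files and read them back with the dates as index
    files = sorted(folder.glob('ticker_*.csv'))
    df_list = [pd.read_csv(file, index_col='Date', parse_dates=True) for file in files]

# combine the per-ticker frames into one
df = pd.concat(df_list)
close_means = df.groupby('ticker')['Close'].mean()
print(close_means)
```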
                                            
