How to take out the column index name in a DataFrame
Question
                      Open   High    Low  Close   Volume  Adj Close
Date
1990-01-02 00:00:00  35.25  37.50  35.00  37.25  6555600       8.70
1990-01-03 00:00:00  38.00  38.00  37.50  37.50  7444400       8.76
1990-01-04 00:00:00  38.25  38.75  37.25  37.63  7928800       8.79
1990-01-05 00:00:00  37.75  38.25  37.00  37.75  4406400       8.82
1990-01-08 00:00:00  37.50  38.00  37.00  38.00  3643200       8.88
How can I get rid of the Date index name in the above dataframe? It should be in the same row as the other column names, but it's not, which is causing problems.
Thanks
Recommended answer
Short answer: you can't move it into the header row, and it's not clear why this would ever "cause problems". The 'Date' name labels the Index of the DataFrame, which is distinct from any of the columns. It is printed on its own row specifically so that you will not confuse it with a column of the frame. You cannot select the dates with DataFrame['Date'], as shown below:
>>> import numpy as np; import pandas; import datetime
>>> dfrm = pandas.DataFrame(np.random.rand(10,3),
...                         columns=['A','B','C'],
...                         index=pandas.Index(
...                             [datetime.date(2012,6,elem) for elem in range(1,11)],
...                             name="Date"))
>>> dfrm
                   A         B         C
Date
2012-06-01  0.283724  0.863012  0.798891
2012-06-02  0.097231  0.277564  0.872306
2012-06-03  0.821461  0.499485  0.126441
2012-06-04  0.887782  0.389486  0.374118
2012-06-05  0.248065  0.032287  0.850939
2012-06-06  0.101917  0.121171  0.577643
2012-06-07  0.225278  0.161301  0.708996
2012-06-08  0.906042  0.828814  0.247564
2012-06-09  0.733363  0.924076  0.393353
2012-06-10  0.273837  0.318013  0.754807
>>> dfrm['Date']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1458, in __getitem__
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 294, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 625, in get
    _, block = self._find_block(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 715, in _find_block
    self._check_have(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 722, in _check_have
    raise KeyError('no item named %s' % str(item))
KeyError: 'no item named Date'
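The distinction between the index name and the columns can also be checked directly through the frame's attributes, without triggering an error. The following is a minimal sketch, assuming a reasonably recent pandas; the frame `df` and its values are made up for illustration and are not the data from the question.

```python
import datetime

import numpy as np
import pandas as pd

# Small illustrative frame with a named index (values are arbitrary).
dates = [datetime.date(2012, 6, day) for day in range(1, 4)]
df = pd.DataFrame(np.arange(9.0).reshape(3, 3),
                  columns=["A", "B", "C"],
                  index=pd.Index(dates, name="Date"))

index_name = df.index.name               # the label printed on its own row
column_names = list(df.columns)          # the actual columns of the frame
has_date_column = "Date" in df.columns   # membership test against the columns

print(index_name)        # Date
print(column_names)      # ['A', 'B', 'C']
print(has_date_column)   # False
```

The dates themselves are reachable through `df.index`, which is why `df['Date']` has nothing to look up.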
Longer answer:
You can change your DataFrame by adding the index as a column of its own, if you'd like it to print that way. For example:
>>> dfrm['Date'] = dfrm.index
>>> dfrm
                   A         B         C        Date
Date
2012-06-01  0.283724  0.863012  0.798891  2012-06-01
2012-06-02  0.097231  0.277564  0.872306  2012-06-02
2012-06-03  0.821461  0.499485  0.126441  2012-06-03
2012-06-04  0.887782  0.389486  0.374118  2012-06-04
2012-06-05  0.248065  0.032287  0.850939  2012-06-05
2012-06-06  0.101917  0.121171  0.577643  2012-06-06
2012-06-07  0.225278  0.161301  0.708996  2012-06-07
2012-06-08  0.906042  0.828814  0.247564  2012-06-08
2012-06-09  0.733363  0.924076  0.393353  2012-06-09
2012-06-10  0.273837  0.318013  0.754807  2012-06-10
After this, you could simply change the name of the index so that nothing prints:
>>> dfrm.reindex(pandas.Series(dfrm.index.values, name=''))
                   A         B         C        Date
2012-06-01  0.283724  0.863012  0.798891  2012-06-01
2012-06-02  0.097231  0.277564  0.872306  2012-06-02
2012-06-03  0.821461  0.499485  0.126441  2012-06-03
2012-06-04  0.887782  0.389486  0.374118  2012-06-04
2012-06-05  0.248065  0.032287  0.850939  2012-06-05
2012-06-06  0.101917  0.121171  0.577643  2012-06-06
2012-06-07  0.225278  0.161301  0.708996  2012-06-07
2012-06-08  0.906042  0.828814  0.247564  2012-06-08
2012-06-09  0.733363  0.924076  0.393353  2012-06-09
2012-06-10  0.273837  0.318013  0.754807  2012-06-10
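A shorter way to blank out the index name, which the answer above does not use, is to clear the name itself rather than reindexing. This is a sketch under the assumption of a recent pandas version; the frame `df` is a hypothetical stand-in for `dfrm`.

```python
import pandas as pd

# Hypothetical stand-in for the frame above, with a named index.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]},
                  index=pd.Index(["2012-06-01", "2012-06-02"], name="Date"))

# rename_axis(None) returns a new frame whose index has no name;
# the original frame keeps its 'Date' label.
unnamed = df.rename_axis(None)
name_after_rename = df.index.name

# Assigning to index.name clears the label on the frame itself.
df.index.name = None

print(unnamed.index.name)   # None
print(name_after_rename)    # Date
print(df.index.name)        # None
```

With the name cleared, the extra 'Date' row no longer appears when the frame is printed, while the dates stay in the index.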
This seems like overkill. Another option is to replace the index with plain integers and keep the dates as a column, which reset_index does in one step:
>>> dfrm.reset_index()
or, if you already moved the index into a column manually, just assign a new integer index:
>>> dfrm.index = range(len(dfrm))
>>> dfrm
          A         B         C        Date
0  0.283724  0.863012  0.798891  2012-06-01
1  0.097231  0.277564  0.872306  2012-06-02
2  0.821461  0.499485  0.126441  2012-06-03
3  0.887782  0.389486  0.374118  2012-06-04
4  0.248065  0.032287  0.850939  2012-06-05
5  0.101917  0.121171  0.577643  2012-06-06
6  0.225278  0.161301  0.708996  2012-06-07
7  0.906042  0.828814  0.247564  2012-06-08
8  0.733363  0.924076  0.393353  2012-06-09
9  0.273837  0.318013  0.754807  2012-06-10
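For comparison, here is a small sketch of what reset_index produces when it is applied to a frame that still has its named index. The frame `df` is illustrative only, and a recent pandas version is assumed.

```python
import pandas as pd

# Illustrative frame whose dates live only in the named index.
df = pd.DataFrame({"A": [0.1, 0.2, 0.3]},
                  index=pd.Index(["2012-06-01", "2012-06-02", "2012-06-03"],
                                 name="Date"))

# The index becomes a regular column (inserted first) and is replaced
# by a default integer index.
flat = df.reset_index()

# drop=True discards the old index instead of keeping it as a column.
dropped = df.reset_index(drop=True)

print(list(flat.columns))     # ['Date', 'A']
print(list(flat.index))       # [0, 1, 2]
print(list(dropped.columns))  # ['A']
```

Note that reset_index raises a ValueError if a column named 'Date' already exists, which is the situation the manual `dfrm.index = range(len(dfrm))` assignment covers.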
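Or, if you care about the order in which the columns appear, use the following (.ix has been removed from current pandas, so positional selection goes through .iloc, and range must be wrapped in list on Python 3):
>>> dfrm.iloc[:, [-1] + list(range(len(dfrm.columns) - 1))]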
         Date         A         B         C
0  2012-06-01  0.283724  0.863012  0.798891
1  2012-06-02  0.097231  0.277564  0.872306
2  2012-06-03  0.821461  0.499485  0.126441
3  2012-06-04  0.887782  0.389486  0.374118
4  2012-06-05  0.248065  0.032287  0.850939
5  2012-06-06  0.101917  0.121171  0.577643
6  2012-06-07  0.225278  0.161301  0.708996
7  2012-06-08  0.906042  0.828814  0.247564
8  2012-06-09  0.733363  0.924076  0.393353
9  2012-06-10  0.273837  0.318013  0.754807
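Reordering by position works, but selecting by column name may be easier to read. The sketch below is one possible alternative, not the method used in the answer; the frame `df` is a made-up example.

```python
import pandas as pd

# Made-up frame in which 'Date' is the last column.
df = pd.DataFrame({"A": [0.1, 0.2],
                   "B": [0.3, 0.4],
                   "Date": ["2012-06-01", "2012-06-02"]})

# Put 'Date' first and keep the remaining columns in their current order.
ordered_names = ["Date"] + [name for name in df.columns if name != "Date"]
reordered = df[ordered_names]

print(list(reordered.columns))  # ['Date', 'A', 'B']
```

Selecting with a list of names returns a new frame, so `df` itself keeps its original column order.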
Added
Here are a few helpful functions to include in an IPython configuration script (so that they are loaded upon startup), or to put in a module you can easily load when working in Python.
###########
# Imports #
###########
import pandas
import datetime
import numpy as np
from dateutil import relativedelta
# pandas.io.data is gone from current pandas; its replacement is the
# separate pandas-datareader package.
from pandas_datareader import data as pdata

############################################
# Functions to retrieve Yahoo finance data #
############################################

# Utility to get generic stock symbol data from Yahoo finance.
# Ends one day before the present by default and goes back
# a specified number of days.
def getStockSymbolData(sym_list, end_date=None, num_dates=30):
    # Resolve the default at call time, not when the module is loaded.
    if end_date is None:
        end_date = datetime.date.today() + relativedelta.relativedelta(days=-1)
    dReader = pdata.DataReader
    start_date = end_date + relativedelta.relativedelta(days=-num_dates)
    return dict((sym, dReader(sym, "yahoo", start=start_date, end=end_date)) for sym in sym_list)
###

# Utility function to get some AAPL data when needed
# for testing.
def getAAPL(end_date=None, num_dates=30):
    return getStockSymbolData(['AAPL'], end_date=end_date, num_dates=num_dates)
###
I also made a class below to hold some data for common stocks:
#####
# Define a 'Stock' class that can hold simple info
# about a security, like SEDOL and CUSIP info. This
# is mainly for debugging things and quickly getting
# info for a single security.
class MyStock(object):
    def __init__(self, ticker='None', sedol='None', country='None'):
        self.ticker = ticker
        self.sedol = sedol
        self.country = country
    ###
    def getData(self, end_date=None, num_dates=30):
        # Resolve the default at call time, not at class definition.
        if end_date is None:
            end_date = datetime.date.today() + relativedelta.relativedelta(days=-1)
        data = getStockSymbolData([self.ticker], end_date=end_date, num_dates=num_dates)
        return pandas.DataFrame(data[self.ticker])
    ###
#####

# Make some default stock objects for common stocks.
AAPL = MyStock(ticker='AAPL', sedol='03783310', country='US')
SAP = MyStock(ticker='SAP', sedol='484628', country='DE')