loc fails on a DataFrame using the DataFrame's own index?
I have a DataFrame with a DatetimeIndex that contains many duplicate index labels (i.e. rows with the same datetime). I want to look at the rows that share each datetime, so I have the following:
utimes = pd.unique(data.index.tolist())
for time in utimes:
    data_now = data.loc[time]
    # Do some processing on the data_now
This fails with an error such as: KeyError 'the label [2015-02-05 21:54:00+00:00] is not in the [index]'
To check that the problem isn't in how utimes is created, I tried the following, which also fails:
data.loc[data.index[0]]
It gives the same error message. How can this be? Here's what the index looks like:
> data.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-05 21:54:00+00:00, ..., 2015-02-05 23:24:00+00:00]
Length: 457, Freq: None, Timezone: UTC
and
> data.index[0]
Timestamp('2015-02-05 22:24:00+0000', tz='UTC')
Any ideas why I can't use .loc with a DataFrame's own index?
It looks like pd.unique does not respect the datetime64 dtype:
In [11]: df.index
Out[11]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-05 22:24:00+00:00]
Length: 1, Freq: None, Timezone: UTC
In [12]: pd.unique(df.index)
Out[12]: array([1423175040000000000L], dtype=object)
For now (until this bug is fixed in pandas) you can wrap this in a to_datetime call:
In [13]: pd.to_datetime(pd.unique(df.index))
Out[13]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-05 22:24:00]
Length: 1, Freq: None, Timezone: None
or, more cleanly, you can use the unique method of DatetimeIndex, which also keeps the UTC timezone:
In [14]: df.index.unique()
Out[14]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-05 22:24:00+00:00]
Length: 1, Freq: None, Timezone: UTC
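Putting this together with the loop from the question, a minimal sketch might look like the following. The sample data, the column name value, and the rows_per_time dictionary are made up for illustration. The groupby(level=0) variant at the end is an alternative that is not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data: three rows, two of which share a timestamp.
idx = pd.DatetimeIndex(
    ["2015-02-05 21:54:00", "2015-02-05 21:54:00", "2015-02-05 22:24:00"],
    tz="UTC",
)
data = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# Index.unique() returns a DatetimeIndex, so the timezone is preserved
# and each element is a Timestamp that matches the index labels.
utimes = data.index.unique()

rows_per_time = {}
for time in utimes:
    # A one-element list makes .loc return a DataFrame even when only
    # a single row carries this label.
    data_now = data.loc[[time]]
    rows_per_time[time] = len(data_now)

# Alternative (not from the answer above): group on the index level.
grouped_sizes = {time: len(group) for time, group in data.groupby(level=0)}
```

Passing a one-element list to .loc is a design choice here: data.loc[time] would return a Series when a label occurs only once and a DataFrame when it is duplicated, which complicates the processing step.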