problems with reindexing dataframes: Reindexing only valid with uniquely valued Index objects

Views: 353 Published: 2017/3/26 3:00:06 Tags: dataframe pandas reindex


Problem description

I am seeing really strange behaviour when trying to reindex a dataframe in pandas. My version of pandas is 0.10.0 and I use Python 2.7. Basically, when I load a dataframe:

eurusd = pd.DataFrame.load('EUR_USD_30Min.df').drop_duplicates().dropna()

eurusd

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 119710 entries, 2003-02-02 17:30:00 to 2012-12-28 17:00:00
Data columns:
open     119710  non-null values
high     119710  non-null values
low      119710  non-null values
close    119710  non-null values
dtypes: float64(4)

and then I try to reindex inside a larger date range:

newindex = pd.DateRange(datetime.datetime(2002, 1, 1), datetime.datetime(2012, 12, 31), offset=pd.datetools.Minute(30))

newindex

<class 'pandas.tseries.index.DatetimeIndex'>
[2002-01-01 00:00:00, ..., 2012-12-31 00:00:00]
Length: 192817, Freq: 30T, Timezone: None

I get strange behaviour when trying to reindex the dataframe. If I reindex one larger part of the dataset I get this error:

eurusd[29558:29560].reindex(index=newindex)

Exception: Reindexing only valid with uniquely valued Index objects

But if I do the same for each of the two subsets of the data above, I don't get the error.

Here's the first subset, with no problems:

eurusd[29558:29559].reindex(index=newindex)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 192817 entries, 2002-01-01 00:00:00 to 2012-12-31 00:00:00
Freq: 30T
Data columns:
open     1  non-null values
high     1  non-null values
low      1  non-null values
close    1  non-null values
dtypes: float64(4)

And here's the second subset, still no problems:

eurusd[29559:29560].reindex(index=newindex)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 192817 entries, 2002-01-01 00:00:00 to 2012-12-31 00:00:00
Freq: 30T
Data columns:
open     1  non-null values
high     1  non-null values
low      1  non-null values
close    1  non-null values
dtypes: float64(4)

I am really going crazy over this and cannot understand the reason for it. The dataframe seems to be 'clean' of duplicate rows and of duplicated index values... I can provide the pickle file for the dataframe if you want.
Solution
You could group by the index and take the first entry (see docs):

df.groupby(level=0).first()

Example:

In [1]: df = pd.DataFrame([[1], [2]], index=[1, 1])

In [2]: df
Out[2]: 
   0
1  1
1  2

In [3]: df.groupby(level=0).first()
Out[3]: 
   0
1  1
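
A likely explanation for the behaviour in the question is that drop_duplicates() compares row values, not index labels, so two rows that share a timestamp but hold different prices both survive. Each one-row slice then has a unique index, while the two-row slice does not. The sketch below illustrates this on a small made-up frame (the timestamps and prices are hypothetical, not the asker's data) and uses current pandas spellings such as pd.date_range and Index.duplicated, which may not all exist in pandas 0.10.0.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: two rows share the
# 16:30 timestamp but hold different values.
idx = pd.to_datetime([
    "2012-12-28 16:00",
    "2012-12-28 16:30",
    "2012-12-28 16:30",
    "2012-12-28 17:00",
])
prices = pd.DataFrame(
    {"open": [1.0, 2.0, 2.5, 3.0], "close": [1.1, 2.1, 2.6, 3.1]},
    index=idx,
)

# drop_duplicates() looks at column values only, so the index stays non-unique.
print(prices.drop_duplicates().index.is_unique)

# List every row whose index label occurs more than once.
dupes = prices[prices.index.duplicated(keep=False)]
print(dupes)

new_index = pd.date_range("2012-12-28 15:00", "2012-12-28 18:00", freq="30min")

# Reindexing the frame with duplicate labels fails (ValueError in current pandas).
try:
    prices.reindex(new_index)
    reindex_failed = False
except ValueError as err:
    reindex_failed = True
    print("reindex failed:", err)

# The answer's fix: collapse each index label to its first entry, then reindex.
unique_prices = prices.groupby(level=0).first()
reindexed = unique_prices.reindex(new_index)
print(reindexed)
```

Note that groupby(level=0).first() takes the first non-null value per column within each label, whereas prices[~prices.index.duplicated(keep="first")] keeps the first row as a whole; the two agree when the data has no missing values, as here.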


                        