Reindex a dataframe with duplicate index values


Question

I imported four CSV files and merged them into one dataframe called data. However, when I inspect the dataframe's index with:

index_series = pd.Series(data.index.values)
index_series.value_counts()

I see that many index values occur 4 times. I want to completely reindex the data dataframe so that each row has a unique index value. I tried:

data.reindex(np.arange(len(data)))

which gave the error "ValueError: cannot reindex from a duplicate axis". A Google search leads me to think this error occurs because up to 4 rows share the same index value. Any idea how I can do this reindexing without dropping any rows? I don't particularly care about the order of the rows either, as I can always sort them.

UPDATE: In the end I did find a way to reindex the way I wanted.

data['index'] = np.arange(len(data))
data = data.set_index('index')

As I understand it, I just added a new column called 'index' to my dataframe and then set that column as the index. As for my CSV files, they were the four files under "download loan data" on the Lending Club loan stats page.

Recommended answer

It's pretty easy to replicate your error with this sample data:

In [92]: data = pd.DataFrame( [33,55,88,22], columns=['x'], index=[0,0,1,2] )

In [93]: data.index.is_unique
Out[93]: False

In [94]: data.reindex(np.arange(len(data)))  # same error message
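As a self-contained sketch of the reproduction above, the snippet below builds the same sample frame and catches the exception instead of letting it propagate. The variable name error_message is only illustrative, and the exact wording of the message may differ between pandas versions, so it is printed rather than compared against a fixed string.

```python
import numpy as np
import pandas as pd

# Same sample data as above: the label 0 appears twice in the index.
data = pd.DataFrame([33, 55, 88, 22], columns=["x"], index=[0, 0, 1, 2])

try:
    # reindex has to look up each new label in the old index,
    # which is ambiguous when the old index has duplicates.
    data.reindex(np.arange(len(data)))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(data.index.is_unique)
print(error_message)
```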

The problem is that reindex cannot conform a frame whose existing index contains duplicate labels. In this case you don't need to preserve the old index values; you merely want new index values that are unique. The easiest way to do that is:

In [95]: data.reset_index(drop=True)
Out[95]: 
    x
0  33
1  55
2  88
3  22
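One detail worth spelling out: reset_index returns a new dataframe by default and leaves the original untouched, so the result normally has to be assigned. A minimal sketch using the same sample data (the name renumbered is just an illustrative choice):

```python
import pandas as pd

data = pd.DataFrame([33, 55, 88, 22], columns=["x"], index=[0, 0, 1, 2])

# reset_index returns a new frame; `data` itself keeps its duplicate labels.
renumbered = data.reset_index(drop=True)

print(renumbered.index.is_unique)
print(len(renumbered) == len(data))  # no rows are dropped
```

Writing data = data.reset_index(drop=True) instead would replace the original frame, which is likely what the question is after.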

Note that you can leave off drop=True if you want to retain the old index values; they are then kept as a regular column (named index by default).
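The sketch below illustrates the variant without drop=True. It also shows one way to avoid the duplicate labels in the first place, under the assumption that the CSV files were combined with pd.concat (the question does not say how they were merged); part_a and part_b are hypothetical stand-ins for the separately loaded files.

```python
import pandas as pd

data = pd.DataFrame([33, 55, 88, 22], columns=["x"], index=[0, 0, 1, 2])

# Without drop=True, the old labels survive as an ordinary column.
kept = data.reset_index()
print(kept.columns.tolist())

# Assumption: the frames were combined with pd.concat.
# part_a and part_b are hypothetical stand-ins for the loaded CSV files.
part_a = pd.DataFrame({"x": [1, 2]})
part_b = pd.DataFrame({"x": [3, 4]})

# ignore_index=True discards each part's own labels and numbers the rows afresh.
combined = pd.concat([part_a, part_b], ignore_index=True)
print(combined.index.is_unique)
```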
