Pandas: recalculate the index after a concatenation

Question

I have a problem where I produce a pandas DataFrame by concatenating along the row axis (stacking vertically).

Each of the constituent DataFrames has an autogenerated index (ascending numbers).

After concatenation, my index is screwed up: it counts up to n (where n is the shape[0] of the corresponding DataFrame), and restarts at zero at the next DataFrame.

I am trying to "re-calculate the index, given the current order", or "re-index" (or so I thought). It turns out that this isn't quite what DataFrame.reindex does.

Here is what I tried:

train_df = pd.concat(train_class_df_list)
train_df = train_df.reindex(index=[i for i in range(train_df.shape[0])])

It failed with "cannot reindex from a duplicate axis". I don't want to change the order of my data; I just need to drop the old index and set up a new one, with the row order preserved.
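A minimal sketch of why the reindex call fails, using two small made-up frames (the column name `a` and its values are illustrative, not the original training data). reindex looks rows up by their existing labels rather than renumbering them, so it cannot resolve labels that occur more than once:

```python
import pandas as pd

# Two hypothetical frames, each with the default 0..n-1 index.
parts = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})]
stacked = pd.concat(parts)

# The labels restart at zero for the second frame.
index_labels = stacked.index.tolist()  # [0, 1, 0, 1]

# reindex selects rows by label; with duplicate labels it raises ValueError.
try:
    stacked.reindex(index=range(len(stacked)))
    reindex_failed = False
except ValueError:
    reindex_failed = True
```

The exact wording of the error message differs between pandas versions, which is why the sketch only checks for the exception type.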

Recommended answer

After vertical concatenation, if you get an index of [0, n) followed by [0, m), all you need to do is call reset_index:

train_df = train_df.reset_index(drop=True)

(You can also do this in place using inplace=True.)
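To make the difference between the two call styles concrete, here is a short sketch on made-up data (the frame `df` and its contents are hypothetical). Without inplace=True, reset_index returns a new frame and leaves the original alone, so the result has to be kept; with inplace=True, the frame is changed and the call returns None:

```python
import pandas as pd

# Hypothetical stacked frame with a repeating 0..n-1 index.
df = pd.concat([pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})])

# Without inplace=True the original is untouched; keep the returned frame.
renumbered = df.reset_index(drop=True)
labels_before = df.index.tolist()         # still [0, 1, 0, 1]
labels_after = renumbered.index.tolist()  # [0, 1, 2, 3]

# With inplace=True the frame itself is modified and the call returns None.
result = df.reset_index(drop=True, inplace=True)
```

In both cases drop=True discards the old labels instead of keeping them as an extra column, and the row order is unchanged.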

>>> import pandas as pd
>>> pd.concat([
...     pd.DataFrame({'a': [1, 2]}),
...     pd.DataFrame({'a': [1, 2]})]).reset_index(drop=True)
   a
0  1
1  2
2  1
3  2
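As an alternative that the answer above does not mention, pd.concat accepts an ignore_index=True argument that renumbers the rows during the concatenation itself. A minimal sketch on the same toy frames (the variable names are illustrative):

```python
import pandas as pd

frames = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [1, 2]})]

# ignore_index=True discards the input labels and numbers the rows 0..n-1.
combined = pd.concat(frames, ignore_index=True)

# Same result as concatenating first and resetting the index afterwards.
same = combined.equals(pd.concat(frames).reset_index(drop=True))
```

This saves the separate reset_index step when the original labels carry no meaning.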
