Rename duplicated index values in a pandas DataFrame

Problem description

I have a DataFrame that contains some duplicated index values:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(6, 6), columns=pd.date_range('1/1/2010', periods=6), index=["A", "C", "B", "F", "D", "E"])  # a list, not a set: sets are unordered
df1.rename(index={"C": "A", "B": "E"}, inplace=True)

ipdb> df1
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
 A   -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
 A   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
 E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
 F    1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
 D    0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
 E   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560

I would like to rename only the duplicated index values (the second and later occurrences) and obtain a DataFrame like the following one:

ipdb> df1
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A      -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A_dp   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E      -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F       1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D       0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E_dp   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560

My approach:

(i) Create a dictionary with the new names

old_names = df1[df1.index.duplicated()].index.values
new_names = df1[df1.index.duplicated()].index.values + "_dp"
dictionary = dict(zip(old_names, new_names))

(ii) Rename only the duplicated values

df1.loc[df1.index.duplicated(),:].rename(index = dictionary, inplace = True)

But this does not seem to work.

Recommended answer

You can use Index.where:

df1.index = df1.index.where(~df1.index.duplicated(), df1.index + '_dp')
print(df1)
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A      -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A_dp   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E      -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F       1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D       0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E_dp   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560
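
The following is a minimal sketch of why the attempt in the question leaves df1 unchanged while the Index.where version works. The frame `df` and its values are made up for illustration (they are not the data above). The likely cause is that selecting with .loc and a boolean mask produces a new object, so the in-place rename modifies that temporary rather than the original frame.

```python
import pandas as pd

# Hypothetical toy frame (not the data from the question) with duplicated labels.
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["A", "A", "E", "E"])

# duplicated() marks every occurrence of a label after the first one.
mask = df.index.duplicated()
print(mask.tolist())  # [False, True, False, True]

# The question's attempt: the boolean .loc selection is a new object,
# so renaming it in place does not touch df.
df.loc[mask, :].rename(index={"A": "A_dp", "E": "E_dp"}, inplace=True)
print(df.index.tolist())  # ['A', 'A', 'E', 'E']

# Index.where keeps a label where the condition is True and takes the
# alternative (label + "_dp") where it is False.
df.index = df.index.where(~mask, df.index + "_dp")
print(df.index.tolist())  # ['A', 'A_dp', 'E', 'E_dp']
```

Note also that a dictionary passed to rename maps labels, not positions, so calling rename on the whole frame with {"A": "A_dp"} would rename every "A", including the first occurrence. Working on the index positionally with a mask avoids that.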

And if a label can occur more than twice and every index value must end up unique, append a running counter to each repeat instead:

print(df1)
   2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A   -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F    1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D    0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560

df1.index = df1.index + df1.groupby(level=0).cumcount().astype(str).replace('0','')
print(df1)
    2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A    -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A1   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E    -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
E1   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
E2   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F     1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D     0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E3   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560
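
The counter approach can be sketched step by step on a small frame. The frame `df` and the intermediate names `counts` and `suffix` are hypothetical and exist only for this illustration. Wrapping the suffixes in pd.Index is a choice made here so that the concatenation is done position by position; it is not part of the answer above.

```python
import pandas as pd

# Hypothetical toy frame with a label that is repeated more than twice.
df = pd.DataFrame({"x": range(5)}, index=["A", "A", "E", "E", "E"])

# cumcount numbers the rows inside each group of equal labels: 0, 1, 2, ...
counts = df.groupby(level=0).cumcount()
print(counts.tolist())  # [0, 1, 0, 1, 2]

# Convert the counter to text and blank out "0", so the first occurrence
# of each label gets no suffix. replace() matches whole values only, so a
# value such as "10" would be left as it is.
suffix = pd.Index(counts.astype(str).replace("0", ""))

# Concatenate label and suffix element by element.
df.index = df.index + suffix
print(df.index.tolist())  # ['A', 'A1', 'E', 'E1', 'E2']
```

One caveat under these assumptions: if the frame already contains a label such as "A1", the generated names can collide with it, so checking df.index.is_unique afterwards is a cheap safeguard.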
