重命名多索引数据框中的索引值 [英] Renaming index values in multiindex dataframe
问题描述
创建数据框:
from pandas import *
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = zip(*arrays)
index = MultiIndex.from_tuples(tuples, names=['first','second'])
data = DataFrame(randn(8,2),index=index,columns=['c1','c2'])
data
Out[68]:
c1 c2
first second
bar one 0.833816 -1.529639
two 0.340150 -1.818052
baz one -1.605051 -0.917619
two -0.021386 -0.222951
foo one 0.143949 -0.406376
two 1.208358 -2.469746
qux one -0.345265 -0.505282
two 0.158928 1.088826
我想重命名第一个"索引值,例如"bar"->"cat","baz"->"dog"等.但是,我读过的每个示例都对单个-级别索引和/或遍历整个索引以有效地从头开始重新创建它.我在想类似的东西:
I would like to rename the "first" index values, such as "bar"->"cat", "baz"->"dog", etc. However, every example I have read either operates on a single-level index and/or loops through the entire index to effectively re-create it from scratch. I was thinking something like:
data = data.reindex(index={'bar':'cat','baz':'dog'})
但这是行不通的,我也不希望它能在多个索引上运行.我可以在不循环浏览整个数据框索引的情况下进行这种替换吗?
but this does not work, nor do I really expect it to work on multiple indexes. Can I do such a replacement without looping through the entire dataframe index?
开始编辑
Begin edit
在发布之前,我一直不愿意将其更新为0.13,因此我使用了以下解决方法:
I am hesitant to update to 0.13 until release, so I used the following workaround:
index = data.index.tolist()
for r in xrange( len(index) ):
index[r] = (codes[index[r][0]],index[r][1])
index = pd.MultiIndex.from_tuples(index,names=data.index.names)
data.index = index
这是先前定义的code:string对的字典.实际上,这并不像我期望的那样好(要花几秒钟才能运行约110万行).它不像单线纸那样漂亮,但确实可以工作.
Where is a previous defined dictionary of code:string pairs. This actually isn't as big of a performance his as I was expecting (takes a couple seconds to operate over ~1.1 million rows). It is not as pretty as a one-liner, but it does work.
结束编辑
End Edit
推荐答案
使用set_levels
方法(收益
c1 c2
first second
cat one -0.289649 -0.870716
two -0.062014 -0.410274
dog one 0.030171 -1.091150
two 0.505408 1.531108
foo one 1.375653 -1.377876
two -1.478615 1.351428
qux one 1.075802 0.532416
two 0.865931 -0.765292
要基于字典重新映射级别,可以使用以下函数:
To remap a level based on a dict, you could use a function such as this:
def map_level(df, dct, level=0):
index = df.index
index.set_levels([[dct.get(item, item) for item in names] if i==level else names
for i, names in enumerate(index.levels)], inplace=True)
dct = {'bar':'cat', 'baz':'dog'}
map_level(data, dct, level=0)
这是一个可运行的示例:
Here's a runnable example:
import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = zip(*arrays)
index = pd.MultiIndex.from_tuples(tuples, names=['first','second'])
data = pd.DataFrame(np.random.randn(8,2),index=index,columns=['c1','c2'])
data2 = data.copy()
data.index.set_levels([[u'cat', u'dog', u'foo', u'qux'],
[u'one', u'two']], inplace=True)
print(data)
# c1 c2
# first second
# cat one 0.939040 -0.748100
# two -0.497006 -1.185966
# dog one -0.368161 0.050339
# two -2.356879 -0.291206
# foo one -0.556261 0.474297
# two 0.647973 0.755983
# qux one -0.017722 1.364244
# two 1.007303 0.004337
def map_level(df, dct, level=0):
index = df.index
index.set_levels([[dct.get(item, item) for item in names] if i==level else names
for i, names in enumerate(index.levels)], inplace=True)
dct = {'bar':'wolf', 'baz':'rabbit'}
map_level(data2, dct, level=0)
print(data2)
# c1 c2
# first second
# wolf one 0.939040 -0.748100
# two -0.497006 -1.185966
# rabbit one -0.368161 0.050339
# two -2.356879 -0.291206
# foo one -0.556261 0.474297
# two 0.647973 0.755983
# qux one -0.017722 1.364244
# two 1.007303 0.004337
这篇关于重命名多索引数据框中的索引值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!