Pandas: Modify a particular level of Multiindex

Question

I have a DataFrame with a MultiIndex and would like to modify one particular level of it. For instance, one of the levels might contain strings, and I may want to remove the spaces from that level:

df.index.levels[1] = [x.replace(' ', '') for x in df.index.levels[1]]

However, the code above results in an error:

TypeError: 'FrozenList' does not support mutable operations.

I know I can reset_index and modify the column and then re-create the Multiindex, but I wonder whether there is a more elegant way to modify one particular level of the Multiindex directly.
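
For reference, the roundabout approach mentioned above might look like the following minimal sketch. The sample data and the level names "name" and "num" are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data; the level names "name" and "num" are assumptions.
index = pd.MultiIndex.from_tuples(
    [("a b", 1), ("a b", 2), ("c d", 1)], names=["name", "num"]
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

# Move the index levels into ordinary columns.
flat = df.reset_index()

# Edit the column that used to be an index level (literal, non-regex replace).
flat["name"] = flat["name"].str.replace(" ", "", regex=False)

# Rebuild the MultiIndex from the edited columns.
roundtrip = flat.set_index(["name", "num"])
```

This works, but it builds an intermediate frame and rebuilds the whole index, which is the overhead the question hopes to avoid.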

Recommended answer

Thanks to @cxrodgers's comment, I think the fastest way to do this is:

df.index = df.index.set_levels(df.index.levels[0].str.replace(' ', ''), level=0)
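
A self-contained sketch of that one-liner on a small frame, for anyone who wants to try it. The sample data and level names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the level names "name" and "num" are assumptions.
index = pd.MultiIndex.from_tuples(
    [("a b", 1), ("a b", 2), ("c d", 1)], names=["name", "num"]
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

# levels[0] holds each distinct label once, so only two strings are processed here.
new_level = df.index.levels[0].str.replace(" ", "", regex=False)

# set_levels returns a new MultiIndex; assign it back to the frame.
df.index = df.index.set_levels(new_level, level=0)

print(df.index.tolist())
```

One caveat, as far as I know: if stripping spaces makes two distinct labels identical (say "a b" and "ab"), current pandas versions reject the new level because level values must be unique. In that case rebuilding the index from tuples or columns is the safer route.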


Old, longer answer:

I found that the list comprehension suggested by @Shovalt works but felt slow on my machine (using a dataframe with >10,000 rows).

Instead, I was able to use the .set_levels method, which was quite a bit faster for me.

%timeit pd.MultiIndex.from_tuples([(x[0].replace(' ',''), x[1]) for x in df.index])
1 loop, best of 3: 394 ms per loop

%timeit df.index.set_levels(df.index.get_level_values(0).str.replace(' ',''), level=0)
10 loops, best of 3: 134 ms per loop

In actuality, I just needed to prepend some text. This was even faster with .set_levels:

%timeit pd.MultiIndex.from_tuples([('00'+x[0], x[1]) for x in df.index])
100 loops, best of 3: 5.18 ms per loop

%timeit df.index.set_levels('00'+df.index.get_level_values(0), level=0)
1000 loops, best of 3: 1.38 ms per loop

%timeit df.index.set_levels('00'+df.index.levels[0], level=0)
1000 loops, best of 3: 331 µs per loop
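
The gap between the last two timings likely comes from how much data each expression touches: levels[0] contains each distinct label once, while get_level_values(0) contains one entry per row. The sketch below, on made-up data, shows the difference. Note that set_levels expects one value per distinct label, and to my knowledge recent pandas versions check that level values are unique, so the get_level_values form may raise an error there. The levels[0] form is the one to prefer.

```python
import pandas as pd

# Hypothetical three-row index with two distinct labels on level 0.
index = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])

unique_labels = index.levels[0]             # one entry per distinct label
per_row_labels = index.get_level_values(0)  # one entry per row

print(len(unique_labels), len(per_row_labels))

# Prepend text to the distinct labels only; the row-to-label mapping is unchanged.
prefixed = index.set_levels("00" + unique_labels, level=0)
print(prefixed.tolist())
```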

This solution is based on the answer in the link from the comment by @denfromufa ...

python - Multiindex and timezone - Frozen list error - Stack Overflow
