How to get away with a multidimensional index in pandas

Problem description

In pandas, what is a good way to select an arbitrary set of rows from a DataFrame with a MultiIndex?

import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'])
df['A'] = ['a', 'a', 'b', 'b']
df['B'] = [1, 2, 3, 4]
df['C'] = [1, 2, 3, 4]

# Rows 0 and 3 of columns A and B: the (A, B) pairs to look up later
the_indices_we_want = df.ix[[0, 3], ['A', 'B']]
df = df.set_index(['A', 'B'])  # Create a MultiIndex

df.ix[the_indices_we_want]  # ValueError: Cannot index with multidimensional key

df.ix[[tuple(x) for x in the_indices_we_want.values]]  # Works

The last line is an answer, but it feels clunky: the keys can't even be lists, they have to be tuples. It also involves generating a new object to do the indexing with. I'm in a situation where I'm trying to do a lookup on a MultiIndex DataFrame, using indices from another DataFrame:

data_we_want = dataframe_with_the_data.ix[dataframe_with_the_indices[['Index1','Index2']]]

Right now it looks like I need to write it like this:

data_we_want = dataframe_with_the_data.ix[[tuple(x) for x in dataframe_with_the_indices[['Index1','Index2']].values]]

That is workable, but if there are many rows (i.e. hundreds of millions of desired indices), generating this list of tuples becomes quite a burden. Any solutions?

The solution by @joris works, but not if the index values are all numbers. Here is an example where the indices are all integers:

df = pd.DataFrame(columns=['A', 'B', 'C'])
df['A'] = ['a', 'a', 'b', 'b']
df['B'] = [1, 2, 3, 4]
df['C'] = [1, 2, 3, 4]

the_indices_we_want = df.ix[[0, 3], ['B', 'C']]
df = df.set_index(['B', 'C'])

df.ix[pd.Index(the_indices_we_want)]  # ValueError: Cannot index with multidimensional key

df.ix[pd.Index(the_indices_we_want.astype('object'))]  # Works, though it feels clunky

Recommended answer

You indeed cannot index with a DataFrame directly, but if you convert it to an Index object, it does the right thing (each row of the DataFrame is treated as one MultiIndex entry):

In [43]: pd.Index(the_indices_we_want)
Out[43]: Index([(u'a', 1), (u'b', 4)], dtype='object')

In [44]: df.ix[pd.Index(the_indices_we_want)]
Out[44]:
     C
A B
a 1  1
b 4  4

In [45]: df.ix[[tuple(x) for x in the_indices_we_want.values]]
Out[45]:
     C
A B
a 1  1
b 4  4
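
The session above was recorded on an old pandas release. `.ix` was removed in pandas 1.0, so on current versions the same idea can be sketched with `.loc` and `pd.MultiIndex.from_frame` (available since pandas 0.24). This is a minimal sketch rather than the answerer's code: `select_rows` is a hypothetical helper name, and it assumes every requested key exists in the index (`.loc` raises a `KeyError` otherwise). Because `from_frame` builds each level from its own column, the all-integer case from the question is handled in the same way, without an `astype('object')` cast.

```python
import pandas as pd


def select_rows(indexed_df, key_df):
    """Hypothetical helper: look up each row of key_df in indexed_df's MultiIndex."""
    # Each row of key_df becomes one MultiIndex entry; no list of tuples is built by hand.
    keys = pd.MultiIndex.from_frame(key_df)
    return indexed_df.loc[keys]


raw = pd.DataFrame({
    'A': ['a', 'a', 'b', 'b'],
    'B': [1, 2, 3, 4],
    'C': [1, 2, 3, 4],
})

# Mixed string/integer keys, as in the original question
by_ab = select_rows(raw.set_index(['A', 'B']), raw.loc[[0, 3], ['A', 'B']])

# All-integer keys, as in the follow-up example
by_bc = select_rows(raw.set_index(['B', 'C']), raw.loc[[0, 3], ['B', 'C']])
```

If some keys might be missing, `indexed_df.reindex(keys)` is one alternative that returns NaN rows for them instead of raising.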

Converting to an Index is somewhat cleaner than building the list of tuples, and in some quick tests it also seems to be a bit faster (though not by much, only about 2x).
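
For very large key sets, another option that avoids materializing Python tuples altogether is to treat the lookup as a join. The sketch below is an alternative technique, not part of the original answer, and it has not been benchmarked here. It reuses the question's `dataframe_with_the_data` and `dataframe_with_the_indices` names with made-up sample values, and the `value` column is an assumption for illustration.

```python
import pandas as pd

# Data indexed by a two-level MultiIndex (sample values are illustrative only)
dataframe_with_the_data = pd.DataFrame({
    'Index1': ['a', 'a', 'b', 'b'],
    'Index2': [1, 2, 3, 4],
    'value': [10, 20, 30, 40],
}).set_index(['Index1', 'Index2'])

# Keys to look up, stored as ordinary columns
dataframe_with_the_indices = pd.DataFrame({
    'Index1': ['a', 'b'],
    'Index2': [1, 4],
})

# Join the key columns against the MultiIndex of the data.
# The default how='left' keeps one output row per requested key.
data_we_want = dataframe_with_the_indices[['Index1', 'Index2']].join(
    dataframe_with_the_data,
    on=['Index1', 'Index2'],
)
```

The result keeps `Index1` and `Index2` as regular columns; keys that are absent from the data come back as NaN rows rather than raising an error.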
