Pandas DataFrame with tuple of strings as index
Problem description
I'm seeing some odd pandas behavior here. I have a dataframe that looks like this:
df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
index=[('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])
In [14]: df
Out[14]:
Col 1 Col 2 Col 3
(1, a) NaN NaN NaN
(2, a) NaN NaN NaN
(1, b) NaN NaN NaN
(2, b) NaN NaN NaN
I can set the value of an arbitrary element:
In [15]: df['Col 2'].loc[('1', 'b')] = 6
In [16]: df
Out[16]:
Col 1 Col 2 Col 3
(1, a) NaN NaN NaN
(2, a) NaN NaN NaN
(1, b) NaN 6 NaN
(2, b) NaN NaN NaN
But when I try to read back the element I just set, using the same syntax, I get:
In [17]: df['Col 2'].loc[('1', 'b')]
KeyError: 'the label [1] is not in the [index]'
Can someone tell me what I'm doing wrong or why this behavior occurs? Am I simply not allowed to set the index as a multi-element tuple?
Edit
Apparently, wrapping the tuple index in a list works.
In [38]: df['Col 2'].loc[[('1', 'b')]]
Out[38]:
(1, b) 6
Name: Col 2, dtype: object
However, I'm still getting some weird behavior in my actual use case, so it would be nice to know whether this usage is discouraged.
Recommended answer
The tuple in your selection brackets is treated as a sequence containing the elements you want to retrieve, as if you had passed ['1', 'b'] as the argument. Hence the KeyError message: pandas tries to find the key '1' and doesn't find it.
That's why it works when you add the extra brackets: the argument is now a sequence of one element, your tuple.
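As a rough illustration of the point above, the sketch below builds a flat index whose labels are tuples and looks up one label in two ways that treat the tuple as a single key. The names flat_index, s, position and selected are made up for this example, and pd.Index(..., tupleize_cols=False) is used here only to make sure the index stays flat; details of .loc behavior with tuple keys may vary between pandas versions.

```python
import pandas as pd

tuples = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]

# Keep the tuples as plain labels of a single-level index
# (tupleize_cols=False stops pandas from building a MultiIndex).
flat_index = pd.Index(tuples, tupleize_cols=False)

# Hypothetical stand-in for the 'Col 2' column from the question.
s = pd.Series([None, None, 6, None], index=flat_index, name='Col 2', dtype=object)

# get_loc takes exactly one label, so the tuple is hashed as a whole.
position = flat_index.get_loc(('1', 'b'))

# A list of labels: here a list holding a single tuple label.
selected = s.loc[[('1', 'b')]]

print(position)
print(selected)
```

Note that the list-wrapped form returns a one-element Series rather than a scalar, which matches the Out[38] result in the question.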
You should avoid relying on how selection disambiguates list and tuple arguments. The behavior can also differ depending on whether the index is a simple index or a MultiIndex.
In any case, since you ask about recommendations: try not to build simple indexes made of tuples. pandas works better, and is more powerful to use, if you build a MultiIndex instead:
df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
index=pd.MultiIndex.from_tuples([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]))
df['Col 2'].loc[('1', 'b')] = 6
df['Col 2'].loc[('1', 'b')]
Out[13]: 6
df
Out[14]:
Col 1 Col 2 Col 3
1 a NaN NaN NaN
2 a NaN NaN NaN
1 b NaN 6 NaN
2 b NaN NaN NaN
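A variant of the MultiIndex approach, offered as a sketch rather than the answer's own code: it selects the row and the column in a single .loc call instead of chaining df['Col 2'].loc[...], since chained assignment may not write back to the original frame in newer pandas versions. The level names 'num' and 'letter' and the variable names are assumptions added for illustration.

```python
import pandas as pd

tuples = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]

# Level names are hypothetical; they are not part of the original question.
mi = pd.MultiIndex.from_tuples(tuples, names=['num', 'letter'])
df_mi = pd.DataFrame(index=mi, columns=['Col 1', 'Col 2', 'Col 3'])

# One .loc call with (row key, column key); the row key is the full tuple.
df_mi.loc[('1', 'b'), 'Col 2'] = 6
value = df_mi.loc[('1', 'b'), 'Col 2']

print(value)
print(df_mi)
```

With a MultiIndex, the tuple ('1', 'b') is interpreted as one key per level, so the same expression works for both setting and getting.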