Error "Can only compare identically-labeled Series objects" and sort_index
Question
I have two dataframes, df1 and df2, with the same number of rows and columns and the same variables, and I'm trying to compare the boolean variable choice in the two dataframes, then use if/else to manipulate the data. But something goes wrong when I try to compare the boolean variable.
Here are my sample dataframes and code:
#df1
v_100 choice #boolean
7 True
0 True
7 False
2 True
#df2
v_100 choice #boolean
1 False
2 True
74 True
6 True
def lastTwoTrials_outcome():
    df1 = df.iloc[5::6, :]  # df1 and df2 are extracted from the same dataframe first
    df2 = df.iloc[4::6, :]
    if df1['choice'] != df2['choice']:  # if "choice" is different in the two dataframes
        df1['v_100'] = (df1['choice'] + df2['choice']) * 0.5
Here is the error:
if df1['choice'] != df2['choice']:
File "path", line 818, in wrapper
raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects
I found the same error here, and an answer suggests calling sort_index first, but I don't understand why. Can anyone explain in more detail (if that's the correct solution)?
Thanks!
Recommended answer
I think you need reset_index so that both Series have the same index values before you compare them. To create the new column, it is better to use mask or numpy.where than if/else. Also, use | instead of +, because you are working with booleans:
import numpy as np

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# Option 1: Series.mask
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)

# Option 2: numpy.where
df1['v_100'] = np.where(df1['choice'] != df2['choice'],
                        (df1['choice'] | df2['choice']) * 0.5,
                        df1['choice'])
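To see why the original if statement fails twice over, here is a minimal self-contained sketch. It rebuilds the sample frames by hand (the index labels 5..8 and 4..7 are taken from the printed sample, not from the asker's real data) and shows that the comparison raises on mismatched labels, and that even after reset_index a plain if on the resulting Series is ambiguous, which is why an element-wise tool such as numpy.where is used instead.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt by hand; the index labels mimic the two iloc slices.
df1 = pd.DataFrame({"v_100": [7, 0, 7, 2],
                    "choice": [True, True, False, True]}, index=[5, 6, 7, 8])
df2 = pd.DataFrame({"v_100": [1, 2, 74, 6],
                    "choice": [False, True, True, True]}, index=[4, 5, 6, 7])

# 1) Different index labels: the comparison itself raises.
try:
    df1["choice"] != df2["choice"]
    raised_label_error = False
except ValueError as exc:
    raised_label_error = True
    print(exc)

# 2) Give both frames the same 0..n-1 labels, then compare row by row.
a = df1.reset_index(drop=True)
b = df2.reset_index(drop=True)
differs = a["choice"] != b["choice"]

# 3) `if` needs a single bool, but `differs` holds one bool per row.
try:
    if differs:
        pass
    raised_ambiguous = False
except ValueError:
    raised_ambiguous = True

# 4) Element-wise selection instead of if/else.
a["v_100"] = np.where(differs, (a["choice"] | b["choice"]) * 0.5, a["choice"])
print(a)
```

Note that reset_index(drop=True) pairs rows purely by position, so it is only appropriate when the n-th row of df1 is meant to be compared with the n-th row of df2.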
Sample:
print (df1)
v_100 choice
5 7 True
6 0 True
7 7 False
8 2 True
print (df2)
v_100 choice
4 1 False
5 2 True
6 74 True
7 6 True
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print (df1)
v_100 choice
0 7 True
1 0 True
2 7 False
3 2 True
print (df2)
v_100 choice
0 1 False
1 2 True
2 74 True
3 6 True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)
print (df1)
v_100 choice
0 0.5 True
1 1.0 True
2 0.5 False
3 1.0 True
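On the sort_index suggestion from the question: as far as I can tell, it helps only when both Series carry the same labels in a different order, because the comparison requires the two indexes to be equal, order included. When the labels themselves differ, as with the two iloc slices here, sorting does not make them match. A small sketch with made-up Series (s1, s2 and s3 are hypothetical, not from the question):

```python
import pandas as pd

s1 = pd.Series([True, False, True], index=[0, 1, 2])
s2 = pd.Series([False, True, True], index=[2, 0, 1])  # same labels, other order
s3 = pd.Series([True, True, False], index=[1, 2, 3])  # different labels


def comparison_raises(left, right):
    """Return True if `left != right` raises the labelling ValueError."""
    try:
        left != right
    except ValueError:
        return True
    return False


# Same labels in a different order: fails until both sides are sorted.
unsorted_fails = comparison_raises(s1, s2)
sorted_result = s1.sort_index() != s2.sort_index()

# Different labels: sorting cannot help, the indexes still differ.
shifted_fails = comparison_raises(s1.sort_index(), s3.sort_index())

print(unsorted_fails, sorted_result.tolist(), shifted_fails)
```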