python panda:返回常见行的索引 [英] python panda: return indexes of common rows
问题描述
道歉,如果这是一个相当新手的问题.我试图找到两个数据帧之间共有哪些行.返回值应该是与df1相同的df2的行索引.我笨拙的例子:
Apologies, if this is a fairly newbie question. I was trying to find which rows are common between two data frames. The return values should be the row indexes of df2 that are common with df1. My clunky example:
df1 = pd.DataFrame({'col1':['cx','cx','cx2'], 'col2':[1,4,12]})
df1['col2'] = df1['col2'].map(str);
df2 = pd.DataFrame({'col1':['cx','cx','cx','cx','cx2','cx2'], 'col2':[1,3,5,10,12,12]})
df2['col2'] = df2['col2'].map(str);
df1['idx'] = df1[['col1','col2']].apply(lambda x: '_'.join(x),axis=1);
df2['idx'] = df2[['col1','col2']].apply(lambda x: '_'.join(x),axis=1);
df1['idx_values'] = df1.index.values
df2['idx_values'] = df2.index.values
df3 = pd.merge(df1,df2,on = 'idx');
myindexes = df3['idx_values_y'];
myindexes.to_csv(idir + 'test.txt',sep='\t',index = False);
返回值应为[0,4,5].高效地完成此操作将非常棒,因为两个数据帧将具有几百万行.
The return values should be [0,4,5]. It would be great to have this done efficiently, since the two dataframes would have several million rows.
谢谢!
推荐答案
不需要具有连接值的新列,默认情况下,两列内部合并,如果需要的值df2.index
,则添加
New column with join values is not necessary, merge by default inner merge by both columns and if need values of df2.index
add reset_index
:
df1 = pd.DataFrame({'col1':['cx','cx','cx2'], 'col2':[1,4,12]})
df2 = pd.DataFrame({'col1':['cx','cx','cx','cx','cx2','cx2'], 'col2':[1,3,5,10,12,12]})
df3 = pd.merge(df1,df2.reset_index(), on = ['col1','col2'])
print (df3)
col1 col2 index
0 cx 1 0
1 cx2 12 4
2 cx2 12 5
两个索引都需要:
df4 = pd.merge(df1.reset_index(),df2.reset_index(), on = ['col1','col2'])
print (df4)
index_x col1 col2 index_y
0 0 cx 1 0
1 2 cx2 12 4
2 2 cx2 12 5
仅适用于两个DataFrame的交集:
For only intersection of both DataFrames:
df5 = pd.merge(df1,df2, on = ['col1','col2'])
#if 2 column DataFrame
#df5 = pd.merge(df1,df2)
print (df5)
col1 col2
0 cx 1
1 cx2 12
2 cx2 12
这篇关于python panda:返回常见行的索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!