python pandas: return indexes of common rows

Question

Apologies if this is a fairly newbie question. I am trying to find which rows are common to two DataFrames. The return values should be the row indexes of df2 whose rows also appear in df1. My clunky example:

import pandas as pd

df1 = pd.DataFrame({'col1':['cx','cx','cx2'], 'col2':[1,4,12]})
df1['col2'] = df1['col2'].map(str)
df2 = pd.DataFrame({'col1':['cx','cx','cx','cx','cx2','cx2'], 'col2':[1,3,5,10,12,12]})
df2['col2'] = df2['col2'].map(str)

# build a "col1_col2" string key for every row
df1['idx'] = df1[['col1','col2']].apply(lambda x: '_'.join(x), axis=1)
df2['idx'] = df2[['col1','col2']].apply(lambda x: '_'.join(x), axis=1)

# copy the row labels into ordinary columns so they survive the merge
df1['idx_values'] = df1.index.values
df2['idx_values'] = df2.index.values

df3 = pd.merge(df1, df2, on='idx')
myindexes = df3['idx_values_y']   # row labels coming from df2

# idir is presumably an output directory defined elsewhere in the original script
myindexes.to_csv(idir + 'test.txt', sep='\t', index=False)

The return values should be [0,4,5]. It would be great to have this done efficiently, since the two dataframes would have several million rows.

Thanks!

Recommended answer

A new column of joined values is not necessary. merge performs an inner join by default, so merge on both columns; if you need the values of df2.index, add reset_index:

import pandas as pd

df1 = pd.DataFrame({'col1':['cx','cx','cx2'], 'col2':[1,4,12]})
df2 = pd.DataFrame({'col1':['cx','cx','cx','cx','cx2','cx2'], 'col2':[1,3,5,10,12,12]})

df3 = pd.merge(df1,df2.reset_index(), on = ['col1','col2'])
print (df3)
  col1 col2  index
0   cx    1      0
1  cx2   12      4
2  cx2   12      5
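
To get exactly the list the question asks for, take the index column of the merged result. A minimal sketch, reusing the sample frames from the question; the names merged and common_idx are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['cx', 'cx', 'cx2'], 'col2': [1, 4, 12]})
df2 = pd.DataFrame({'col1': ['cx', 'cx', 'cx', 'cx', 'cx2', 'cx2'],
                    'col2': [1, 3, 5, 10, 12, 12]})

# reset_index() moves df2's row labels into a regular column named 'index'
merged = pd.merge(df1, df2.reset_index(), on=['col1', 'col2'])

# The 'index' column holds the df2 row labels of the common rows
common_idx = merged['index'].tolist()
print(common_idx)  # [0, 4, 5]
```

If df1 itself can contain duplicate (col1, col2) pairs, the same df2 label can appear more than once in this list; merged['index'].unique() would drop the repeats.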

If both indexes are needed:

df4 = pd.merge(df1.reset_index(),df2.reset_index(), on = ['col1','col2'])
print (df4)

   index_x col1  col2  index_y
0        0   cx     1        0
1        2  cx2    12        4
2        2  cx2    12        5
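
Since the question mentions several million rows, one alternative that avoids building a merged frame is to turn the key columns of each frame into a MultiIndex and test membership with isin. This is a swapped-in technique, not the answer's method, and whether it is faster on real data is an assumption worth benchmarking. A minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['cx', 'cx', 'cx2'], 'col2': [1, 4, 12]})
df2 = pd.DataFrame({'col1': ['cx', 'cx', 'cx', 'cx', 'cx2', 'cx2'],
                    'col2': [1, 3, 5, 10, 12, 12]})

keys = ['col1', 'col2']

# One MultiIndex entry (a (col1, col2) tuple) per row of each frame
idx1 = pd.MultiIndex.from_frame(df1[keys])
idx2 = pd.MultiIndex.from_frame(df2[keys])

# Boolean mask over df2: True where its key pair also occurs in df1
mask = idx2.isin(idx1)

common_idx = df2.index[mask].tolist()
print(common_idx)  # [0, 4, 5]
```

Unlike the merge, this returns each df2 row at most once, even when df1 has duplicate key pairs, and keeps df2's original row order.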

If you only need the intersection of both DataFrames (the common rows themselves):

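df5 = pd.merge(df1,df2, on = ['col1','col2'])
# if col1 and col2 are the only columns, the on argument can be omitted,
# because merge joins on all common columns by default:
#df5 = pd.merge(df1,df2)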
print (df5)

  col1  col2
0   cx     1
1  cx2    12
2  cx2    12
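
The row (cx2, 12) appears twice above because df2 contains it twice. If each common row should be listed only once, drop_duplicates can be applied to the merged result. A minimal sketch using the same sample frames; the names common and unique_common are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['cx', 'cx', 'cx2'], 'col2': [1, 4, 12]})
df2 = pd.DataFrame({'col1': ['cx', 'cx', 'cx', 'cx', 'cx2', 'cx2'],
                    'col2': [1, 3, 5, 10, 12, 12]})

# Inner merge on all shared columns (here col1 and col2)
common = pd.merge(df1, df2)

# Keep one copy of each common row and renumber the result 0..n-1
unique_common = common.drop_duplicates().reset_index(drop=True)
print(unique_common)
```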
