Find different rows between 2 dataframes of different size with Pandas
Question
I have two dataframes, df1 and df2, of different sizes:
```python
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'A': [np.nan, np.nan, np.nan, 'AAA', 'SSS', 'DDD'], 'B': [np.nan, np.nan, 'ciao', np.nan, np.nan, np.nan]})
df2 = pd.DataFrame({'C': [np.nan, np.nan, np.nan, 'SSS', 'FFF', 'KKK', 'AAA'], 'D': [np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan]})
```
My goal is to identify the rows of df1 whose value in column A (ignoring NaN) does not appear in column C of df2.
I was able to achieve my goal using the following lines of code.
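```python
# Nested-loop approach: compare every row of df1 with every row of df2
df = pd.DataFrame({})
for i, row1 in df1.iterrows():
    found = False
    for j, row2 in df2.iterrows():
        if row1['A'] == row2['C']:
            found = True
            break  # one match is enough
    # NaN never compares equal to NaN, so NaN values of A are skipped explicitly
    if not found and pd.notnull(row1['A']):
        df = pd.concat([df, row1.to_frame().T], axis=0)
df = df.reset_index(drop=True)  # reset_index returns a new frame; assign it
print(df)
```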
Is there a more elegant and efficient way to achieve my goal?
Note: the expected result is:
```
     A    B
0  DDD  NaN
```
Answer
I believe you need `isin` with boolean indexing.
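As a minimal illustration of the mechanics, using hypothetical toy values rather than the question's data: `Series.isin` returns a boolean Series aligned with the original index, `~` negates it, and indexing with a boolean Series keeps only the positions where it is True.

```python
import pandas as pd

# Hypothetical toy data, only to show how the pieces fit together
values = pd.Series(['AAA', 'DDD', 'SSS'])
lookup = pd.Series(['SSS', 'FFF', 'AAA'])

# Element-wise membership test: True where the value occurs anywhere in lookup
mask = values.isin(lookup)
print(mask.tolist())            # [True, False, True]

# ~ flips the mask, so boolean indexing keeps the values NOT found in lookup
print(values[~mask].tolist())   # ['DDD']
```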
To also omit rows with NaN in column A, chain a second condition with `isnull`. This is needed when column C contains no NaN, as in the modified df2 below:
```python
# changed df2 with no NaN in column C
df2 = pd.DataFrame({'C': [4, 5, 5, 'SSS', 'FFF', 'KKK', 'AAA'],
                    'D': [np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan]})
print(df2)
```
```
     C    D
0    4  NaN
1    5  NaN
2    5  NaN
3  SSS  1.0
4  FFF  NaN
5  KKK  NaN
6  AAA  NaN
```
```python
df = df1[~(df1['A'].isin(df2['C']) | df1['A'].isnull())]
print(df)
```
```
     A    B
5  DDD  NaN
```
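A different technique, not the one this answer uses, is a left anti-join: `DataFrame.merge(..., how='left', indicator=True)` and then keeping only the rows marked `'left_only'`. The sketch below assumes the question's data. Two details matter: the lookup keys are deduplicated first so the join cannot multiply df1's rows, and `merge` matches NaN keys to each other, so here NaN rows in A count as found, just as with `isin`.

```python
import numpy as np
import pandas as pd

# Data from the question
df1 = pd.DataFrame({'A': [np.nan, np.nan, np.nan, 'AAA', 'SSS', 'DDD'],
                    'B': [np.nan, np.nan, 'ciao', np.nan, np.nan, np.nan]})
df2 = pd.DataFrame({'C': [np.nan, np.nan, np.nan, 'SSS', 'FFF', 'KKK', 'AAA'],
                    'D': [np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan]})

# Unique keys only, so each df1 row matches at most one key row
keys = df2[['C']].drop_duplicates()

# indicator=True adds a '_merge' column; 'left_only' marks df1 rows with no match.
# merge treats NaN keys as equal, so NaN values of A match NaN values of C.
merged = df1.merge(keys, how='left', left_on='A', right_on='C', indicator=True)

# Keep only unmatched rows and the original df1 columns
anti = (merged.loc[merged['_merge'] == 'left_only', df1.columns]
              .reset_index(drop=True))
print(anti)
```

The `isin` filter is shorter for a single key column; the merge form generalises more naturally to matching on several columns at once.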
If omitting the NaN rows is not necessary, the simpler filter keeps them, because NaN does not occur in column C:
```python
df = df1[~df1['A'].isin(df2['C'])]
print(df)
```
```
     A     B
0  NaN   NaN
1  NaN   NaN
2  NaN  ciao
5  DDD   NaN
```
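The NaN rows appear or disappear depending on how `isin` handles NaN: it treats a NaN value as a member when the lookup values also contain NaN. A small sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 'DDD'])

# Lookup contains NaN: isin treats NaN as matching NaN
with_nan = a.isin(pd.Series([np.nan, 'SSS']))
print(with_nan.tolist())      # [True, False]

# Lookup has no NaN: the NaN value is not a member, so ~isin would keep it
without_nan = a.isin(pd.Series([4, 'SSS']))
print(without_nan.tolist())   # [False, False]

# OR-ing with isnull() marks the NaN row regardless of the lookup column
combined = without_nan | a.isnull()
print(combined.tolist())      # [True, False]
```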
If NaN exists in both columns, the second solution is enough, because `isin` treats NaN in A as matching NaN in C (input DataFrames from the question):
```python
df = df1[~df1['A'].isin(df2['C'])]
print(df)
```
```
     A    B
5  DDD  NaN
```
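Putting both variants together, a minimal sketch of a reusable filter, assuming NaN values in the key column should always be excluded (as the question's loop does) and that the result should be renumbered from 0 to match the expected output. The helper name `rows_not_in` and its parameters are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def rows_not_in(left, left_col, right, right_col, drop_missing=True):
    """Hypothetical helper: rows of `left` whose `left_col` value never
    appears in `right[right_col]`, renumbered from 0."""
    # isin is an element-wise membership test; ~ keeps the non-members
    mask = ~left[left_col].isin(right[right_col])
    if drop_missing:
        # Mirror the question's loop, which skips NaN values of the key column
        mask &= left[left_col].notna()
    return left[mask].reset_index(drop=True)


# Data from the question
df1 = pd.DataFrame({'A': [np.nan, np.nan, np.nan, 'AAA', 'SSS', 'DDD'],
                    'B': [np.nan, np.nan, 'ciao', np.nan, np.nan, np.nan]})
df2 = pd.DataFrame({'C': [np.nan, np.nan, np.nan, 'SSS', 'FFF', 'KKK', 'AAA'],
                    'D': [np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan]})
# The answer's variant of df2 without NaN in column C
df2_no_nan = pd.DataFrame({'C': [4, 5, 5, 'SSS', 'FFF', 'KKK', 'AAA'],
                           'D': [np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan]})

print(rows_not_in(df1, 'A', df2, 'C'))
print(rows_not_in(df1, 'A', df2_no_nan, 'C'))
```

With `drop_missing=True` both inputs give the same single DDD row, so the caller does not need to know in advance whether column C contains NaN.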