pandas 左联接中不匹配的左表记录 [英] unmatched left table records in a left join in pandas

查看:71
本文介绍了 pandas 左联接中不匹配的左表记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个数据框,学生数据框和费用数据框。 费用 DataFrame中缺少某些学生的费用明细。我想返回所有缺少费用详细信息的学生的详细信息。 班级,部分和 RollNo这三个字段构成唯一的组合。

I have two DataFrames, 'Students' DataFrame and 'Fee' DataFrame. The fee details of some of the students are missing in 'Fee' DataFrame. I would like to return the details of all students whose fee details are missing. The three fields 'Class', 'Section' and 'RollNo' form a unique combination.

Students = pd.DataFrame({
    'Class': [7, 7, 8],
    'Section': ['A', 'B', 'B'],
    'RollNo': [2, 3, 4],
    'Student': ['Ram', 'Rahim', 'Robert']
})

Fee = pd.DataFrame({
    'Class': [7, 7, 8],
    'Section': ['A', 'B', 'B'],
    'RollNo': [2, 2, 3],
    'Fee': [10, 20, 30]
})







Students

   Class  RollNo Section Student
0      7       2       A     Ram
1      7       3       B   Rahim
2      8       4       B  Robert







Fee

   Class  Fee  RollNo Section
0      7   10       2       A
1      7   20       2       B
2      8   30       3       B

基本上,我想在左侧表格中找到不匹配的记录根据上述3个字段在学生和费用数据框架之间进行左联接。在Python中使用 Pandas 来实现此目标的最简单方法是什么?

Essentially, I would like to find the unmatched records from the left table when I do a left join between 'Students' and 'Fee' DataFrames based on 3 fields mentioned above. What is the simplest way to achieve this using Pandas in Python?

非常感谢!

推荐答案

如果 Fee 列中的 NaN 没有, code>费用 DataFrame使用 合并 布尔索引 isna

If no NaNs in Fee column in Fee DataFrame use merge anf filter by boolean indexing with isna:

df = pd.merge(Students, Fee, how='left')
print (df)
   Class  RollNo Section Student   Fee
0      7       2       A     Ram  10.0
1      7       3       B   Rahim   NaN
2      8       4       B  Robert   NaN

df1 = df[df['Fee'].isna()].drop('Fee', axis=1)
#for oldier versions of pandas
#df1 = df[df['Fee'].isnull()].drop('Fee', axis=1)
print (df1)
   Class  RollNo Section Student
1      7       3       B   Rahim
2      8       4       B  Robert

使用 NaN 的更通用的解决方案,也将参数 indicator 添加到 merge 并使用 left_only 过滤行:

More general solution working with NaNs too add parameter indicator to merge and filter rows with left_only:

Fee = pd.DataFrame({'Class':[7,7,8],
                    'Section':['A','B','B'],
                    'RollNo':[2,2,3],
                    'Fee':[np.nan,20,30]})
print (Fee)
   Class   Fee  RollNo Section
0      7   NaN       2       A
1      7  20.0       2       B
2      8  30.0       3       B

df = pd.merge(Students, Fee, how='left', indicator=True)
print (df)
   Class  RollNo Section Student  Fee     _merge
0      7       2       A     Ram  NaN       both
1      7       3       B   Rahim  NaN  left_only
2      8       4       B  Robert  NaN  left_only

df1 = df[df['_merge'].eq('left_only')].drop(['Fee','_merge'], axis=1)
print (df1)
   Class  RollNo Section Student
1      7       3       B   Rahim
2      8       4       B  Robert

这篇关于 pandas 左联接中不匹配的左表记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆