比较两个Excel文件与Pandas [英] Comparing two excel file with pandas

查看:86
本文介绍了比较两个Excel文件与Pandas的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个Excel文件A和B.A是主副本,其中有员工姓名和组织名称( Name Org )的更新记录.文件B包含 Name Org 列,且记录较旧,而其他许多我们不感兴趣的列.

I have two excel file, A and B. A is Master copy where updated record of employee Name and Organization Name (Name and Org) is available. File B contains Name and Org columns with bit older record and many other columns which we are not interested in.

   Name      Org
0   abc    ddc systems
1   sdc    ddc systems
2   csc    ddd systems
3   rdc    kbf org
4   rfc    kbf org

我要对此进行两项操作:

I want to do two operation on this:

1)我想将Excel B(列 Name Org )与Excel A(列 Name Org ),并使用所有缺少的 Name 名称和相应的 Org 条目更新文件B.

1) I want to compare Excel B (column Name and Org) with Excel A (column Name and Org) and update file B with all the missing entries of Name and corresponding Org.

2)对于文件B中的所有现有条目(列 Name Org ),我想比较文件和文件A并更新 Org 列(如果有任何员工组织已更改).

2) For all existing entries in File B (column Name and Org), I would like to compare file and with file A and update Org column if any employee organization has changed.

对于解决方案1),找到以下方法尝试过的新条目(虽然不确定这种方法是否正确),但输出是元组,我不确定如何更新回 DataFrame .

For Solution 1) to find the new entries tried below approach (Not sure if this approach is correct though), output is tuple which I was not sure how to update back to DataFrame.

diff = set(zip(new_df.Name, new_df.Org)) - set(zip(old_df.Name, old_df.Org))

任何帮助将不胜感激.谢谢.

Any help will be appreciated. Thanks.

推荐答案

如果名称是唯一的,只需串联A和B,然后删除重复的名称.假设 A B 是您的数据框,

If names are unique, just concatenate A and B, and drop duplicates. Assuming A and B are your DataFrames,

df = pd.concat([A, B]).drop_duplicates(subset=['Name'], keep='first')

或者,

A = A.set_index('Name')
B = B.set_index('Name')

idx = B.index.difference(A.index)
df = pd.concat([A, B.loc[idx]]).reset_index()

两者的性能应大致相同.

Both should be approximately the same in terms of performance.

这篇关于比较两个Excel文件与Pandas的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆