比较两个Excel文件与Pandas [英] Comparing two excel file with pandas
问题描述
我有两个Excel文件A和B.A是主副本,其中有员工姓名和组织名称( Name
和 Org
)的更新记录.文件B包含 Name
和 Org
列,且记录较旧,而其他许多我们不感兴趣的列.
I have two excel file, A and B. A is Master copy where updated record of employee Name and Organization Name (Name
and Org
) is available. File B contains Name
and Org
columns with bit older record and many other columns which we are not interested in.
Name Org
0 abc ddc systems
1 sdc ddc systems
2 csc ddd systems
3 rdc kbf org
4 rfc kbf org
我要对此进行两项操作:
I want to do two operation on this:
1)我想将Excel B(列 Name
和 Org
)与Excel A(列 Name
和 Org
),并使用所有缺少的 Name
名称和相应的 Org
条目更新文件B.
1) I want to compare Excel B (column Name
and Org
) with Excel A (column Name
and Org
) and update file B with all the missing entries of Name
and corresponding Org
.
2)对于文件B中的所有现有条目(列 Name
和 Org
),我想比较文件和文件A并更新 Org
列(如果有任何员工组织已更改).
2) For all existing entries in File B (column Name
and Org
), I would like to compare file and with file A and update Org
column if any employee organization has changed.
对于解决方案1),找到以下方法尝试过的新条目(虽然不确定这种方法是否正确),但输出是元组,我不确定如何更新回 DataFrame
.
For Solution 1) to find the new entries tried below approach (Not sure if this approach is correct though), output is tuple which I was not sure how to update back to DataFrame
.
diff = set(zip(new_df.Name, new_df.Org)) - set(zip(old_df.Name, old_df.Org))
任何帮助将不胜感激.谢谢.
Any help will be appreciated. Thanks.
推荐答案
如果名称是唯一的,只需串联A和B,然后删除重复的名称.假设 A
和 B
是您的数据框,
If names are unique, just concatenate A and B, and drop duplicates. Assuming A
and B
are your DataFrames,
df = pd.concat([A, B]).drop_duplicates(subset=['Name'], keep='first')
或者,
A = A.set_index('Name')
B = B.set_index('Name')
idx = B.index.difference(A.index)
df = pd.concat([A, B.loc[idx]]).reset_index()
两者的性能应大致相同.
Both should be approximately the same in terms of performance.
这篇关于比较两个Excel文件与Pandas的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!