反联接 pandas [英] Anti-Join Pandas
问题描述
我有两个表,我想追加它们,以便仅保留表A中的所有数据,并且仅在其键是唯一的情况下才添加表B中的数据(但是键值在表A和B中是唯一的).在某些情况下,表A和表B中都会出现一个键).
I have two tables and I would like to append them so that only all the data in table A is retained and data from table B is only added if its key is unique (Key values are unique in table A and B however in some cases a Key will occur in both table A and B).
我认为,这样做的方法将涉及某种过滤联接(反联接),以获取表B中未在表A中出现的值,然后追加两个表.
I think the way to do this will involve some sort of filtering join (anti-join) to get values in table B that do not occur in table A then append the two tables.
我熟悉R,这是我将在R中执行此操作的代码.
I am familiar with R and this is the code I would use to do this in R.
library("dplyr")
## Filtering join to remove values already in "TableA" from "TableB"
FilteredTableB <- anti_join(TableB,TableA, by = "Key")
## Append "FilteredTableB" to "TableA"
CombinedTable <- bind_rows(TableA,FilteredTableB)
我如何在python中实现呢?
How would I achieve this in python?
推荐答案
请考虑以下数据框
TableA = pd.DataFrame(np.random.rand(4, 3),
pd.Index(list('abcd'), name='Key'),
['A', 'B', 'C']).reset_index()
TableB = pd.DataFrame(np.random.rand(4, 3),
pd.Index(list('aecf'), name='Key'),
['A', 'B', 'C']).reset_index()
TableA
TableB
这是做自己想要的事情的一种方式
This is one way to do what you want
# Identify what values are in TableB and not in TableA
key_diff = set(TableB.Key).difference(TableA.Key)
where_diff = TableB.Key.isin(key_diff)
# Slice TableB accordingly and append to TableA
TableA.append(TableB[where_diff], ignore_index=True)
rows = []
for i, row in TableB.iterrows():
if row.Key not in TableA.Key.values:
rows.append(row)
pd.concat([TableA.T] + rows, axis=1).T
计时
4行,其中2行重叠
Timing
4 rows with 2 overlap
方法1更快
10,000行5,000重叠
循环不好
这篇关于反联接 pandas 的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!