反联接 pandas [英] Anti-Join Pandas

查看:115
本文介绍了反联接 pandas 的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个表,我想追加它们,以便仅保留表A中的所有数据,并且仅在其键是唯一的情况下才添加表B中的数据(但是键值在表A和B中是唯一的).在某些情况下,表A和表B中都会出现一个键).

I have two tables and I would like to append them so that only all the data in table A is retained and data from table B is only added if its key is unique (Key values are unique in table A and B however in some cases a Key will occur in both table A and B).

我认为,这样做的方法将涉及某种过滤联接(反联接),以获取表B中未在表A中出现的值,然后追加两个表.

I think the way to do this will involve some sort of filtering join (anti-join) to get values in table B that do not occur in table A then append the two tables.

我熟悉R,这是我将在R中执行此操作的代码.

I am familiar with R and this is the code I would use to do this in R.

library("dplyr")

## Filtering join to remove values already in "TableA" from "TableB"
FilteredTableB <- anti_join(TableB,TableA, by = "Key")

## Append "FilteredTableB" to "TableA"
CombinedTable <- bind_rows(TableA,FilteredTableB)

我如何在python中实现呢?

How would I achieve this in python?

推荐答案

请考虑以下数据框

TableA = pd.DataFrame(np.random.rand(4, 3),
                      pd.Index(list('abcd'), name='Key'),
                      ['A', 'B', 'C']).reset_index()
TableB = pd.DataFrame(np.random.rand(4, 3),
                      pd.Index(list('aecf'), name='Key'),
                      ['A', 'B', 'C']).reset_index()


TableA

TableB

这是做自己想要的事情的一种方式

This is one way to do what you want

# Identify what values are in TableB and not in TableA
key_diff = set(TableB.Key).difference(TableA.Key)
where_diff = TableB.Key.isin(key_diff)

# Slice TableB accordingly and append to TableA
TableA.append(TableB[where_diff], ignore_index=True)

rows = []
for i, row in TableB.iterrows():
    if row.Key not in TableA.Key.values:
        rows.append(row)

pd.concat([TableA.T] + rows, axis=1).T


计时

4行,其中2行重叠


Timing

4 rows with 2 overlap

方法1更快

10,000行5,000重叠

循环不好

这篇关于反联接 pandas 的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆