Computing Set Difference in Pandas between two dataframes

Problem description

I'm wondering how to compute a set difference in pandas using two different DataFrames.

The first DataFrame (df1) has this format:

State  City          Population
NY     Albany        856654
WV     Wheeling      23434
SC     Charleston    35323
OH     Columbus      343534
WV     Charleston    34523

The second DataFrame (df2) is:

State  City
WV     Wheeling
OH     Columbus

I need an operation that returns the following DataFrame:

State   City        Population
NY      Albany      856654
SC      Charleston  35323
WV      Charleston  34523

Essentially, I can't figure out how to "subtract" df2 from df1 based on both columns. I need both because city names repeat across different states.

Recommended answer

Do a left join with indicator=True, which adds a _merge column recording the origin of each row, then filter on that column to keep only the rows found in df1 alone:

df1.merge(df2, on=['State', 'City'], how='left', indicator=True)[lambda x: x._merge == 'left_only'].drop(columns='_merge')

#  State        City  Population
#0    NY      Albany      856654
#2    SC  Charleston       35323
#4    WV  Charleston       34523
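For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same left-join-with-indicator approach. The helper name set_difference is hypothetical (it is not a pandas function), and the sample data assumes df2 lists Columbus for OH, which is what the expected output in the question implies.

```python
import pandas as pd

# Sample data from the question (df2's OH city assumed to be "Columbus").
df1 = pd.DataFrame({
    "State": ["NY", "WV", "SC", "OH", "WV"],
    "City": ["Albany", "Wheeling", "Charleston", "Columbus", "Charleston"],
    "Population": [856654, 23434, 35323, 343534, 34523],
})
df2 = pd.DataFrame({
    "State": ["WV", "OH"],
    "City": ["Wheeling", "Columbus"],
})


def set_difference(left, right, keys):
    """Return the rows of `left` whose `keys` combination does not occur in `right`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Keep only the key columns and drop duplicates so the join
    # cannot multiply rows of `left`.
    right_keys = right[keys].drop_duplicates()
    # indicator=True adds a `_merge` column: 'left_only', 'right_only' or 'both'.
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    # Rows marked 'left_only' had no match in `right`.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


result = set_difference(df1, df2, ["State", "City"])
print(result)
```

Matching on both State and City is what keeps Charleston, SC and Charleston, WV as separate rows. Note that the result keeps the index labels from the merged frame; call reset_index(drop=True) on it if a fresh 0-based index is preferred.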
