Picking out elements based on complement of records in Python pandas
Question
I have a Python pandas DataFrame question. There are two DataFrames containing records, df1 and df2. They contain the following values:
df1:
pkid start end
0 0 2005 2005
1 1 2006 2006
2 2 2007 2007
3 3 2008 2008
4 4 2009 2009
df2:
pkid start end
0 3 2008 2008
1 NaN 2009 2009
2 NaN 2010 2010
I am looking to isolate the record with index 2 from df2. In other words, I want to find all records of df2 that have no matching record in df1, where only the start and end column values are considered. Thanks!
Recommended answer
This operation is called an antijoin (▷) in relational algebra and SQL. I tried to find a native pandas operation for it, but found nothing.
But you can do it in a functional way; I don't know about the performance :)
>>> t1 = df1[["start", "end"]]
>>> t2 = df2[["start", "end"]]
>>> # compare start/end column by column; isin would ignore which column a value came from
>>> f = t2.apply(lambda x2: t1.apply(lambda x1: (x1 == x2).all(), axis=1).any(), axis=1)
>>> df2[~f]
end pkid start
2 2010 NaN 2010
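On the performance question: the nested apply compares every row of df2 against every row of df1. A vectorized sketch of the same check is shown here, assuming the two frames from the question (rebuilt by hand below) and a pandas version that provides MultiIndex.from_frame. The names keys, k1, k2 and unmatched are illustrative only.

```python
import numpy as np
import pandas as pd

# The two frames from the question, rebuilt by hand
df1 = pd.DataFrame({
    "pkid": [0, 1, 2, 3, 4],
    "start": [2005, 2006, 2007, 2008, 2009],
    "end": [2005, 2006, 2007, 2008, 2009],
})
df2 = pd.DataFrame({
    "pkid": [3, np.nan, np.nan],
    "start": [2008, 2009, 2010],
    "end": [2008, 2009, 2010],
})

keys = ["start", "end"]

# One (start, end) tuple per row, held as a MultiIndex
k1 = pd.MultiIndex.from_frame(df1[keys])
k2 = pd.MultiIndex.from_frame(df2[keys])

# Keep the rows of df2 whose (start, end) pair never occurs in df1
unmatched = df2[~k2.isin(k1)]
print(unmatched)
```

This keeps df2's original index on the result, so the surviving row is still labelled 2.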
Update:
In SQL, it can be done in different ways, for example with not exists:
select *
from df2
where not exists (select * from df1 where df1.start = df2.start and df1.end = df2.end)
or with a left outer join and a where clause:
select *
from df2
left outer join df1 on df1.start = df2.start and df1.end = df2.end
where df1.<key> is null
The last one can be implemented in pandas with merge:
>>> m = pd.merge(df2, df1, how='left', on=['end','start'], suffixes=['','_r'])
>>> df2[m['pkid_r'].isnull()]
end pkid start
2 2010 NaN 2010
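The pkid_r test above depends on df1's pkid never being null and on the merged frame lining up row for row with df2. A hedged variant of the same left join is sketched below. It assumes a pandas version where merge accepts indicator=True, and it rebuilds the question's frames by hand. The names right, m and anti are illustrative.

```python
import numpy as np
import pandas as pd

# The two frames from the question, rebuilt by hand
df1 = pd.DataFrame({
    "pkid": [0, 1, 2, 3, 4],
    "start": [2005, 2006, 2007, 2008, 2009],
    "end": [2005, 2006, 2007, 2008, 2009],
})
df2 = pd.DataFrame({
    "pkid": [3, np.nan, np.nan],
    "start": [2008, 2009, 2010],
    "end": [2008, 2009, 2010],
})

keys = ["start", "end"]

# Only the key columns are needed from df1; dropping duplicate keys
# keeps the left join from multiplying rows of df2
right = df1[keys].drop_duplicates()

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join
m = df2.merge(right, how="left", on=keys, indicator=True)

# Rows marked "left_only" have no partner in df1
anti = m.loc[m["_merge"] == "left_only"].drop(columns="_merge")
print(anti)
```

Note that merge returns a fresh integer index, so the labels of anti match df2's only when df2 itself has a default 0..n-1 index, as it does here.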