Picking out elements based on the complement of indices in Python pandas

Question

I have a DataFrame from which I pick two subset DataFrames, df_a and df_b. For example, in the iris dataset:

df_a = iris[iris.Name == "Iris-setosa"]
df_b = iris[iris.Name == "Iris-virginica"]

What's the best way to get all elements of iris that are neither in df_a nor in df_b? I'd prefer not to refer to the original conditions that defined df_a and df_b. I just assume that df_a and df_b are subsets of iris, so I'd like to pull out elements from iris based on the indices of df_a and df_b. Basically, assume that:

df_a = get_a_subset(iris)
df_b = get_b_subset(iris)
# retrieve the subset of iris that
# has all elements that are neither in df_a nor in df_b
# ...

Here is a solution that seems inefficient and inelegant; I'm sure pandas has a better way:

# get the subset of iris that is neither in a nor in b
# (a list comprehension is used because map() returns an iterator on Python 3)
df_rest = iris[[(x not in df_a.index) and (x not in df_b.index) for x in iris.index]]

And a second one:

df_rest = iris.ix[iris.index - df_a.index - df_b.index]

How can this be done most efficiently and elegantly in pandas? Thanks.

Recommended answer

This seems a bit faster than your second solution. There's a bit more overhead when indexing with .ix:

iris[~iris.index.isin(df_a.index.union(df_b.index))]
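
For anyone who wants to try the index-complement idea end to end, here is a minimal, self-contained sketch. The small `iris` frame is a made-up stand-in for the real dataset (only the index labels matter), and `taken`, `rest_labels` and `df_rest_alt` are names introduced just for illustration. It assumes a reasonably recent pandas, where `.ix` is no longer available and `+` / `-` on an Index are no longer set operations, so `Index.union` and `Index.difference` are used instead.

```python
import pandas as pd

# Hypothetical stand-in for the iris data; values are invented for illustration.
iris = pd.DataFrame(
    {
        "SepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "Name": [
            "Iris-setosa", "Iris-setosa",
            "Iris-versicolor", "Iris-versicolor",
            "Iris-virginica", "Iris-virginica",
        ],
    }
)

df_a = iris[iris.Name == "Iris-setosa"]
df_b = iris[iris.Name == "Iris-virginica"]

# Index labels that belong to either subset.
taken = df_a.index.union(df_b.index)

# Variant 1: boolean mask that is True for rows in neither subset.
df_rest = iris[~iris.index.isin(taken)]

# Variant 2: compute the complementary labels first, then select them with .loc
# (roughly the asker's second attempt, written with Index.difference).
rest_labels = iris.index.difference(taken)
df_rest_alt = iris.loc[rest_labels]

print(df_rest)
```

Both variants should select the same rows here. One difference worth keeping in mind is that `Index.difference` returns sorted labels by default, so the `.loc` variant may reorder rows when the original index is not sorted, whereas the `isin` mask keeps the original row order.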
