pandas - filter dataframe by another dataframe by row elements

Problem description

I have a dataframe df1 which looks like:

   c  k  l
0  A  1  a
1  A  2  b
2  B  2  a
3  C  2  a
4  C  2  d

and another one, called df2, that looks like:

   c  l
0  A  b
1  C  a

I would like to filter df1, keeping only the rows whose (c, l) pair is NOT in df2. The pairs to filter out are the tuples (A,b) and (C,a). So far I have tried the isin method:

d = df1[~(df1['l'].isin(df2['l']) & df1['c'].isin(df2['c']))]

Apart from seeming too complicated to me, it returns:

   c  k  l
2  B  2  a
4  C  2  d

But I expect:

   c  k  l
0  A  1  a
2  B  2  a
4  C  2  d
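The attempt drops row 0 because each isin call tests one column on its own: A appears in df2['c'] and a appears in df2['l'], so row (A, a) passes both tests even though the pair (A, a) is not in df2. A minimal sketch that shows the two independent masks (the names in_c, in_l and both are illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

# Each mask checks membership of a single column, ignoring the other column
in_c = df1['c'].isin(df2['c'])  # True for A, A, C, C
in_l = df1['l'].isin(df2['l'])  # True for a, b, a, a

# Row 0 (A, a) is True in both masks, so negating the combined mask drops it
both = in_c & in_l
print(both.tolist())  # [True, True, False, True, False]
print(df1[~both])     # only rows 2 and 4 survive
```

The filter needs to compare whole (c, l) tuples rather than each column separately.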

Recommended answer

You can do this efficiently using isin on a MultiIndex built from the key columns:

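import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})
# use every column of df2 as part of the key
keys = list(df2.columns.values)
# build a MultiIndex of (c, l) tuples for each frame
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index
# keep the rows of df1 whose (c, l) tuple does not appear in df2
df1[~i1.isin(i2)]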

I think this improves on @IanS's similar solution because it doesn't assume any column type (i.e., it works with numbers as well as with strings).
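To illustrate the type-agnostic claim, here is a minimal sketch of the same MultiIndex technique wrapped in a hypothetical helper, anti_filter, using pd.MultiIndex.from_frame (pandas 0.24 and later) in place of set_index. The numeric frames nums1 and nums2 are made-up example data.

```python
import pandas as pd


def anti_filter(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep rows of `left` whose key tuple is absent from `right`.

    As in the answer above, the key columns are all columns of `right`.
    """
    keys = list(right.columns)
    left_idx = pd.MultiIndex.from_frame(left[keys])
    right_idx = pd.MultiIndex.from_frame(right[keys])
    # isin compares whole (key1, key2, ...) tuples, not each column separately
    return left[~left_idx.isin(right_idx)]


# Made-up data with integer and float key columns
nums1 = pd.DataFrame({'x': [1, 1, 2, 3, 3],
                      'k': [10, 20, 20, 20, 20],
                      'y': [1.5, 2.5, 1.5, 1.5, 4.5]})
nums2 = pd.DataFrame({'x': [1, 3],
                      'y': [2.5, 1.5]})

print(anti_filter(nums1, nums2))
```

Rows 1 and 3, whose (x, y) pairs (1, 2.5) and (3, 1.5) appear in nums2, are dropped. Calling anti_filter(df1, df2) on the question's string frames keeps rows 0, 2 and 4, matching the expected output.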

(The answer above is an edit. The following was my initial answer.)

Interesting! This is something I haven't come across before... I would probably solve it by merging the two DataFrames, then dropping the rows where df2 is defined. Here is an example, which makes use of a temporary marker column:

import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

# create a column marking df2 values
df2['marker'] = 1

# left-join the two, keeping all of df1's rows;
# marker is NaN wherever a (c, l) pair has no match in df2
joined = pd.merge(df1, df2, on=['c', 'l'], how='left')
joined  # display the merged frame (notebook-style)

# keep df1's original columns for the rows where marker is NaN
joined[pd.isnull(joined['marker'])][df1.columns]

There may be a way to do this without the temporary marker column, but I can't think of one. As long as your data isn't huge, the method above should be a fast and sufficient answer.
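For what it's worth, pd.merge has an indicator=True option that avoids the hand-made marker column: it adds a _merge column whose value is 'left_only' for rows of df1 with no match in df2. This is a different technique from the one above, shown as a minimal sketch. It assumes df2's (c, l) pairs may contain duplicates, so drop_duplicates is applied first to stop the merge from repeating df1 rows.

```python
import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

keys = ['c', 'l']
# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'
joined = df1.merge(df2[keys].drop_duplicates(), on=keys, how='left',
                   indicator=True)

# keep rows that found no match in df2, restricted to df1's columns
result = joined.loc[joined['_merge'] == 'left_only', df1.columns]
print(result)
```

As with the marker version, the merge result gets a fresh RangeIndex; the labels 0, 2 and 4 only line up with df1's here because df1 uses the default index.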