Get rows that are present in one dataframe, but not the other

Problem description

I'd like to extract the rows of df1 that are not present in df2 (identity is determined by the index). For the example below, I would expect the first row in df1 to be returned. Unfortunately, the result is empty.

import pandas as pd

df1 = pd.DataFrame({
    'level-0': ['a', 'a', 'a', 'a', 'a', 'a'],
    'level-1': ['s2', 's2', 's2', 's2', 's2', 's2'],
    'level-2': ['1', '1', '1', '1', '1', '1'],
    'level-3': ['19', '20', '21', '22', '23', '24'],
    'level-4': ['HRB', 'HRB', 'HRB', 'HRB', 'HRB', 'HRB'],
    'name': ['a', 'b', 'c', 'd', 'e', 'f']
})

df1 = df1.set_index(['level-0', 'level-1', 'level-2', 'level-3', 'level-4'], drop=False)

df2 = pd.DataFrame({
    'level-0': ['a', 'a', 'a', 'a', 'a', 'b'],
    'level-1': ['s2', 's2', 's2', 's2', 's2', 's2'],
    'level-2': ['1', '1', '1', '1', '1', '1'],
    'level-3': ['19', '20', '21', '22', '23', '24'],
    'level-4': ['HRB', 'HRB', 'HRB', 'HRB', 'HRB', 'HRB']
})
df2 = df2.set_index(['level-0', 'level-1', 'level-2', 'level-3', 'level-4'], drop=False)

# all indices that are in df1 but not in df2
df_unknown = df1[~df1.index.isin(df2.index)]
print(df_unknown)

What's wrong with the selection?

Update

I figured out what went wrong. The dataframes were read from an Excel file and some Series were interpreted as int, while the dataframe to compare with had its columns already converted to str. This resulted in different indices.
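The dtype mismatch described in the update can be reproduced with a small sketch. The frames `left` and `right` and the single `key` column are hypothetical stand-ins (not the original Excel data), and casting the index with `astype(str)` is just one possible way to bring both sides to a common type:

```python
import pandas as pd

# Hypothetical data: 'key' is int in one frame (as if read from Excel)
# and str in the other.
left = pd.DataFrame({'key': [19, 20, 21], 'name': ['a', 'b', 'c']})
left = left.set_index('key', drop=False)
right = pd.DataFrame({'key': ['19', '20']})
right = right.set_index('key', drop=False)

# An int label never equals a str label, so every row of `left`
# looks as if it were missing from `right`.
mismatched = left[~left.index.isin(right.index)]

# Casting the index to a common dtype makes the comparison meaningful.
left_str = left.copy()
left_str.index = left_str.index.astype(str)
missing = left_str[~left_str.index.isin(right.index)]

print(len(mismatched))          # all rows are reported as missing
print(missing['name'].tolist())  # only the genuinely missing row remains
```

Comparing `left.index.dtype` and `right.index.dtype` before filtering is a quick way to spot this kind of problem.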

Solution

set_index does not operate in place by default; it returns a new DataFrame. If that result is not kept, df1 and df2 still have their default numeric index after the call. Do either

df2.set_index(..., inplace=True)

or

df2 = df2.set_index(...)

You will find that the vast majority of pandas methods work this way: they return a new object rather than modifying the original.
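A minimal sketch of that behavior, using a small hypothetical frame rather than the data from the question:

```python
import pandas as pd

df = pd.DataFrame({'level-0': ['a', 'b'], 'name': ['x', 'y']})

# Calling set_index without keeping the result leaves df unchanged.
df.set_index('level-0', drop=False)
unchanged_index = list(df.index)   # still the default integer index

# Assigning the returned frame is what actually replaces the index.
df = df.set_index('level-0', drop=False)
new_index = list(df.index)

print(unchanged_index)
print(new_index)
```

Reassigning the result, as in the second call, tends to be the more common style and also works when several calls are chained.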
