pandas merge dataframe with NaN (or "unknown") for missing values

Question

I have 2 dataframes, one of which has supplemental information for some (but not all) of the rows in the other.

import pandas as pd
from pandas import DataFrame as df  # assumption: `df` is used as an alias for DataFrame

names = df({'names':['bob','frank','james','tim','ricardo','mike','mark','joan','joe'],
            'position':['dev','dev','dev','sys','sys','sys','sup','sup','sup']})
info = df({'names':['joe','mark','tim','frank'],
           'classification':['thief','thief','good','thief']})

I would like to take the classification column from the info dataframe above and add it to the names dataframe above. However, when I do combined = pd.merge(names, info), the resulting dataframe is only 4 rows long. All of the rows that do not have supplemental info are dropped.

Ideally, I would have the missing values in that column set to "unknown", resulting in a dataframe where some people are thieves, some are good, and the rest are unknown.

Edit:
One of the first answers I received suggested using an outer merge, which seems to do some weird things. Here is a code sample:

names = df({'names':['bob','frank','bob','bob','bob''james','tim','ricardo','mike','mark','joan','joe'],
            'position':['dev','dev','dev','dev','dev','dev''sys','sys','sys','sup','sup','sup']})
info = df({'names':['joe','mark','tim','frank','joe','bill'],
           'classification':['thief','thief','good','thief','good','thief']})
what = pd.merge(names, info, how="outer")
what.fillna("unknown")

The strange thing is that in the output I get a row where the resulting name is "bobjames" and another where the position is "devsys". Finally, even though bill does not appear in the names dataframe, it shows up in the resulting dataframe. So I really need a way to say: look up a value in this other dataframe and, if you find something, tack on those columns.

Recommended answer

In case you are still looking for an answer:

The "strange" things you described are due to some minor errors in your code. The first (the appearance of "bobjames" and "devsys") happens because there is no comma between those two values in your source lists, so Python joins the adjacent string literals into a single string. The second happens because pandas merges on column names, not on the names of the dataframe variables (you have a dataframe called names, but the column is also called names), and with how="outer" every key found in either frame is kept, so bill from info appears in the result. Otherwise, the merge seems to be doing exactly what you are looking for:

import pandas as pd
names = pd.DataFrame({'names':['bob','frank','bob','bob','bob', 'james','tim','ricardo','mike','mark','joan','joe'], 
                      'position':['dev','dev','dev','dev','dev','dev', 'sys','sys','sys','sup','sup','sup']})

info = pd.DataFrame({'names':['joe','mark','tim','frank','joe','bill'],
                     'classification':['thief','thief','good','thief','good','thief']})
what = pd.merge(names, info, how="outer")
what.fillna('unknown', inplace=True)

This results in:

      names position classification
0       bob      dev        unknown
1       bob      dev        unknown
2       bob      dev        unknown
3       bob      dev        unknown
4     frank      dev          thief
5     james      dev        unknown
6       tim      sys           good
7   ricardo      sys        unknown
8      mike      sys        unknown
9      mark      sup          thief
10     joan      sup        unknown
11      joe      sup          thief
12      joe      sup           good
13     bill  unknown          thief
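
If rows such as bill, which exist only in info, should not appear at all, a left merge may be closer to the "look up and tack on" behaviour the question asks for. The following is a minimal sketch of that idea, not part of the original answer. It reuses the dataframes from the first part of the question, and filling only the classification column (rather than the whole frame) is a choice made here for illustration.

```python
import pandas as pd

names = pd.DataFrame({
    'names': ['bob', 'frank', 'james', 'tim', 'ricardo', 'mike', 'mark', 'joan', 'joe'],
    'position': ['dev', 'dev', 'dev', 'sys', 'sys', 'sys', 'sup', 'sup', 'sup'],
})
info = pd.DataFrame({
    'names': ['joe', 'mark', 'tim', 'frank', 'bill'],
    'classification': ['thief', 'thief', 'good', 'thief', 'thief'],
})

# how="left" keeps every row of `names` and only looks up matches in `info`,
# so 'bill' (present only in `info`) is not added to the result.
combined = pd.merge(names, info, on='names', how='left')

# Rows without a match get NaN in 'classification'; replace it with a label.
combined['classification'] = combined['classification'].fillna('unknown')

print(combined)
```

Note that if info lists the same name more than once (as joe does in the edited question), the left merge still produces one output row per match, so such duplicates would need to be resolved in info first.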
