Update a dataframe by dataframes with NaN values


Problem description


I am trying to update a DataFrame

df1 = pd.DataFrame(data = {'A' : [1,2,3,4], 'B' : [5,6,7,8]})

by another DataFrame

df2 = pd.DataFrame(data = {'B' : [9, np.nan, 11, np.nan]})

My aim is to update df1 with df2 and overwrite all values, including those that are NaN in df2, using

df1.update(df2)

Unlike the usual use case, it is important to me that the NaN values end up in df1. As far as I can see, however, update leaves df1 as

>>> df1
      A   B
0     1   9
1     2   6
2     3   11
3     4   8
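For reference, a minimal reproduction of this behaviour. `DataFrame.update` modifies the frame in place using only the non-NA values of the other frame, so the NaN entries in df2 leave the matching values in df1 untouched. Whether column B is displayed as integers or floats afterwards may depend on the pandas version.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
df2 = pd.DataFrame({"B": [9, np.nan, 11, np.nan]})

# update() works in place and returns None;
# NaN entries in df2 are skipped, so rows 1 and 3 keep 6 and 8
df1.update(df2)

b_values = df1["B"].tolist()
print(b_values)
```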

Is there a way to get

>>> df1
    A    B
0   1    9
1   2    NaN
2   3    11
3   4    NaN

without building df1 manually?

Solution

I am late to the party, but I was recently confronted with the same issue, i.e. trying to update a dataframe without ignoring NaN values the way the Pandas built-in update method does. For two dataframes sharing the same column names, a workaround is to concatenate both dataframes and then remove rows with duplicated index labels, keeping only the last entry:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(data = {'A' : [1,2,3,4], 'B' : [5,6,7,8]})
df2 = pd.DataFrame(data = {'A' : [1,2,3,4], 'B' : [9, np.nan, 11, np.nan]})

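# Stack df2 below df1; index labels present in both now appear twice
frames = [df1, df2]
df_concatenated = pd.concat(frames)
# Keep only the last occurrence of each label, i.e. the row coming from df2
df1 = df_concatenated.loc[~df_concatenated.index.duplicated(keep='last')]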

Depending on indexing, it might be necessary to sort the indices of the output dataframe:

df1=df1.sort_index()
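The concatenate-and-deduplicate workaround can be wrapped in a small helper. This is only a sketch: the name `update_keep_nan` and the sample data (a df2 that covers just two of the four rows) are made up for illustration. Note that, unlike `update`, this approach also appends rows of the second frame whose index labels do not exist in the first one.

```python
import numpy as np
import pandas as pd


def update_keep_nan(base, new):
    """Hypothetical helper: rows of `new` replace rows of `base`
    that share an index label, NaN values included."""
    combined = pd.concat([base, new])
    # For labels present in both frames, keep the later row (from `new`)
    deduped = combined.loc[~combined.index.duplicated(keep="last")]
    # Replaced rows end up at the bottom, so restore label order
    return deduped.sort_index()


df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
# Illustrative data: only the rows labelled 1 and 3 are updated
df2 = pd.DataFrame({"A": [2, 4], "B": [np.nan, 10]}, index=[1, 3])

result = update_keep_nan(df1, df2)
print(result)
```

Here the sort step matters: without it the rows would come out in the order 0, 2, 1, 3.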


To address your very specific example, in which df2 does not have a column A, you could run:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(data = {'A' : [1,2,3,4], 'B' : [5,6,7,8]})
df2 = pd.DataFrame(data = {'B' : [9, np.nan, 11, np.nan]})

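# Column A is missing from df2, so its rows get NaN in A after concatenation
frames = [df1, df2]
df_concatenated = pd.concat(frames)

# Take only column B from the deduplicated result; column A of df1 is left as it was
df1['B'] = df_concatenated.loc[~df_concatenated.index.duplicated(keep='last')]['B']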
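An alternative that avoids the concatenation is sketched below. It is not part of the original answer: the helper name `overwrite_with_nan` is hypothetical, and it assumes both frames have unique index labels. For every column the two frames share, each row label found in the second frame takes that frame's value (NaN included), and all other rows keep their original value. Unlike `update`, it returns a new frame instead of modifying the first one in place.

```python
import numpy as np
import pandas as pd


def overwrite_with_nan(base, new):
    """Hypothetical helper: like DataFrame.update, but NaN values in
    `new` also overwrite. Assumes unique index labels in both frames."""
    result = base.copy()
    # True for the rows of `base` whose label also exists in `new`
    in_new = base.index.isin(new.index)
    for col in base.columns.intersection(new.columns):
        # Align `new` to the rows of `base`; labels missing from `new` become NaN
        aligned = new[col].reindex(base.index)
        # Use the value from `new` where the label exists there, else the old value
        result[col] = aligned.where(in_new, base[col])
    return result


df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
df2 = pd.DataFrame({"B": [9, np.nan, 11, np.nan]})

updated = overwrite_with_nan(df1, df2)
print(updated)
```

Each column is replaced as a whole rather than written into with `.loc`, so the integer column B can turn into a float column that holds NaN.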
