Update a dataframe by dataframes with NaN values
I am trying to update a DataFrame
df1 = pd.DataFrame(data = {'A' : [1,2,3,4], 'B' : [5,6,7,8]})
with another DataFrame
df2 = pd.DataFrame(data = {'B' : [9, np.nan, 11, np.nan]})
My aim is to update df1 with df2 and overwrite all values (NaN values too) using
df1.update(df2)
Unlike the common use case, it is important to me that the NaN values end up in df1.
But as far as I can see, after the update df1 is
>>> df1
A B
0 1 9
1 2 6
2 3 11
3 4 8
Is there a way to get
>>> df1
A B
0 1 9
1 2 NaN
2 3 11
3 4 NaN
without building df1 manually?
I am late to the party, but I recently ran into the same issue, i.e. trying to update a dataframe without ignoring NaN values, whereas the pandas built-in update method skips NaN values in the other dataframe.
For two dataframes sharing the same column names, a workaround is to concatenate both dataframes and then remove rows with duplicated index labels, keeping only the last entry for each label:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(data = {'A' : [1,2,3,4], 'B' : [5,6,7,8]})
df2 = pd.DataFrame(data = {'A' : [1,2,3,4], 'B' : [9, np.nan, 11, np.nan]})
frames = [df1, df2]
# Stack both frames; each shared index label now appears twice
df_concatenated = pd.concat(frames)
# Keep only the last occurrence of each label, i.e. the rows coming from df2
df1 = df_concatenated.loc[~df_concatenated.index.duplicated(keep='last')]
Depending on indexing, it might be necessary to sort the indices of the output dataframe:
df1=df1.sort_index()
To address your very specific example, in which df2 does not have a column A, you could run:
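import pandas as pd
import numpy as np
df1 = pd.DataFrame(data = {'A' : [1,2,3,4], 'B' : [5,6,7,8]})
df2 = pd.DataFrame(data = {'B' : [9, np.nan, 11, np.nan]})
frames = [df1, df2]
# Rows coming from df2 have NaN in column A after the concat, so only column B is used
df_concatenated = pd.concat(frames)
# Overwrite column B with the last entry per index label; column A of df1 stays untouched
df1['B'] = df_concatenated.loc[~df_concatenated.index.duplicated(keep='last'), 'B']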
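The two variants above can be folded into one small helper that applies the same concat-and-keep-last idea column by column. This is only a sketch: the name update_keep_nan is hypothetical (it is not a pandas function), and it assumes that the index labels are unique within each dataframe. Columns or rows that exist only in the second dataframe are ignored, which mirrors how update only touches labels already present in df1.

```python
import numpy as np
import pandas as pd


def update_keep_nan(target, other):
    """Return a copy of `target` overwritten by `other`, NaN values included.

    Hypothetical helper, not part of pandas. Assumes unique index labels
    within each frame.
    """
    result = target.copy()
    # Only columns present in both frames are updated
    for col in other.columns.intersection(target.columns):
        stacked = pd.concat([target[col], other[col]])
        # For each index label keep the last entry, so `other` wins where it has the label
        latest = stacked[~stacked.index.duplicated(keep="last")]
        # Restore the original row order and drop labels that exist only in `other`
        result[col] = latest.reindex(target.index)
    return result


df1 = pd.DataFrame(data={"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
df2 = pd.DataFrame(data={"B": [9, np.nan, 11, np.nan]})

result = update_keep_nan(df1, df2)
print(result)
```

Returning a new dataframe rather than modifying df1 in place is a design choice of this sketch; assign the result back to df1 if the in-place behaviour of update is wanted. Note that column B becomes a float column, since an integer column cannot hold NaN.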