In Pandas, how can I patch a dataframe with missing values with values from another dataframe given a similar index?

Views: 67 Published: 2020/5/24 1:07:59 Tags: python pandas


Question

I have the following dataframe and would like to fill in the missing values.

mukey   hzdept_r    hzdepb_r    sandtotal_r silttotal_r
425897      0         61        
425897      61        152          5.3         44.7
425911      0         30           30.1        54.9
425911      30        74           17.7        49.8
425911      74        84

I want each missing value to be the average of the values that share its mukey. For example, the missing values in the first row should become the means of sandtotal_r and silttotal_r over the rows where mukey == 425897. pandas fillna doesn't seem to do the trick. Any help?

While the code works for the sample dataframe in that example, it fails on the larger dataset I have uploaded here: https://www.dropbox.com/s/w3m0jppnq74op4c/www004.csv?dl=0

import pandas as pd
df = pd.read_csv('www004.csv')
# CSV file is here: https://www.dropbox.com/s/w3m0jppnq74op4c/www004.csv?dl=0
df1 = df.set_index('mukey')
df1.fillna(df.groupby('mukey').mean(),inplace=True)
df1.reset_index()

I get the error: InvalidIndexError. Why is it not working?

Recommended answer

Use combine_first. It lets you patch the missing data in the left dataframe with the matching data from the right dataframe, matched on the same index.

In this case, df1 is the left dataframe and df2, which holds the per-mukey means, is the right one.

In [48]: df = pd.read_csv('www004.csv')
    ...: df1 = df.set_index('mukey')
    ...: df2 = df.groupby('mukey').mean()

In [49]: df1.loc[426178,:]
Out[49]:
        hzdept_r  hzdepb_r  sandtotal_r  silttotal_r  claytotal_r   om_r
mukey
426178         0        36          NaN          NaN          NaN  72.50
426178        36        66          NaN          NaN          NaN  72.50
426178        66       152         42.1         37.9           20   0.25

In [50]: df2.loc[426178,:]
Out[50]:
hzdept_r       34.000000
hzdepb_r       84.666667
sandtotal_r    42.100000
silttotal_r    37.900000
claytotal_r    20.000000
om_r           48.416667
Name: 426178, dtype: float64

In [51]: df3 = df1.combine_first(df2)
    ...: df3.loc[426178,:]
Out[51]:
        hzdept_r  hzdepb_r  sandtotal_r  silttotal_r  claytotal_r   om_r
mukey
426178         0        36         42.1         37.9           20  72.50
426178        36        66         42.1         37.9           20  72.50
426178        66       152         42.1         37.9           20   0.25
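For readers without the CSV, the same steps can be tried on the five sample rows from the question. This is only a minimal sketch: the dataframe is typed in by hand, and it assumes the blank cells in the question's table are NaN.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "mukey": [425897, 425897, 425911, 425911, 425911],
    "hzdept_r": [0, 61, 0, 30, 74],
    "hzdepb_r": [61, 152, 30, 74, 84],
    "sandtotal_r": [np.nan, 5.3, 30.1, 17.7, np.nan],
    "silttotal_r": [np.nan, 44.7, 54.9, 49.8, np.nan],
})

df1 = df.set_index("mukey")        # left: original rows, index labels repeat
df2 = df.groupby("mukey").mean()   # right: one row of means per mukey

# NaN cells in df1 are filled from the df2 row with the same index label;
# cells that already hold a value are left untouched.
df3 = df1.combine_first(df2)
print(df3.reset_index())
```

In this sketch the first row for mukey 425897 should pick up the only observed values for that key, and the last row for mukey 425911 should pick up the mean of the two observed rows for that key.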

Note that the rows for the following mukey values still won't have values in the resulting df3:

426162 426163 426174 426174 426255

because they were single rows to begin with, so there are no other values to average and .mean() doesn't mean anything to them (eh, see what I did there?).
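One way to list such keys up front is to count the non-missing values per group; a group with a count of zero has nothing to average. The sketch below uses made-up data (the mukey values 1, 2, 3 and the value_cols list are hypothetical, not taken from www004.csv).

```python
import numpy as np
import pandas as pd

# Hypothetical data: mukey 3 is a single row with no sand/silt values.
df = pd.DataFrame({
    "mukey": [1, 1, 2, 2, 3],
    "sandtotal_r": [np.nan, 5.3, 30.1, 17.7, np.nan],
    "silttotal_r": [np.nan, 44.7, 54.9, 49.8, np.nan],
})

value_cols = ["sandtotal_r", "silttotal_r"]

# count() ignores NaN, so a count of 0 means the group mean is NaN too.
observed = df.groupby("mukey")[value_cols].count()
unfillable = observed.index[(observed == 0).any(axis=1)].tolist()
print(unfillable)
```

Keys reported this way need some other fill strategy, since no group mean exists for them.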
