Python/Pandas DataFrame: replace 0 with median value

Question

I have a Python pandas DataFrame with several columns, and one column has 0 values. I want to replace the 0 values with the median or mean of that column.

data is my DataFrame, and artist_hotness is the column.

mean_artist_hotness = data['artist_hotness'].dropna().mean()

if len(data.artist_hotness[ data.artist_hotness.isnull() ]) > 0:
    data.artist_hotness.loc[ (data.artist_hotness.isnull()), 'artist_hotness'] = mean_artist_hotness

I tried this, but it is not working.

Solution

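I think you can use mask, and call mean with skipna=True instead of using dropna (skipna=True is the default, so mean() alone already ignores NaN). You also need to change the condition: the question tests isnull(), which selects NaN values, not zeros. Use data.artist_hotness == 0 if you want to replace 0 values, or data.artist_hotness.isnull() if you want to replace NaN values: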

import pandas as pd
import numpy as np

data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan]})
print (data)
   artist_hotness
0             0.0
1             1.0
2             5.0
3             NaN

mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0

data['artist_hotness']=data.artist_hotness.mask(data.artist_hotness == 0,mean_artist_hotness)
print (data)
   artist_hotness
0             2.0
1             1.0
2             5.0
3             NaN
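Since the title asks for the median, here is a minimal sketch of the same mask approach using Series.median, which also skips NaN by default. It rebuilds the one-column example frame from above:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})

# median() skips NaN by default, so this is the median of 0, 1 and 5
median_artist_hotness = data['artist_hotness'].median()

# mask() swaps in the median wherever the condition is True and keeps other values
data['artist_hotness'] = data['artist_hotness'].mask(
    data['artist_hotness'] == 0, median_artist_hotness
)
print(data)  # artist_hotness is now 1.0, 1.0, 5.0, NaN
```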


Alternatively, use loc on the DataFrame itself, passing the row mask and the column name together:

data.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness
print (data)
   artist_hotness
0             2.0
1             1.0
2             5.0
3             NaN

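By contrast, the pattern from the question calls .loc on the Series data.artist_hotness while still passing a column name. A Series has only one axis, so its .loc accepts only a row indexer, and the extra 'artist_hotness' raises an error (the exact message depends on the pandas version):

data.artist_hotness.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness

IndexingError: (0 True 1 False 2 False 3 False Name: artist_hotness, dtype: bool, 'artist_hotness')

Dropping the column name (data.artist_hotness.loc[mask] = value) avoids the error, but that is chained assignment and is not guaranteed to modify data (under pandas Copy-on-Write it does not), so assigning through data.loc as above is the safer choice.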
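If the zeros are really placeholders for missing values, you may not want them included when computing the replacement value. A minimal sketch under that assumption, computing the median over the non-zero values only and assigning with data.loc (is_zero and median_nonzero are illustrative names, not from the original answer):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})

# Assumption: 0 marks a missing value, so leave zeros out of the statistic
is_zero = data['artist_hotness'] == 0  # NaN == 0 is False, so NaN is not flagged
median_nonzero = data.loc[~is_zero, 'artist_hotness'].median()  # median of 1 and 5 -> 3.0

# Assign through the DataFrame: row mask and column label in one .loc call
data.loc[is_zero, 'artist_hotness'] = median_nonzero
print(data)  # artist_hotness is now 3.0, 1.0, 5.0, NaN
```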

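Another solution is DataFrame.replace with a nested dict that limits the replacement to the given column. The output below is for a frame that also has a second column aa, as in the next example; aa keeps its 0: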

data=data.replace({'artist_hotness': {0: mean_artist_hotness}}) 
print (data)
    aa  artist_hotness
0  0.0             2.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN 
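The nested-dict form works the same way with the median. A minimal sketch on a two-column frame where only artist_hotness should change (the aa values are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'aa': [0, 1, 5, np.nan],
                     'artist_hotness': [0, 1, 5, np.nan]})

median_artist_hotness = data['artist_hotness'].median()  # 1.0

# {column: {old_value: new_value}} restricts the replacement to artist_hotness
data = data.replace({'artist_hotness': {0: median_artist_hotness}})
print(data)  # aa keeps 0.0; artist_hotness becomes 1.0, 1.0, 5.0, NaN
```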

Or, to replace 0 values in all columns:

import pandas as pd
import numpy as np

data = pd.DataFrame({'aa': [0,1,5,np.nan], 'artist_hotness': [0,1,5,np.nan]})
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN

mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0

data=data.replace(0,mean_artist_hotness) 
print (data)
    aa  artist_hotness
0  2.0             2.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN
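The example above uses a single value, the mean of artist_hotness, for every column. If each column should instead get its own median, one option is to apply Series.replace column by column. A minimal sketch, with hypothetical values for aa so that the two medians differ:

```python
import numpy as np
import pandas as pd

# Hypothetical values for aa so that the two column medians differ
data = pd.DataFrame({'aa': [0, 10, 20, np.nan],
                     'artist_hotness': [0, 1, 5, np.nan]})

# apply() passes each column as a Series; replace its zeros with that column's median
data = data.apply(lambda col: col.replace(0, col.median()))
print(data)  # aa: 10.0, 10.0, 20.0, NaN; artist_hotness: 1.0, 1.0, 5.0, NaN
```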

To replace NaN values in all columns instead, use DataFrame.fillna (shown here on the original two-column frame, not the result of the replace above):

data=data.fillna(mean_artist_hotness) 
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  2.0             2.0
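DataFrame.fillna also accepts a Series keyed by column label, so each column can be filled with its own median. A minimal sketch, again with hypothetical values for aa:

```python
import numpy as np
import pandas as pd

# Hypothetical values for aa so that the two column medians differ
data = pd.DataFrame({'aa': [0, 10, 20, np.nan],
                     'artist_hotness': [0, 1, 5, np.nan]})

# data.median() is a Series indexed by column name (aa -> 10.0, artist_hotness -> 1.0);
# fillna matches it to the columns by label. Existing zeros are not touched.
data = data.fillna(data.median())
print(data)  # aa: 0.0, 10.0, 20.0, 10.0; artist_hotness: 0.0, 1.0, 5.0, 1.0
```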

To fill NaN only in some columns, use Series.fillna on those columns (again starting from the original two-column frame):

data['artist_hotness'] = data.artist_hotness.fillna(mean_artist_hotness) 
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  NaN             2.0
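When several columns need filling, DataFrame.fillna with a dict of {column: value} fills only the listed columns in one call. A minimal sketch using the median of artist_hotness:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'aa': [0, 1, 5, np.nan],
                     'artist_hotness': [0, 1, 5, np.nan]})

# Columns missing from the dict (here aa) are left untouched
data = data.fillna({'artist_hotness': data['artist_hotness'].median()})
print(data)  # aa keeps its NaN; artist_hotness: 0.0, 1.0, 5.0, 1.0
```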
