Pandas won't fillna() inplace

Question

I'm trying to fill NAs with "" in 4 specific columns of a data frame that are string/object types. I can assign these columns to a new variable as I fillna(), but when I fillna() inplace the underlying data doesn't change.

a_n6 = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")
a_n6

gives me:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 4 columns):
PROV LAST     1542  non-null values
PROV FIRST    1542  non-null values
PROV MID      1542  non-null values
SPEC NM       1542  non-null values
dtypes: object(4)

but

a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("", inplace=True)
a_n6

gives me:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 7 columns):
NPI           1103  non-null values
PIN           1542  non-null values
PROV FIRST    1541  non-null values
PROV LAST     1542  non-null values
PROV MID      1316  non-null values
SPEC NM       1541  non-null values
flag          439  non-null values
dtypes: float64(2), int64(1), object(4)

It's just one row, but still frustrating. What am I doing wrong?

Recommended answer

Use a dict as the value argument to fillna()

As @rhkarls noted in a comment on @Jeff's answer, selecting a list of columns with .loc doesn't support inplace operations: the selection comes back as a copy, so the fill never reaches the original frame. I find that frustrating too. Here's a workaround.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,3,4,np.nan],
                   'b':[6,7,8,np.nan,np.nan],
                   'x':[11,12,13,np.nan,np.nan],
                   'y':[16,np.nan,np.nan,19,np.nan]})
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   NaN
#2  3.0  8.0  13.0   NaN
#3  4.0  NaN   NaN  19.0
#4  NaN  NaN   NaN   NaN

Let's say we want to fillna for x and y only, not a and b.

I would expect using .loc to work (as it does in an assignment), but it doesn't, as mentioned earlier:

# doesn't work
df.loc[:,['x','y']].fillna(0, inplace=True)
print(df) # nothing changed
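
A commonly used alternative, sketched here on a smaller frame, is to drop inplace and assign the filled selection back to the same columns. This is not part of the original answer; the three-row frame and the name `cols` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the data from the answer)
df = pd.DataFrame({'a': [1, np.nan, 3],
                   'x': [11, np.nan, np.nan],
                   'y': [np.nan, 19, np.nan]})
cols = ['x', 'y']

# Fill the selection without inplace, then assign the result back
df[cols] = df[cols].fillna(0)

print(df)  # x and y are filled; the NaN in a is left alone
```

The assignment writes into the original frame, which is why it succeeds where the in-place call on the selection does not.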

However, the documentation says that the value argument to fillna() can be:

alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled).

It turns out that using a dict of values will work:

# works
df.fillna({'x':0, 'y':0}, inplace=True)
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   0.0
#2  3.0  8.0  13.0   0.0
#3  4.0  NaN   0.0  19.0
#4  NaN  NaN   0.0   0.0
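
Because the dict maps each column to its own value, the same call can presumably fill different columns with different values. A minimal sketch on the example frame above, where the fill value -1 for y is an arbitrary choice made for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [6, 7, 8, np.nan, np.nan],
                   'x': [11, 12, 13, np.nan, np.nan],
                   'y': [16, np.nan, np.nan, 19, np.nan]})

# Each key names a column; each value is the fill for that column only
df.fillna({'x': 0, 'y': -1}, inplace=True)
print(df)  # a and b keep their NaNs because they are not in the dict
```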

Also, if you have a lot of columns in your subset, you could use a dict comprehension, as in:

df.fillna({x:0 for x in ['x','y']}, inplace=True) # also works
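
Applied to the question, the same idea might look like the sketch below. The three-row frame is a made-up stand-in for the asker's a_n6 (the real data isn't shown), and `name_cols` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's a_n6 frame
a_n6 = pd.DataFrame({
    "NPI": [1.0, np.nan, 3.0],
    "PROV LAST": ["Smith", "Jones", "Lee"],
    "PROV FIRST": ["Ann", None, "Kim"],
    "PROV MID": [None, "Q", None],
    "SPEC NM": ["Cardiology", None, "Oncology"],
})

name_cols = ["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]

# Fill only the four string columns with ""; NPI is not in the dict
a_n6.fillna({col: "" for col in name_cols}, inplace=True)
print(a_n6)
```

Calling fillna() on the whole frame with a dict keeps the operation in place while still limiting it to the chosen columns.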
