Replace NaN with empty list in a pandas dataframe
Question
I'm trying to replace some NaN values in my data with an empty list []. However, the list is represented as a str, which doesn't allow me to properly apply the len() function. Is there any way to replace a NaN value with an actual empty list in pandas?
In [28]: d = pd.DataFrame({'x' : [[1,2,3], [1,2], np.nan, np.nan], 'y' : [1,2,3,4]})
In [29]: d
Out[29]:
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2        NaN  3
3        NaN  4
In [32]: d.x.replace(np.nan, '[]', inplace=True)
In [33]: d
Out[33]:
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2         []  3
3         []  4
In [34]: d.x.apply(len)
Out[34]:
0 3
1 2
2 2
3 2
Name: x, dtype: int64
Recommended answer
This works using isnull and loc to mask the series:
In [90]:
d.loc[d.isnull()] = d.loc[d.isnull()].apply(lambda x: [])
d
Out[90]:
0 [1, 2, 3]
1 [1, 2]
2 []
3 []
dtype: object
In [91]:
d.apply(len)
Out[91]:
0 3
1 2
2 0
3 0
dtype: int64
You have to do this using apply so that the list object is not interpreted as an array to assign back to the df, which would try to align its shape with the original series.
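As a self-contained restatement of the approach above, here is a minimal sketch that can be run as a script. It assumes a Series named s (a name chosen for illustration, not from the original post) and also shows why the string '[]' in the question reported a length of 2. Exact assignment behaviour can vary between pandas versions, so treat it as a sketch rather than a guaranteed recipe.

```python
import numpy as np
import pandas as pd

# The string '[]' has two characters, which is why len() returned 2 in the question.
string_len = len('[]')
list_len = len([])

# Hypothetical sample Series mirroring the data in the question
s = pd.Series([[1, 2, 3], [1, 2], np.nan, np.nan])
mask = s.isnull()

# apply calls the lambda once per masked row, so every row gets its own new list
fill = s[mask].apply(lambda _: [])

# Assigning a Series aligns on the index, so each list lands in a single cell
s.loc[mask] = fill

lengths = s.apply(len).tolist()
print(lengths)
```

Because the right-hand side is a Series of lists rather than a bare list, pandas assigns one list per masked row instead of trying to spread a single list across the selection.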
Edit
Using your updated sample, the following works:
In [100]:
d.loc[d['x'].isnull(),['x']] = d.loc[d['x'].isnull(),'x'].apply(lambda x: [])
d
Out[100]:
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2         []  3
3         []  4
In [102]:
d['x'].apply(len)
Out[102]:
0 3
1 2
2 0
3 0
Name: x, dtype: int64
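A possible alternative, not from the original answer, is to skip the mask and rebuild the column with a single apply that keeps existing lists and substitutes a new empty list for everything else. This sketch assumes every non-missing value in x is a Python list; any other non-list value would also be replaced, which may not be what you want.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({"x": [[1, 2, 3], [1, 2], np.nan, np.nan], "y": [1, 2, 3, 4]})

# Keep existing lists; swap anything else (here, the NaN cells) for a new empty list.
# Assumption: all valid entries in the column are lists.
d["x"] = d["x"].apply(lambda v: v if isinstance(v, list) else [])

lengths = d["x"].apply(len).tolist()
print(lengths)
```

This avoids the aligned assignment through loc, at the cost of touching every row rather than only the missing ones.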