Python pandas: apply a function if a column value is not NULL

Question

I have a dataframe (in Python 2.7, pandas 0.15.0):

df=
       A    B               C
0    NaN   11             NaN
1    two  NaN  ['foo', 'bar']
2  three   33             NaN

I want to apply a simple function to the rows that do not contain NULL values in a specific column. My function is as simple as possible:

def my_func(row):
    print row

My apply code is as follows:

df[['A','B']].apply(lambda x: my_func(x) if(pd.notnull(x[0])) else x, axis = 1)

It works perfectly. If I check column 'B' for NULL values instead, pd.notnull() works perfectly as well. But if I select column 'C', which contains list objects:

df[['A','C']].apply(lambda x: my_func(x) if(pd.notnull(x[1])) else x, axis = 1)

Then I get the following error message: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1')

Does anybody know why pd.notnull() works only for integer and string columns but not for 'list columns'?

Also, is there a nicer way to check for NULL values in column 'C' than this:

df[['A','C']].apply(lambda x: my_func(x) if(str(x[1]) != 'nan') else x, axis = 1)
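For comparison, a quick sketch (Python 3; the helper name str_check is illustrative only) of how the string test above classifies a few values. It works for np.nan because str(np.nan) is 'nan', but it lets None through as "present" and rejects a genuine string 'nan'.

```python
import numpy as np

def str_check(value):
    # The workaround from the question: treat a cell as present
    # unless its string form is exactly 'nan'.
    return str(value) != 'nan'

print(str_check(np.nan))          # False: treated as missing
print(str_check(['foo', 'bar']))  # True: treated as present
print(str_check(None))            # True: None slips through as "present"
print(str_check('nan'))           # False: a real string 'nan' is dropped
```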

Thanks!

Answer

The problem is that pd.notnull(['foo', 'bar']) operates elementwise and returns array([ True, True], dtype=bool). Your if condition tries to convert that array to a single boolean, and that's when you get the exception. For the scalar values in columns 'A' and 'B', pd.notnull returns a single boolean, which is why those checks work.
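A minimal sketch of this behaviour, assuming Python 3 and a current pandas release rather than the Python 2.7 / pandas 0.15 setup from the question; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# A scalar in, a single boolean out: safe to use directly in an `if`.
scalar_result = pd.notnull('two')
nan_result = pd.notnull(np.nan)

# A list in, an element-wise boolean array out.
list_result = pd.notnull(['foo', 'bar'])

# Using the array in an `if` forces bool(array), which NumPy refuses
# for arrays with more than one element.
error_message = None
try:
    if list_result:
        pass
except ValueError as exc:
    error_message = str(exc)

print(scalar_result, nan_result)  # True False
print(list_result)                # [ True  True]
print(error_message)
```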

To fix it, you could simply wrap the pd.notnull call in np.all:

import numpy as np
df[['A','C']].apply(lambda x: my_func(x) if np.all(pd.notnull(x[1])) else x, axis=1)

Now you'll see that np.all(pd.notnull(['foo', 'bar'])) is indeed True.
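The fix can be checked end to end on the example frame. This is a sketch assuming Python 3 and a current pandas release. The my_func below is a hypothetical stand-in for the asker's function that records the labels of the rows it receives in a list named processed, and the column is read by label (x['C']) instead of x[1], because newer pandas versions deprecate positional lookups through [] on a Series.

```python
import numpy as np
import pandas as pd

# The example frame from the question; column 'C' holds NaN or a list.
df = pd.DataFrame({
    'A': [np.nan, 'two', 'three'],
    'B': [11, np.nan, 33],
    'C': [np.nan, ['foo', 'bar'], np.nan],
})

processed = []  # hypothetical: row labels that reached my_func

def my_func(row):
    # Stand-in for the asker's function: record the row, print it, return it.
    processed.append(row.name)
    print(row)
    return row

# np.all collapses the element-wise result of pd.notnull to one boolean:
# a NaN cell gives False, a list of non-null items gives True.
result = df[['A', 'C']].apply(
    lambda x: my_func(x) if np.all(pd.notnull(x['C'])) else x,
    axis=1,
)

print(processed)  # [1]
```

Because np.all requires every element to be non-null, a list that contains a None or NaN is also treated as missing, while an empty list counts as present (np.all of an empty array is True). Use np.any instead if a single non-null element should be enough.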
