pandas: when data is NaN, logic operations cannot be done
Question
I have a large DataFrame in pandas, and two of its columns can either hold values or be NaN (null) when nothing has been assigned.
I want to populate a third column based on these two. Where the source column is not NaN, the third column takes some value. This works as follows:
In [16]: import pandas as pd
In [17]: import numpy as np
In [18]: df = pd.DataFrame([[np.NaN, np.NaN],['John', 'Malone'],[np.NaN, np.NaN]], columns = ['col1', 'col2'])
In [19]: df
Out[19]:
col1 col2
0 NaN NaN
1 John Malone
2 NaN NaN
In [20]: df['col3'] = np.NaN
In [21]: df.loc[df['col1'].notnull(),'col3'] = 'I am ' + df['col1']
In [22]: df
Out[22]:
col1 col2 col3
0 NaN NaN NaN
1 John Malone I am John
2 NaN NaN NaN
This also works:
In [29]: df.loc[df['col1']== 'John','col3'] = 'I am ' + df['col2']
In [30]: df
Out[30]:
col1 col2 col3
0 NaN NaN NaN
1 John Malone I am Malone
2 NaN NaN NaN
But if I now make all values NaN and then try this last loc, it gives me an error!
In [31]: df = pd.DataFrame([[np.NaN, np.NaN],[np.NaN, np.NaN],[np.NaN, np.NaN]], columns = ['col1', 'col2'])
In [32]: df
Out[32]:
col1 col2
0 NaN NaN
1 NaN NaN
2 NaN NaN
In [33]: df['col3'] = np.NaN
In [34]: df.loc[df['col1']== 'John','col3'] = 'I am ' + df['col2']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
c:\python33\lib\site-packages\pandas\core\ops.py in na_op(x, y)
552 result = expressions.evaluate(op, str_rep, x, y,
--> 553 raise_on_error=True, **eval_kwargs)
554 except TypeError:
c:\python33\lib\site-packages\pandas\computation\expressions.py in evaluate(op, op_str, a, b, raise_on_error, use_numexpr, **eval_kwargs)
217 return _evaluate(op, op_str, a, b, raise_on_error=raise_on_error,
--> 218 **eval_kwargs)
219 return _evaluate_standard(op, op_str, a, b, raise_on_error=raise_on_error)
c:\python33\lib\site-packages\pandas\computation\expressions.py in _evaluate_standard(op, op_str, a, b, raise_on_error, **eval_kwargs)
70 _store_test_result(False)
---> 71 return op(a, b)
72
c:\python33\lib\site-packages\pandas\core\ops.py in _radd_compat(left, right)
805 try:
--> 806 output = radd(left, right)
807 except TypeError:
c:\python33\lib\site-packages\pandas\core\ops.py in <lambda>(x, y)
802 def _radd_compat(left, right):
--> 803 radd = lambda x, y: y + x
804 # GH #353, NumPy 1.5.1 workaround
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-34-3b2873f8749b> in <module>()
----> 1 df.loc[df['col1']== 'John','col3'] = 'I am ' + df['col2']
c:\python33\lib\site-packages\pandas\core\ops.py in wrapper(left, right, name, na_op)
616 lvalues = lvalues.values
617
--> 618 return left._constructor(wrap_results(na_op(lvalues, rvalues)),
619 index=left.index, name=left.name,
620 dtype=dtype)
c:\python33\lib\site-packages\pandas\core\ops.py in na_op(x, y)
561 result = np.empty(len(x), dtype=x.dtype)
562 mask = notnull(x)
--> 563 result[mask] = op(x[mask], y)
564 else:
565 raise TypeError("{typ} cannot perform the operation {op}".format(typ=type(x).__name__,op=str_rep))
c:\python33\lib\site-packages\pandas\core\ops.py in _radd_compat(left, right)
804 # GH #353, NumPy 1.5.1 workaround
805 try:
--> 806 output = radd(left, right)
807 except TypeError:
808 raise
c:\python33\lib\site-packages\pandas\core\ops.py in <lambda>(x, y)
801
802 def _radd_compat(left, right):
--> 803 radd = lambda x, y: y + x
804 # GH #353, NumPy 1.5.1 workaround
805 try:
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
It is as if pandas didn't like the comparison column value == some text when all values are NaN?
Help!
Recommended answer
I would argue that all this line really does is add a string to the col1 values, provided there are any values that are not null:
df.loc[df['col1'].notnull(),'col3'] = 'I am ' + df['col1']
So you can just check whether any values are not null, and only perform the operation if there are:
if df['col1'].notnull().any():
    df['col3'] = 'I am ' + df['col1']
You also don't need to create the col3 column beforehand when doing it this way.
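As a side note on why the error appears: the traceback in the question ends in the string addition ('I am ' + df['col2']), not in the == comparison. When a column holds only NaN, pandas infers a float dtype for it, and adding a string to a float column raises a TypeError. The sketch below shows one possible way around this without the any() guard, by casting the source column to object before concatenating. The helper name add_greeting and its parameters are assumptions made for illustration and are not part of the original answer; np.nan is used because the np.NaN alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd


def add_greeting(df, key="col1", source="col2", target="col3", name="John"):
    """Hypothetical helper: set `target` to 'I am <source>' where `key` == `name`."""
    out = df.copy()
    # An all-NaN column is float64; casting to object lets pandas skip
    # the NaN entries instead of trying to add a str to a float.
    greeting = "I am " + out[source].astype(object)
    # Keep the greeting only on matching rows; every other row becomes NaN.
    out[target] = greeting.where(out[key] == name)
    return out


# Same shape as the failing frame in the question: every value is NaN.
all_nan = pd.DataFrame([[np.nan, np.nan]] * 3, columns=["col1", "col2"])
print(all_nan["col2"].dtype)  # float64, which is why the string addition failed
print(add_greeting(all_nan))

# The mixed frame from the start of the question still behaves as before.
mixed = pd.DataFrame(
    [[np.nan, np.nan], ["John", "Malone"], [np.nan, np.nan]],
    columns=["col1", "col2"],
)
print(add_greeting(mixed))
```

Building the whole column with where, rather than assigning strings into a column pre-filled with NaN through .loc, also avoids writing text into a float column.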