pandas equivalent of np.where
Problem description
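np.where has the semantics of a vectorized if/else (similar to Apache Spark's when/otherwise DataFrame method). I know that I can use np.where on a pandas Series, but pandas often defines its own API to use instead of raw numpy functions, and that API is usually more convenient with pd.Series/pd.DataFrame.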
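For reference, a minimal sketch of the vectorized if/else described above, on made-up data: np.where picks one of two values per element, and when given a Series it returns a plain NumPy array, so the index has to be reattached by hand if a Series is wanted.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration.
s = pd.Series([-2, 0, 3], index=["x", "y", "z"])

# Vectorized if/else: "neg" where s < 0, otherwise "non-neg".
labels = np.where(s < 0, "neg", "non-neg")

# np.where returns an ndarray; the Series index is not carried over.
print(type(labels).__name__)  # ndarray
print(labels.tolist())        # ['neg', 'non-neg', 'non-neg']

# Reattach the index explicitly when a Series is needed.
labeled = pd.Series(labels, index=s.index)
```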
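Sure enough, I found pandas.DataFrame.where. However, at first glance, it has completely different semantics. I could not find a way to rewrite the most basic example of np.where using pandas where: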
import numpy as np
import pandas as pd
# df is a pd.DataFrame with numeric columns 'A' and 'B'
# how to write this using df.where?
df['C'] = np.where((df['A']<0) | (df['B']>0), df['A']+df['B'], df['A']/df['B'])
Am I missing something obvious? Or is pandas' where intended for a completely different use case, despite having the same name as np.where?
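To see why pandas' where looks different at first glance, here is a small sketch on made-up data: it is a method on the object that holds the "if true" values, and the condition decides which of those values survive. Where the condition is False, the value is replaced by other, which defaults to NaN.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 6.0]})

# Keep each value where the condition is True; replace it where it is False.
kept = df.where(df > 0)          # other defaults to NaN
filled = df.where(df > 0, 0.0)   # explicit replacement value

print(kept)
print(filled)
```

Read this way, df.where(cond, other) behaves like np.where(cond, df, other) with df moved into the receiver position.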
Solution
Try:
(df['A'] + df['B']).where((df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
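A self-contained check of the one-liner above, on made-up data with the question's column names, comparing it with the original np.where version:

```python
import numpy as np
import pandas as pd

# Made-up data; column names follow the question.
df = pd.DataFrame({"A": [-4.0, 2.0, 6.0, 3.0], "B": [1.0, 1.0, -3.0, -1.0]})

cond = (df["A"] < 0) | (df["B"] > 0)

# numpy version from the question (returns an ndarray).
expected = np.where(cond, df["A"] + df["B"], df["A"] / df["B"])

# pandas version: start from the "if true" values, replace where cond is False.
df["C"] = (df["A"] + df["B"]).where(cond, df["A"] / df["B"])

print(df["C"].tolist())  # [-3.0, 3.0, -2.0, -3.0]
```

Both versions compute df['A'] + df['B'] and df['A'] / df['B'] for every row before selecting, so the pandas form is not lazier than np.where; the difference is that its result is a Series indexed like df.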
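The difference between numpy's where and DataFrame.where is that the values kept where the condition is True are supplied by the object that the where method is called on; only the replacement values (used where the condition is False) are passed in, as other (docs).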
I.e.
np.where(m, A, B)
is roughly equivalent to
A.where(m, B)
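The "roughly" matters in two places: A has to be a pandas object (a scalar has no .where method), and pandas aligns m and B to A by index label, whereas np.where works purely by position. A minimal sketch of a hypothetical helper, pd_where (not part of pandas), that keeps numpy's argument order and illustrates both points on made-up data, assuming m is a boolean Series:

```python
import numpy as np
import pandas as pd


def pd_where(m, a, b):
    """Hypothetical helper: np.where-style argument order on top of pandas.

    Assumes `m` is a boolean Series. Keeps values of `a` where `m` is True
    and takes `b` elsewhere, returning a Series instead of an ndarray.
    """
    if not isinstance(a, pd.Series):
        # A scalar has no .where, so broadcast it to a Series on m's index.
        a = pd.Series(a, index=m.index)
    return a.where(m, b)


m = pd.Series([True, False, True])
a = pd.Series([1.0, 2.0, 3.0])
b = pd.Series([10.0, 20.0, 30.0])

print(pd_where(m, a, b).tolist())    # [1.0, 20.0, 3.0]
print(np.where(m, a, b).tolist())    # [1.0, 20.0, 3.0]
print(pd_where(m, 0.0, b).tolist())  # [0.0, 20.0, 0.0]

# Label alignment vs. position: b2 holds the same labels in reverse order.
m2 = pd.Series([False, True, True])
b2 = pd.Series([30.0, 20.0, 10.0], index=[2, 1, 0])
print(pd_where(m2, a, b2).tolist())  # [10.0, 2.0, 3.0]  (matched by label)
print(np.where(m2, a, b2).tolist())  # [30.0, 2.0, 3.0]  (matched by position)
```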
If you wanted a call signature closer to numpy's, you could take advantage of the way method calls work in Python: a method can be looked up on the class and called with the instance passed explicitly as self. Since df['A'] + df['B'] is a Series, call the method through pd.Series:
pd.Series.where(cond=(df['A'] < 0) | (df['B'] > 0), self=df['A'] + df['B'], other=df['A'] / df['B'])
or without kwargs (note that the positional order of arguments differs from the numpy where argument order):
pd.Series.where(df['A'] + df['B'], (df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
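The class-level calls work because, for an ordinary method, a.where(cond, b) does the same thing as type(a).where(a, cond, b). Because df['A'] + df['B'] is a Series, this sketch calls the method through pd.Series and checks, on made-up data, that the bound call and both unbound forms give the same result:

```python
import pandas as pd

# Made-up data; column names follow the question.
df = pd.DataFrame({"A": [-4.0, 2.0, 6.0], "B": [1.0, 1.0, -3.0]})
cond = (df["A"] < 0) | (df["B"] > 0)
a = df["A"] + df["B"]
b = df["A"] / df["B"]

bound = a.where(cond, b)                                  # usual method call
by_keyword = pd.Series.where(cond=cond, self=a, other=b)  # self passed by name
by_position = pd.Series.where(a, cond, b)                 # self, cond, other

print(bound.tolist())  # [-3.0, 3.0, -2.0]
```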