Numpy select returning boolean error message
Problem description
I would like to find matching strings in a path and use np.select to create a new column with labels dependent on the matches I found.
This is what I wrote:
import numpy as np
conditions = [a["properties_path"].str.contains('blog'),
a["properties_path"].str.contains('credit-card-readers/|machines|poss|team|transaction_fees'),
a["properties_path"].str.contains('signup|sign-up|create-account|continue|checkout'),
a["properties_path"].str.contains('complete'),
a["properties_path"] == '/za/|/',
a["properties_path"].str.contains('promo')]
choices = [ "blog","info_pages","signup","completed","home_page","promo"]
a["page_type"] = np.select(conditions, choices, default=np.nan)
However, when I run this code, I get this error message:
ValueError: invalid entry 0 in condlist: should be boolean ndarray
Here is a sample of my data:
3124465 /blog/ts-st...
3124466 /card-machines
3124467 /card-machines
3124468 /card-machines
3124469 /promo/our-gift-to-you
3124470 /create-account/v1
3124471 /za/signup/
3124472 /create-account/v1
3124473 /sign-up
3124474 /za/
3124475 /sign-up/cart
3124476 /checkout/
3124477 /complete
3124478 /card-machines
3124479 /continue
3124480 /blog/article/get-car...
3124481 /blog/article/get-car...
3124482 /za/signup/
3124483 /credit-card-readers
3124484 /signup
3124485 /credit-card-readers
3124486 /create-account/v1
3124487 /credit-card-readers
3124488 /point-of-sale-app
3124489 /create-account/v1
3124490 /point-of-sale-app
3124491 /credit-card-readers
Recommended answer
The .str methods operate on object columns, and such columns can hold non-string values (numbers, or missing values such as NaN). For those rows pandas returns NaN instead of False, so the condition ends up with object dtype rather than bool. np.select then complains because the condition is not a boolean array.
Luckily, there's an argument to handle this: na=False
a["properties_path"].str.contains('blog', na=False)
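Applied to the whole question, the fix means passing na=False to every .str.contains call. The sketch below is a hedged reconstruction, not the asker's confirmed code: it uses a handful of paths copied from the sample, adds a None row (an assumption standing in for whatever non-string values the real column holds), uses a string default "other" instead of np.nan so the output stays a plain string array, and replaces the == '/za/|/' test with isin, assuming the intent was "exactly '/za/' or exactly '/'".

```python
import numpy as np
import pandas as pd

# Paths copied from the sample, plus None as a hypothetical non-string entry.
a = pd.DataFrame({"properties_path": [
    "/blog/article/get-car...",
    "/card-machines",
    "/create-account/v1",
    "/za/",
    "/complete",
    "/promo/our-gift-to-you",
    None,
]})

path = a["properties_path"]
conditions = [
    path.str.contains("blog", na=False),
    path.str.contains("credit-card-readers/|machines|poss|team|transaction_fees", na=False),
    path.str.contains("signup|sign-up|create-account|continue|checkout", na=False),
    path.str.contains("complete", na=False),
    # Assumption: '/za/|/' was meant as "exactly '/za/' or exactly '/'".
    # == compares the literal string '/za/|/', so isin is used here instead.
    path.isin(["/za/", "/"]),
    path.str.contains("promo", na=False),
]
choices = ["blog", "info_pages", "signup", "completed", "home_page", "promo"]

# np.select picks the first matching condition per row; unmatched rows get the default.
a["page_type"] = np.select(conditions, choices, default="other")
print(a)
```

With the original == '/za/|/' comparison, the '/za/' row would fall through to the default, because == does not treat | as alternation the way the regex in str.contains does.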
Alternatively, you could change your conditions to:
a["properties_path"].str.contains('blog') == True
#or
a["properties_path"].str.contains('blog').fillna(False).astype(bool)  # astype(bool): recent pandas may keep object dtype after fillna
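To check that the three variants are interchangeable, a small sketch can compare their dtypes and values on a mixed object Series. The series contents here are made up for illustration. The explicit astype(bool) on the fillna variant is a precaution: recent pandas versions deprecate the automatic downcast of object dtype after fillna, so the result may otherwise stay object dtype and np.select would reject it again.

```python
import numpy as np
import pandas as pd

# Hypothetical object column mixing a number with strings.
s = pd.Series([1, "blog/post", "promo"], dtype=object)

via_na = s.str.contains("blog", na=False)
via_eq = s.str.contains("blog") == True  # NaN == True evaluates to False
via_fill = s.str.contains("blog").fillna(False).astype(bool)

# All three should be bool Series with identical values.
for cond in (via_na, via_eq, via_fill):
    print(cond.dtype, cond.tolist())

# A bool condition is accepted by np.select.
print(np.select([via_na], ["blog"], default="other"))
```

Of the three, na=False is the most direct, since it produces a bool result in a single step without a follow-up comparison or cast.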
Sample
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 'foo', 'bar']})
conds = df.a.str.contains('f')
#0 NaN
#1 True
#2 False
#Name: a, dtype: object
np.select([conds], ['XX'])
#ValueError: invalid entry 0 in condlist: should be boolean ndarray
conds = df.a.str.contains('f', na=False)
#0 False
#1 True
#2 False
#Name: a, dtype: bool
np.select([conds], ['XX'])
#array(['0', 'XX', '0'], dtype='<U11')