ValueError when using Multinomial Naive Bayes classifier


Question

This is my first time using scikit-learn, so apologies if the question is a basic one. I'm trying to run a naive Bayes classifier on UCI's mushroom dataset so I can compare the results against my own NB classifier coded from scratch.

The dataset is categorical and each feature has more than two possible values, so I used a multinomial NB instead of a Gaussian or Bernoulli NB.

However, I keep getting the error ValueError: could not convert string to float: 'l', and I am not sure what to do. Shouldn't a multinomial NB be able to take string data?

Example line of data: the 0th column is the class (p for poisonous, e for edible) and the remaining 22 columns are the features.
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u

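import numpy as np
import pandas as pd
from sklearn.naive_bayes import MultinomialNB

# data is based on UCI's mushroom dataset: http://archive.ics.uci.edu/ml/datasets/Mushroom
df = pd.DataFrame(data)

# Random train/test split; training_percent is the fraction of rows used for training
msk = np.random.rand(df.shape[0]) <= training_percent
train = df[msk]
test = df[~msk]

clf = MultinomialNB()
clf.fit(train.iloc[:, 1:], train.iloc[:, 0])  # raises the ValueError above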

Recommended answer

In short, no, it shouldn't be able to take strings as input. You will have to do some preprocessing, but luckily sklearn is well suited to that too.

from sklearn import preprocessing

# Map each distinct string to an integer code
enc = preprocessing.LabelEncoder()
mushrooms = ['p','x','s','n','t','p','f','c','n','k','e','e','s','s','w','w','p','w','o']
enc.fit(mushrooms)
classes = enc.transform(mushrooms)
print(classes)
print(enc.inverse_transform(classes))

Which outputs:

[ 6 10  7  4  8  6  2  0  4  3  1  1  7  7  9  9  6  9  5]
['p' 'x' 's' 'n' 't' 'p' 'f' 'c' 'n' 'k' 'e' 'e' 's' 's' 'w' 'w' 'p' 'w' 'o']

Then train on the transformed data:

# LabelEncoder takes 1-D input, so encode the features column by column
clf.fit(train.iloc[:, 1:].apply(enc.transform), train.iloc[:, 0])
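
For reference, here is a minimal end-to-end sketch of the encode-then-fit step. It assumes a tiny hand-made DataFrame in place of the real mushroom file: the four rows and the names `features`, `labels` and `encoded` are illustrative only and do not come from the question. One LabelEncoder is fitted on every symbol found in the feature columns, then applied to the flattened values and reshaped, because LabelEncoder only accepts 1-D input.

```python
import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# Illustrative rows only: column 0 is the class, the rest are features.
df = pd.DataFrame([
    ["p", "x", "s", "n"],
    ["e", "x", "s", "y"],
    ["e", "b", "s", "w"],
    ["p", "x", "y", "w"],
])

features = df.iloc[:, 1:]
labels = df.iloc[:, 0]

# Fit one encoder on every symbol that occurs in any feature column.
flat = features.to_numpy().ravel()
enc = LabelEncoder()
enc.fit(flat)

# Encode the flattened values, then restore the (rows, columns) shape.
encoded = enc.transform(flat).reshape(features.shape)

clf = MultinomialNB()
clf.fit(encoded, labels)  # string class labels are accepted as-is
predictions = clf.predict(encoded)
print(predictions)
```

Note that MultinomialNB interprets feature values as counts, so the integer codes are treated as magnitudes rather than as unordered categories. If that matters for your comparison, one-hot encoding the features, or using sklearn's CategoricalNB, may be a closer match to a hand-written categorical NB.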

Remember: the LabelEncoder will only transform strings it has been fitted on, so make sure you preprocess your data properly.
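
A small sketch of what that warning means in practice. The value lists below are made up for illustration: an encoder fitted on one set of symbols rejects a symbol it has never seen, and one possible way around it (an assumption here, not the only option) is to fit the encoder on the values from both splits before transforming either.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical feature values; 'l' never appears in train_values.
train_values = ["p", "x", "s"]
test_values = ["p", "x", "l"]

enc = LabelEncoder()
enc.fit(train_values)

# Transforming an unseen symbol raises ValueError.
try:
    enc.transform(test_values)
    unseen_error = None
except ValueError as exc:
    unseen_error = exc

print("unseen label rejected:", unseen_error is not None)

# Fitting on the union of both splits lets every symbol be encoded.
enc_all = LabelEncoder().fit(train_values + test_values)
codes = enc_all.transform(test_values)
print(codes)
```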
