How to use one-hot encoding with the Naive Bayes algorithm?

Problem description

I'm trying to use the Naive Bayes algorithm for one of my requirements. For this, I plan to use one-hot encoding for the hyper plane. I used the following code to run my algorithm, but I'm not sure how to apply one-hot encoding.

Please find the code below:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix

def load_data(filename):

    x = list()
    y = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            y.append(line[1])
            x.append(line[0].split())

    return x, y

X_train, y_train = load_data('/Users/Desktop/abc/train.csv')
X_test, y_test = load_data('/Users/Desktop/abc/test.csv')

onehot_enc = MultiLabelBinarizer()
onehot_enc.fit(X_train)


bnbc = BernoulliNB(binarize=None)
bnbc.fit(onehot_enc.transform(X_train), y_train)

score = bnbc.score(onehot_enc.transform(X_test), y_test)
print("score of Naive Bayes algo is :" , score)

Can anyone tell me whether the code above is correct?

Recommended answer

Try using CountVectorizer:

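from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# X_train / X_test come from load_data() in the question and are lists of
# token lists, so a pass-through analyzer replaces the default string tokenizer.
# binary=True yields 0/1 (one-hot style) features instead of raw counts.
vectorizer = CountVectorizer(analyzer=lambda tokens: tokens, binary=True)
# fit() alone returns the vectorizer itself; fit_transform() returns the matrix.
X_train_one_hot = vectorizer.fit_transform(X_train)
X_test_one_hot = vectorizer.transform(X_test)

# binarize=None tells BernoulliNB the input is already binary.
bnbc = BernoulliNB(binarize=None)
bnbc.fit(X_train_one_hot, y_train)

score = bnbc.score(X_test_one_hot, y_test)
print("score of Naive Bayes algo is :", score)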
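
To make the encoding step concrete, here is a minimal self-contained sketch on made-up toy data (the samples, labels, and variable names are assumptions for illustration, not from the question's CSV files). It suggests that MultiLabelBinarizer, as used in the question, and CountVectorizer with binary=True build the same 0/1 matrix on the training data, and that BernoulliNB can be fit on either.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data in the shape load_data() returns:
# each sample is a list of tokens, each label is a string.
X_train = [["red", "round"], ["yellow", "long"],
           ["red", "small"], ["yellow", "curved"]]
y_train = ["apple", "banana", "apple", "banana"]
X_test = [["red", "round", "shiny"], ["yellow", "long"]]  # "shiny" is unseen

# binary=True gives 0/1 indicators; the pass-through analyzer accepts token lists.
vectorizer = CountVectorizer(analyzer=lambda tokens: tokens, binary=True)
X_train_bin = vectorizer.fit_transform(X_train)
X_test_bin = vectorizer.transform(X_test)  # tokens unseen during fit are dropped

# MultiLabelBinarizer builds an indicator matrix from the same token lists.
mlb = MultiLabelBinarizer()
X_train_mlb = mlb.fit_transform(X_train)
same_encoding = (X_train_bin.toarray() == X_train_mlb).all()

# binarize=None: the features are already binary.
model = BernoulliNB(binarize=None)
model.fit(X_train_bin, y_train)
predictions = list(model.predict(X_test_bin))
print(same_encoding, predictions)
```

One practical difference: CountVectorizer silently ignores tokens it did not see during fit, whereas MultiLabelBinarizer emits a warning when transform meets an unknown token.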

You can also try TfidfVectorizer if you want to use TF-IDF features for the text. Note that TF-IDF values are not binary, so BernoulliNB(binarize=None) no longer fits in that case; MultinomialNB is the more usual pairing.
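
A minimal sketch of the TF-IDF variant, again on made-up toy strings (the texts and labels are assumptions for illustration). It swaps in MultinomialNB, which is not part of the original answer, because TF-IDF weights are fractional rather than binary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: TfidfVectorizer expects raw strings, not token lists.
train_texts = ["red round fruit", "yellow long fruit",
               "red small fruit", "yellow curved fruit"]
train_labels = ["apple", "banana", "apple", "banana"]
test_texts = ["red round", "yellow long"]

# Learn the vocabulary and IDF weights on the training texts only.
tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(train_texts)
X_test_tfidf = tfidf.transform(test_texts)

# MultinomialNB accepts non-negative fractional features such as TF-IDF.
model = MultinomialNB()
model.fit(X_train_tfidf, train_labels)
tfidf_predictions = list(model.predict(X_test_tfidf))
print(tfidf_predictions)
```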
