Python NLTK code snippet to train a classifier (Naive Bayes) using feature frequency


Problem description

I was wondering if anyone could help me with a code snippet that demonstrates how to train a Naive Bayes classifier using feature frequency, as opposed to feature presence.

I presume the code below, as shown in Chapter 6 of the NLTK book, creates a feature set using feature presence (FP):

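def document_features(document):
    # word_features is assumed to be defined elsewhere (a list of vocabulary words)
    document_words = set(document)  # a set records presence only, not counts

    features = {}
    for word in word_features:
        # Boolean feature: True if the word occurs at least once in the document
        features['contains(%s)' % word] = (word in document_words)

    return features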

Please advise.
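
For contrast with the presence-based function in the question, a frequency-based variant might record how often each word occurs instead of a boolean. This is only a sketch of one reading of "feature frequency": the name `document_features_freq`, the `count(...)` key format, and the three-word vocabulary are all hypothetical and do not come from the NLTK book.

```python
from collections import Counter

# Hypothetical vocabulary; in the question, word_features is defined elsewhere.
word_features = ["great", "boring", "plot"]

def document_features_freq(document, vocabulary=word_features):
    """Map each vocabulary word to the number of times it occurs in the document."""
    counts = Counter(document)  # unlike set(document), this keeps repeat occurrences
    return {"count(%s)" % word: counts[word] for word in vocabulary}

features = document_features_freq(["great", "plot", "great", "acting"])
print(features)  # {'count(great)': 2, 'count(boring)': 0, 'count(plot)': 1}
```

A classifier that treats feature values as discrete categories would see a count of 2 and a count of 3 as two unrelated values, so producing counts is only half of the problem; the classifier also has to use them as counts.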

Recommended answer

For training, create appropriate FreqDists that you can use to create ProbDists, which can then be passed in to the NaiveBayesClassifier. But classification actually works on feature sets, whose values (booleans in the example above) are treated as discrete outcomes, not as frequencies. So if you want to classify based on a FreqDist, you'll have to implement your own classifier that does not use the NLTK feature sets.
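
As a rough illustration of what "implement your own classifier" could mean, here is a minimal sketch of a multinomial Naive Bayes classifier that works directly on word counts. It uses only the Python standard library rather than NLTK, and everything in it is an assumption made for illustration: the class name `FrequencyNaiveBayes`, the add-one smoothing, the choice to ignore words never seen in training, and the toy training data.

```python
import math
from collections import Counter, defaultdict

class FrequencyNaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # label -> number of training documents
        self.word_counts = defaultdict(Counter)  # label -> word -> total occurrences
        self.vocabulary = set()

    def train(self, labeled_documents):
        for words, label in labeled_documents:
            self.label_counts[label] += 1
            self.word_counts[label].update(words)  # accumulates frequencies
            self.vocabulary.update(words)

    def log_scores(self, words):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.label_counts.items():
            label_total = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for word, freq in Counter(words).items():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + 1) / (label_total + vocab_size)
                score += freq * math.log(p)  # each occurrence contributes once
            scores[label] = score
        return scores

    def classify(self, words):
        scores = self.log_scores(words)
        return max(scores, key=scores.get)

# Toy data, invented for this example.
train_data = [
    (["great", "great", "plot"], "pos"),
    (["great", "acting"], "pos"),
    (["boring", "boring", "plot"], "neg"),
    (["boring", "acting"], "neg"),
]
clf = FrequencyNaiveBayes()
clf.train(train_data)
print(clf.classify(["great", "great", "boring"]))  # pos
print(clf.classify(["boring", "boring", "great"]))  # neg
```

Both test documents contain the same two words, so a presence-based feature set would not distinguish them; only the counts decide the label here.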
