How to use k-fold cross-validation in scikit with a naive Bayes classifier and NLTK


Question

I have a small corpus and I want to calculate the accuracy of a naive Bayes classifier using 10-fold cross-validation. How can I do it?

Answer

Your options are to either set this up yourself or use something like NLTK-Trainer, since NLTK doesn't directly support cross-validation for machine learning algorithms.

I'd probably recommend just using another module to do this for you, but if you really want to write your own code you could do something like the following.

Supposing you want 10-fold, you would have to partition your training set into 10 subsets, train on 9/10, test on the remaining 1/10, and do this for each of the 10 held-out subsets.

Assuming your training set is in a list named training, a simple way to accomplish this would be:

num_folds = 10
subset_size = len(training) // num_folds  # integer division so slice indices are ints
for i in range(num_folds):
    # hold out the i-th tenth of the data for testing
    testing_this_round = training[i*subset_size:][:subset_size]
    # train on everything before and after the held-out slice
    training_this_round = training[:i*subset_size] + training[(i+1)*subset_size:]
    # train using training_this_round
    # evaluate against testing_this_round
    # save accuracy

# find mean accuracy over all rounds
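
As a minimal sketch of how those placeholder comments might be filled in with NLTK's NaiveBayesClassifier (the toy corpus below is an assumption purely for illustration; training should be a list of (featureset, label) pairs as NLTK expects):

import nltk

# toy corpus of (featureset, label) pairs -- replace with your real data
training = [({'word': w}, label)
            for w, label in [('good', 'pos'), ('great', 'pos'), ('nice', 'pos'),
                             ('bad', 'neg'), ('awful', 'neg'), ('poor', 'neg')] * 5]

num_folds = 10
subset_size = len(training) // num_folds
accuracies = []
for i in range(num_folds):
    testing_this_round = training[i*subset_size:][:subset_size]
    training_this_round = training[:i*subset_size] + training[(i+1)*subset_size:]
    # train a naive Bayes classifier on 9/10 of the data
    classifier = nltk.NaiveBayesClassifier.train(training_this_round)
    # evaluate against the held-out 1/10 and save the accuracy
    accuracies.append(nltk.classify.accuracy(classifier, testing_this_round))

# mean accuracy over all rounds
print(sum(accuracies) / len(accuracies))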
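
Since the question title also mentions scikit, note that scikit-learn can run the whole loop for you via cross_val_score; a minimal sketch, assuming your documents have already been turned into a numeric feature matrix X with labels y (the random data here is only a stand-in):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# illustrative stand-in data; in practice X would come from e.g. CountVectorizer
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 20))
y = rng.integers(0, 2, size=100)

# 10-fold cross-validation with a multinomial naive Bayes classifier
scores = cross_val_score(MultinomialNB(), X, y, cv=10)
print(scores.mean())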
