Naive Bayes - no samples for class label 1

Question

I am using Accord.NET. I have successfully implemented the two decision tree algorithms, ID3 and C4.5, and now I am trying to implement the Naive Bayes algorithm. While there is a lot of sample code on the site, most of it seems to be out of date or to have various issues.

The best sample code I have found on the site so far has been here: http://accord-framework.net/docs/html/T_Accord_MachineLearning_Bayes_NaiveBayes_1.htm

However, when I try to run that code against my data, I get:

There are no samples for class label 1. Please make sure that class labels are contiguous and there is at least one training sample for each label.

The exception is thrown from line 228 of this file: https://github.com/accord-net/framework/blob/master/Sources/Accord.MachineLearning/Tools.cs when I call learner.Learn(inputs, outputs) in my code.
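The condition the message describes (every label from 0 up to the number of classes minus one must have at least one training sample) can be checked on the outputs array before learner.Learn is called. The sketch below is a hypothetical diagnostic, not Accord.NET's own check; LabelCheck, FindLabelsWithoutSamples and the numberOfClasses parameter are illustrative names. For numberOfClasses one could pass outputs.Max() + 1, or codebook[OutAttributeName].NumberOfSymbols, assuming the output column is codified the same way as the input columns.

```csharp
using System;
using System.Linq;

// Hypothetical diagnostic helper (not part of Accord.NET): reports which
// class labels in the range 0..numberOfClasses-1 have no training samples.
public static class LabelCheck
{
    public static int[] FindLabelsWithoutSamples(int[] outputs, int numberOfClasses)
    {
        // Count how many training samples carry each label.
        var counts = new int[numberOfClasses];
        foreach (int label in outputs)
        {
            if (label < 0 || label >= numberOfClasses)
                throw new ArgumentOutOfRangeException(nameof(outputs),
                    $"Label {label} is outside 0..{numberOfClasses - 1}.");
            counts[label]++;
        }

        // Any label with a zero count is a gap of the kind the error reports.
        return Enumerable.Range(0, numberOfClasses)
                         .Where(label => counts[label] == 0)
                         .ToArray();
    }
}
```

An empty result means every label has at least one sample; a non-empty result lists the labels the error message would name.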

I have already run into the null-value bugs that Accord.NET has when implementing the other two algorithms (the decision trees), and my data has been sanitized against that issue.

Does any Accord.NET expert have an idea what would trigger this error?

An excerpt of my code:

    var codebook = new Codification(fulldata, AllAttributeNames);

    /*
     * Get list of all possible combinations
     * Status software blows up if it encounters a value it has not seen before.
     */
    var attributList = new List<IUnivariateFittableDistribution>();
    foreach (var attr in DeciAttributeNames)
    {
        /*
         * By default we'll use a standard static list of values for this column
         */
        var cntLst = codebook[attr].NumberOfSymbols;

        // no decisions can be made off of the variable if it is a constant value
        if (cntLst > 1)
        {
            KeptAttributeNames.Add(attr);
            attributList.Add(new GeneralDiscreteDistribution(cntLst));
        }
    }

    var data = fulldata.Copy(); // this is a datatable

    /*
     * Translate our training data into integer symbols using our codebook
     */
    DataTable symbols = codebook.Apply(data, AllAttributeNames);
    double[][] inputs = symbols.ToJagged<double>(KeptAttributeNames.ToArray());
    int[] outputs = symbols.ToArray<int>(OutAttributeName);
    progBar.PerformStep();

    /*
     * Create a new instance of the learning algorithm
     * and build the algorithm
     */
    var learner = new NaiveBayesLearning<IUnivariateFittableDistribution>()
    {
        // Tell the learner how to initialize the distributions
        Distribution = (classIndex, variableIndex) => attributList[variableIndex]
    };

    var alg = learner.Learn(inputs, outputs);

After further experimentation, it seems that this error occurs only when I process a certain number of rows. If I process 60 rows or fewer, I am fine; if I process 500 rows or more, I am also fine. Anywhere in between, the error is thrown. Depending on how much data I select, the label index in the error message changes; I have seen it range from 0 to 2.

All the data comes from the same SQL Server data source; the only thing I change is the SELECT TOP ### portion of the query.

Recommended answer

You will receive this error in multi-class scenarios when you have defined a label that does not have any sample data. With a small data set your random sampling may by chance exclude all observations with a given label.
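If the gap comes from the sample simply not containing some label, one possible workaround, sketched here under that assumption, is to renumber the labels that do occur so they form the contiguous range 0..K-1 before training. LabelRemap and ToContiguous are hypothetical helpers, not part of Accord.NET; the helper also returns the original label for each new index so that predictions can be translated back.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Accord.NET): renumbers the labels that
// actually occur in a sample so they form the contiguous range 0..K-1.
public static class LabelRemap
{
    // Returns the remapped labels; originalLabels[i] is the original label
    // that was assigned the new index i.
    public static int[] ToContiguous(int[] outputs, out int[] originalLabels)
    {
        // Distinct labels present in this sample, in ascending order.
        originalLabels = outputs.Distinct().OrderBy(label => label).ToArray();

        // Map each original label to its position in the sorted list.
        var index = new Dictionary<int, int>();
        for (int i = 0; i < originalLabels.Length; i++)
            index[originalLabels[i]] = i;

        return outputs.Select(label => index[label]).ToArray();
    }
}
```

A predicted index p from a model trained on the remapped labels maps back with originalLabels[p]. The trade-off is that a class absent from the sample cannot be predicted at all; if every class must be kept, selecting training rows per class (stratified sampling) rather than taking the top N rows would be the alternative.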
