ml.net sentiment analysis warning about format errors & bad values

Problem description

I've been having a problem with my ML.NET console app. This is my first time using ML.NET in Visual Studio, so I was following this tutorial from microsoft.com, which does sentiment analysis using binary classification.

I'm trying to process some test data in the form of TSV files to get a positive or negative sentiment analysis, but while debugging I receive warnings about 1 format error and 2 bad values.

I decided to ask all you great devs here on Stack to see if anyone can help me find a solution.

Here's an image of the debugging below:

Here's the link to my test data:
wiki-data
wiki-test-data

Finally, here's my code for those who want to reproduce the problem:

There are two C# files: SentimentData.cs and Program.cs.

1 - SentimentData.cs:

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.ML.Runtime.Api;

namespace MachineLearningTut
{
 public class SentimentData
 {
    [Column(ordinal: "0")]
    public string SentimentText;
    [Column(ordinal: "1", name: "Label")]
    public float Sentiment;
 }

 public class SentimentPrediction
 {
    [ColumnName("PredictedLabel")]
    public bool Sentiment;
 }
}

2 - Program.cs:

using System;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Threading.Tasks;

namespace MachineLearningTut
{
class Program
{
    const string _dataPath = @".\Data\wikipedia-detox-250-line-data.tsv";
    const string _testDataPath = @".\Data\wikipedia-detox-250-line-test.tsv";
    const string _modelpath = @".\Data\Model.zip";

    static async Task Main(string[] args)
    {
        var model = await TrainAsync();

        Evaluate(model);

        Predict(model);
    }

    public static async Task<PredictionModel<SentimentData, SentimentPrediction>> TrainAsync()
    {
        var pipeline = new LearningPipeline();

        pipeline.Add(new TextLoader (_dataPath).CreateFrom<SentimentData>());

        pipeline.Add(new TextFeaturizer("Features", "SentimentText"));

        pipeline.Add(new FastForestBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

        PredictionModel<SentimentData, SentimentPrediction> model = pipeline.Train<SentimentData, SentimentPrediction>();

        await model.WriteAsync(path: _modelpath);

        return model;
    }

    public static void Evaluate(PredictionModel<SentimentData, SentimentPrediction> model)
    {
        var testData = new TextLoader(_testDataPath).CreateFrom<SentimentData>();

        var evaluator = new BinaryClassificationEvaluator();

        BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

        Console.WriteLine();
        Console.WriteLine("PredictionModel quality metrics evaluation");
        Console.WriteLine("-------------------------------------");
        Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
        Console.WriteLine($"Auc: {metrics.Auc:P2}");
        Console.WriteLine($"F1Score: {metrics.F1Score:P2}");

    }

    public static void Predict(PredictionModel<SentimentData, SentimentPrediction> model)
    {
        IEnumerable<SentimentData> sentiments = new[]
        {
            new SentimentData
            {
                SentimentText = "Please refrain from adding nonsense to Wikipedia."
            },

            new SentimentData
            {
                SentimentText = "He is the best, and the article should say that."
            }
        };

        IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);

        Console.WriteLine();
        Console.WriteLine("Sentiment Predictions");
        Console.WriteLine("---------------------");

        var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));

        foreach (var item in sentimentsAndPredictions)
        {
            Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
        }
        Console.WriteLine();
    }
}

}

If anyone would like to see the code or more details on the solution, ask me on the chat and I'll send it. Thanks in advance!!! [Throws a Thumbs Up]

Solution

I think I have a fix for you. Two things to update:

First, I think the columns in your SentimentData class are switched relative to what the data file has: the label comes first and the text second. Try changing it to

[Column(ordinal: "0", name: "Label")]
public float Sentiment;

[Column(ordinal: "1")]
public string SentimentText;

Second, pass useHeader: true to the TextLoader.CreateFrom method so the header row is not read as data. Don't forget to add it to the other loader as well, the one in Evaluate that reads the test data.

pipeline.Add(new TextLoader(_dataPath).CreateFrom<SentimentData>(useHeader: true));
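
If it's unclear which layout a file actually has, a quick check outside ML.NET can help. The sketch below is not how ML.NET's loader counts its warnings; it is only a rough illustration of why a swapped column order or an unskipped header row leads to values that cannot be parsed as a float label. The class name TsvCheck, the Check method, and the sample rows are all hypothetical, and the sample assumes the "label first, text second" layout described above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TsvCheck
{
    // Counts rows that do not fit a two-column "float label + text" schema.
    // formatErrors: rows that do not have exactly two tab-separated fields.
    // badValues:    rows whose field at labelIndex does not parse as a float.
    public static (int formatErrors, int badValues) Check(
        IEnumerable<string> lines, int labelIndex, bool hasHeader)
    {
        int formatErrors = 0, badValues = 0;
        bool first = true;

        foreach (var line in lines)
        {
            // Skip the first line only when the caller says it is a header.
            if (first && hasHeader) { first = false; continue; }
            first = false;

            var fields = line.Split('\t');
            if (fields.Length != 2) { formatErrors++; continue; }

            if (!float.TryParse(fields[labelIndex], NumberStyles.Float,
                                CultureInfo.InvariantCulture, out float _))
            {
                badValues++;
            }
        }

        return (formatErrors, badValues);
    }

    public static void Main()
    {
        // Hypothetical sample rows (not taken from the real data files).
        var sample = new[]
        {
            "Sentiment\tSentimentText",
            "1\tSome comment text",
            "0\tAnother comment",
        };

        // Label assumed in column 1 and header not skipped: every row fails.
        Console.WriteLine(Check(sample, labelIndex: 1, hasHeader: false)); // (0, 3)

        // Label in column 0 and header skipped: no problems reported.
        Console.WriteLine(Check(sample, labelIndex: 0, hasHeader: true));  // (0, 0)
    }
}
```

To check the real files, the sample array could be swapped for File.ReadLines(path) from System.IO, using the paths from the question.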

With those two updates, I got the output below. It looks like a nice model, with an AUC of 85%!
