ml.net 关于格式错误的情感分析警告&错误的价值观 [英] ml.net sentiment analysis warning about format errors & bad values
问题描述
我的 ml.net 控制台应用程序出现问题.这是我第一次在 Visual Studio 中使用 ml.net,所以我关注了
这是我的测试数据的链接:
I've been having a problem with my ml.net console app. This is my first time using ml.net in Visual Studio so I was following this tutorial from microsoft.com, which is a sentiment analysis using binary classification.
I'm trying to process some test data in the form of tsv files to get a positive or negative sentiment analysis, but in debugging I'm receiving warnings there being 1 format error and 2 bad values.
I decided to ask all you great devs here on Stack to see if anyone can help me find a solution.
Here's an image of the debugging below:
Here's the link to my test data:
wiki-data
wiki-test-data
Finally, here's my code for those who what to reproduce the problem:
There's 2 c# files: SentimentData.cs & Program.cs.
1 - SentimentData.cs:
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.ML.Runtime.Api;
namespace MachineLearningTut
{
public class SentimentData
{
[Column(ordinal: "0")]
public string SentimentText;
[Column(ordinal: "1", name: "Label")]
public float Sentiment;
}
public class SentimentPrediction
{
[ColumnName("PredictedLabel")]
public bool Sentiment;
}
}
2 - Program.cs:
using System;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Threading.Tasks;
namespace MachineLearningTut
{
class Program
{
const string _dataPath = @".\Data\wikipedia-detox-250-line-data.tsv";
const string _testDataPath = @".\Data\wikipedia-detox-250-line-test.tsv";
const string _modelpath = @".\Data\Model.zip";
static async Task Main(string[] args)
{
var model = await TrainAsync();
Evaluate(model);
Predict(model);
}
public static async Task<PredictionModel<SentimentData, SentimentPrediction>> TrainAsync()
{
var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader (_dataPath).CreateFrom<SentimentData>());
pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
pipeline.Add(new FastForestBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });
PredictionModel<SentimentData, SentimentPrediction> model = pipeline.Train<SentimentData, SentimentPrediction>();
await model.WriteAsync(path: _modelpath);
return model;
}
public static void Evaluate(PredictionModel<SentimentData, SentimentPrediction> model)
{
var testData = new TextLoader(_testDataPath).CreateFrom<SentimentData>();
var evaluator = new BinaryClassificationEvaluator();
BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);
Console.WriteLine();
Console.WriteLine("PredictionModel quality metrics evaluation");
Console.WriteLine("-------------------------------------");
Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
Console.WriteLine($"Auc: {metrics.Auc:P2}");
Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
}
public static void Predict(PredictionModel<SentimentData, SentimentPrediction> model)
{
IEnumerable<SentimentData> sentiments = new[]
{
new SentimentData
{
SentimentText = "Please refrain from adding nonsense to Wikipedia."
},
new SentimentData
{
SentimentText = "He is the best, and the article should say that."
}
};
IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);
Console.WriteLine();
Console.WriteLine("Sentiment Predictions");
Console.WriteLine("---------------------");
var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
foreach (var item in sentimentsAndPredictions)
{
Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
}
Console.WriteLine();
}
}
}
If anyone would like to see the code or more details on the solution, ask me on the chat and I'll send it. Thanks in advance!!! [Throws a Thumbs Up]
I think I got a fix for you. A couple of things to update:
First, I think you got your SentimentData
properties switched to what the data has. Try changing it to
[Column(ordinal: "0", name: "Label")]
public float Sentiment;
[Column(ordinal: "1")]
public string SentimentText;
Second, use the useHeader
parameter in the TextLoader.CreateFrom
method. Don't forget to add that to the other one for the validation data, as well.
pipeline.Add(new TextLoader(_dataPath).CreateFrom<SentimentData>(useHeader: true));
With those two updates, I got the below output. Looks like a nice model with an AUC of 85%!
这篇关于ml.net 关于格式错误的情感分析警告&错误的价值观的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!