Golearn models are implicit about independent variables (predictors) and targets (predicted variables)


Question

I am learning ML in Go and have been exploring the Golearn package for ML support. I am very confused by the way the model.Fit and model.Predict functions are implemented.

For example, take this KNN classifier example from the Golearn repo:

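    rawData, err := base.ParseCSVToInstances("../datasets/iris_headers.csv", true)
    if err != nil {
        panic(err)
    }
    // KNN classifier: euclidean distance, linear (brute-force) search, k = 2
    cls := knn.NewKnnClassifier("euclidean", "linear", 2)
    // 50/50 train/test split
    trainData, testData := base.InstancesTrainTestSplit(rawData, 0.50)
    cls.Fit(trainData)
    predictions, err := cls.Predict(testData)
    if err != nil {
        panic(err)
    }
    fmt.Println(predictions)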

I am confused about which parts of the data are the model's x and y. How do I selectively pass in the predictors and the predicted variable? I'm stuck, because the articles I found online give no clues about it.

I am new to ML development in Go, although I have previous experience with web and database work in Go. I write my ML models in Python. Recently I found that Go is faster than Python at data processing and well suited to ML applications. I would appreciate an explanation of how this works. Failing that, a Go library with simpler but sufficient ML support would also do.

Answer

golearn's knn package implements the k-nearest neighbors algorithm. It works by:

  • parsing a CSV file into a matrix
  • (in the Predict function) calculating the distance between vectors using one of several metrics:
      • Euclidean
      • Manhattan
      • Cosine

During this step, all non-numerical fields are removed. The non-numerical field is assumed to be the label that the model is being trained to predict.

The categories/labels (or attributes) defined in the CSV are returned in the prediction list as pairs of values of the form (index, predicted attribute).

How do I selectively pass in the predictors and predicted?

In knn, you can do this by giving the prediction target in the CSV non-numeric values, for example Iris-setosa or Iris-versicolor.
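The convention described above (numeric columns are predictors, the non-numeric column is the target) can be illustrated with the standard library alone. This sketch does not call golearn; splitRecord is a hypothetical helper, and the two iris-style rows and their header are illustrative sample data.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// splitRecord separates one CSV record into numeric predictors (x) and the
// single non-numeric field, which is treated as the label (y). It returns an
// error unless exactly one field fails to parse as a number.
func splitRecord(record []string) ([]float64, string, error) {
	var x []float64
	label := ""
	labels := 0
	for _, field := range record {
		if v, err := strconv.ParseFloat(strings.TrimSpace(field), 64); err == nil {
			x = append(x, v)
		} else {
			label = field
			labels++
		}
	}
	if labels != 1 {
		return nil, "", fmt.Errorf("expected exactly one non-numeric field, got %d", labels)
	}
	return x, label, nil
}

func main() {
	// Illustrative iris-style rows: four measurements plus a species name.
	data := "sepal_length,sepal_width,petal_length,petal_width,species\n" +
		"5.1,3.5,1.4,0.2,Iris-setosa\n" +
		"7.0,3.2,4.7,1.4,Iris-versicolor\n"
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		panic(err)
	}
	for _, rec := range records[1:] { // skip the header row
		x, y, err := splitRecord(rec)
		if err != nil {
			panic(err)
		}
		fmt.Println(x, y)
	}
}
```

Requiring exactly one non-numeric field mirrors the assumption that there is a single label column; a real loader would usually choose the target column explicitly instead of inferring it from the values.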

In linear regression, you can use AddClassAttribute(). This method is defined on the DenseInstances struct, which is what base.ParseCSVToInstances() returns.

The code to do this is as follows:

   instances, err := base.ParseCSVToInstances("../examples/datasets/exams.csv", true) // true: the first line of the CSV is a header row
   if err != nil {
      // error handling
   }
   attrArray := instances.AllAttributes()
   // Mark the final column (index 4) as the class attribute, i.e. the regression target.
   // Linear regression cannot have more than one class attribute.
   err = instances.AddClassAttribute(attrArray[4])
   if err != nil {
      // error handling
   }
   trainData, testData := base.InstancesTrainTestSplit(instances, 0.1)
   lr := linear_models.NewLinearRegression()
   err = lr.Fit(trainData) // fit on the training split only
   if err != nil {
      // error handling
   }
   predictions, err := lr.Predict(testData)
   if err != nil {
      // error handling
   }
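Conceptually, marking attrArray[4] as the class attribute just tells the model which column is y; every remaining column becomes part of x. The sketch below shows that split on a plain numeric table, independent of golearn. splitByTarget and the sample rows are hypothetical.

```go
package main

import "fmt"

// splitByTarget treats column target as y and every other column as x,
// which is the role a designated class attribute plays.
func splitByTarget(rows [][]float64, target int) ([][]float64, []float64) {
	x := make([][]float64, len(rows))
	y := make([]float64, len(rows))
	for i, row := range rows {
		y[i] = row[target]
		x[i] = make([]float64, 0, len(row)-1)
		for j, v := range row {
			if j != target {
				x[i] = append(x[i], v)
			}
		}
	}
	return x, y
}

func main() {
	// Hypothetical rows with five numeric columns.
	rows := [][]float64{
		{1, 2, 3, 4, 10},
		{2, 3, 4, 5, 14},
	}
	x, y := splitByTarget(rows, 4) // column 4 plays the role of attrArray[4]
	fmt.Println(x, y)              // [[1 2 3 4] [2 3 4 5]] [10 14]
}
```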

Caveat: the test file that ships with the linear regression package does none of this. I would not claim that the above is the correct or optimal way of assigning the regression target.

It is one possible way. It prepares valid input for the linear regression's Fit() function, which is where the model's computation takes place. The Predict() function merely applies the finite set of fitted linear regression coefficients to each test row's attribute values (a weighted sum) and stores that value as the prediction.
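To illustrate that division of work between fitting and predicting, here is a sketch using ordinary least squares with a single predictor, where the closed-form solution is short. This is a simplification and not golearn's code: golearn's linear regression handles several predictors, and fitSimple and predictLinear are hypothetical names. Fitting does the real computation; predicting is just the intercept plus a weighted sum of the row's values.

```go
package main

import (
	"errors"
	"fmt"
)

// fitSimple computes the ordinary least squares line y = intercept + slope*x
// for a single predictor using the closed-form solution.
func fitSimple(x, y []float64) (slope, intercept float64, err error) {
	if len(x) != len(y) || len(x) < 2 {
		return 0, 0, errors.New("need at least two paired observations")
	}
	n := float64(len(x))
	var meanX, meanY float64
	for i := range x {
		meanX += x[i]
		meanY += y[i]
	}
	meanX /= n
	meanY /= n
	// Covariance and variance sums (the 1/n factors cancel in the ratio).
	var sxy, sxx float64
	for i := range x {
		sxy += (x[i] - meanX) * (y[i] - meanY)
		sxx += (x[i] - meanX) * (x[i] - meanX)
	}
	if sxx == 0 {
		return 0, 0, errors.New("predictor has zero variance")
	}
	slope = sxy / sxx
	intercept = meanY - slope*meanX
	return slope, intercept, nil
}

// predictLinear applies fitted coefficients to one row: the intercept plus the
// weighted sum of the row's predictor values.
func predictLinear(intercept float64, coef, row []float64) float64 {
	p := intercept
	for i, c := range coef {
		p += c * row[i]
	}
	return p
}

func main() {
	x := []float64{1, 2, 3, 4}
	y := []float64{3, 5, 7, 9} // exactly y = 1 + 2x
	slope, intercept, err := fitSimple(x, y)
	if err != nil {
		panic(err)
	}
	fmt.Println(slope, intercept)                                          // 2 1
	fmt.Println(predictLinear(intercept, []float64{slope}, []float64{5})) // 11
}
```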
