How to transform the dataframe into a label/features vector?


Question

I am running a logistic regression model in Scala and I have a data frame like the one below:

df

+-----------+------------+
|x          |y           |
+-----------+------------+
|          0|           0|
|          0|          33|
|          0|          58|
|          0|          96|
|          0|           1|
|          1|          21|
|          0|          10|
|          0|          65|
|          1|           7|
|          1|          28|
+-----------+------------+
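To experiment with the question locally, the same ten rows can be rebuilt as a DataFrame. This is a minimal sketch that assumes a local `SparkSession`; the object name `SampleData` and the helper `sampleFrame` are hypothetical and only used for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SampleData {
  // Builds the ten-row (x, y) frame from the question; both columns are Int.
  def sampleFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((0, 0), (0, 33), (0, 58), (0, 96), (0, 1),
        (1, 21), (0, 10), (0, 65), (1, 7), (1, 28)).toDF("x", "y")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession used only for experimentation.
    val spark = SparkSession.builder().master("local[*]").appName("sample-data").getOrCreate()
    sampleFrame(spark).show(false)
    spark.stop()
  }
}
```

In `spark-shell` the `spark` session already exists, so only the `Seq(...).toDF("x", "y")` line is needed there.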

I need to transform this into something like this:

+-----+------------------+
|label|      features    | 
+-----+------------------+
|  0.0|(1,[1],[0])       |
|  0.0|(1,[1],[33])      |
|  0.0|(1,[1],[58])      |
|  0.0|(1,[1],[96])      |
|  0.0|(1,[1],[1])       |
|  1.0|(1,[1],[21])      |
|  0.0|(1,[1],[10])      |
|  0.0|(1,[1],[65])      |
|  1.0|(1,[1],[7])       |
|  1.0|(1,[1],[28])      | 
+-----+------------------+
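The `(1,[1],[33])` entries above follow Spark's printed sparse-vector format, `(size,[indices],[values])`. Indices are 0-based and values are `Double`, so a one-element vector holding y = 33 prints as `(1,[0],[33.0])`; index 1 would be out of range for a vector of size 1. The sketch below (object name `VectorNotation` is hypothetical) shows the sparse and dense forms of the same vector; Spark treats them as equal.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

object VectorNotation {
  // A one-element feature vector holding y = 33, in sparse and dense form.
  val sparse: Vector = Vectors.sparse(1, Array(0), Array(33.0)) // size 1, index 0 (0-based)
  val dense: Vector = Vectors.dense(33.0)

  def main(args: Array[String]): Unit = {
    println(sparse)          // (1,[0],[33.0])
    println(dense)           // [33.0]
    println(sparse == dense) // true: equality compares values, not storage format
  }
}
```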

I tried:

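import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

val assembler = new VectorAssembler()
  .setInputCols(Array("x"))
  .setOutputCol("Feature")

// The assembler is never applied, so "features" below is just the renamed Int column y;
// fit() rejects it because LogisticRegression expects "features" to be a Vector column.
val lrModel = lr.fit(df.withColumnRenamed("x", "label").withColumnRenamed("y", "features"))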

Thanks for your help.

Recommended answer

Given the dataframe

+---+---+
|x  |y  |
+---+---+
|0  |0  |
|0  |33 |
|0  |58 |
|0  |96 |
|0  |1  |
|1  |21 |
|0  |10 |
|0  |65 |
|1  |7  |
|1  |28 |
+---+---+

and running the following

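import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.types.DoubleType
import spark.implicits._ // enables $"col"; assumes a SparkSession named spark, as in spark-shell

val assembler = new VectorAssembler()
  .setInputCols(Array("x", "y"))
  .setOutputCol("features")

val output = assembler.transform(df).select($"x".cast(DoubleType).as("label"), $"features")
output.show(false)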

gives you the result

+-----+----------+
|label|features  |
+-----+----------+
|0.0  |(2,[],[]) |
|0.0  |[0.0,33.0]|
|0.0  |[0.0,58.0]|
|0.0  |[0.0,96.0]|
|0.0  |[0.0,1.0] |
|1.0  |[1.0,21.0]|
|0.0  |[0.0,10.0]|
|0.0  |[0.0,65.0]|
|1.0  |[1.0,7.0] |
|1.0  |[1.0,28.0]|
+-----+----------+
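The first row prints as `(2,[],[])` while the others print as dense arrays because, as far as I know, `VectorAssembler` stores each assembled row via `Vector.compressed`, which picks whichever of sparse or dense storage is smaller. An all-zero vector of size 2 is cheaper as sparse; a vector with one non-zero is kept dense. Both represent the same values, and `LogisticRegression` treats them identically. A small sketch reproducing the two cases (object name `CompressedDemo` is hypothetical):

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

object CompressedDemo {
  // Row 1 of the table: x = 0, y = 0, so no non-zero entries.
  val allZeros: Vector = Vectors.dense(0.0, 0.0).compressed
  // Row 2 of the table: x = 0, y = 33, one non-zero out of two.
  val oneNonZero: Vector = Vectors.dense(0.0, 33.0).compressed

  def main(args: Array[String]): Unit = {
    println(allZeros)   // (2,[],[])   sparse is smaller when every entry is 0
    println(oneNonZero) // [0.0,33.0]  dense is kept otherwise
  }
}
```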

Now fitting a LogisticRegression is straightforward:

import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

val lrModel = lr.fit(output)
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

and the output will be

Coefficients: [1.5672602877378823,0.0] Intercept: -1.4055020984891717
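Note that the answer's assembler uses `Array("x", "y")`, so the label column x also ends up inside the feature vector; the model can then read the answer straight from its inputs, which is consistent with the non-zero coefficient on x and the 0.0 on y above. The question's desired output builds features from y alone. The following is a minimal sketch of that variant under the same hyperparameters; the object name `LabelFromYOnly` and helper `prepare` are hypothetical, and no coefficients are claimed here since they depend on the run.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object LabelFromYOnly {
  // x is used only as the label; the feature vector is assembled from y alone.
  def prepare(df: DataFrame): DataFrame = {
    val assembler = new VectorAssembler()
      .setInputCols(Array("y"))
      .setOutputCol("features")
    assembler.transform(df).select(col("x").cast(DoubleType).as("label"), col("features"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession used only for experimentation.
    val spark = SparkSession.builder().master("local[*]").appName("y-only").getOrCreate()
    import spark.implicits._
    val df = Seq((0, 0), (0, 33), (0, 58), (0, 96), (0, 1),
                 (1, 21), (0, 10), (0, 65), (1, 7), (1, 28)).toDF("x", "y")

    val output = prepare(df)
    output.show(false) // one-element vectors such as [33.0]

    val lrModel = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
      .fit(output)
    println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

    // Score the training rows to inspect the predicted class per row.
    lrModel.transform(output).select("label", "features", "prediction").show(false)
    spark.stop()
  }
}
```

With only ten rows and fairly strong regularization (`regParam` 0.3), the fitted coefficient on y may well be small or zero; the point of the sketch is the feature construction, not the fit quality.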
