How to solve Type mismatch issue (expected: Double, actual: Unit)

Problem description

Here is my function that calculates the root mean squared error. However, the last line does not compile because of the error "Type mismatch (expected: Double, actual: Unit)". I have tried many different ways to solve this, but without success. Any ideas?

  def calculateRMSE(output: DStream[(Double, Double)]): Double = {
        val summse = output.foreachRDD { rdd =>
          rdd.map {
              case pair: (Double, Double) =>
                val err = math.abs(pair._1 - pair._2);
                err*err
          }.reduce(_ + _)
        }
        // math.sqrt(summse)  HOW TO APPLY SQRT HERE?
  }

Solution

As eliasah pointed out, foreach (and foreachRDD) don't return a value; they are for side effects only. If you want to return something, you need map. Based on your second solution:

val rmse = output.map(rdd => new RegressionMetrics(rdd).rootMeanSquaredError)

It looks better if you make a little function for it:

val getRmse = (rdd: RDD[(Double, Double)]) => new RegressionMetrics(rdd).rootMeanSquaredError

val rmse = output.map(getRmse)

Ignoring empty RDDs,

val rmse = output.filter(_.nonEmpty).map(getRmse)

Here is the exact same sequence as a for-comprehension. It's just syntactic sugar for map, flatMap and filter, but I thought it was much easier to understand when I was first learning Scala:

val rmse = for {
  rdd <- output
  if (rdd.nonEmpty)
} yield new RegressionMetrics(rdd).rootMeanSquaredError
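To make the "syntactic sugar" point concrete, here is a minimal sketch on an ordinary Scala List rather than a DStream. The names ForComprehensionSketch, batches, viaFor and viaMethods are made up for illustration, and each inner Seq merely stands in for one batch of values:

```scala
object ForComprehensionSketch {
  // Hypothetical stand-in data: each inner Seq plays the role of one batch.
  val batches: List[Seq[Double]] = List(Seq(1.0, 2.0), Seq.empty, Seq(3.0))

  // for-comprehension with a guard: skip empty batches, sum the rest.
  val viaFor: List[Double] = for {
    batch <- batches
    if batch.nonEmpty
  } yield batch.sum

  // The same pipeline written with explicit method calls.
  val viaMethods: List[Double] = batches.filter(_.nonEmpty).map(_.sum)
}
```

Both values come out as List(3.0, 3.0) here, since the empty batch is dropped by the guard in one case and by filter in the other.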

And here's a function summing the errors, like your first attempt:

def calculateRmse(output: DStream[(Double, Double)]): Double = {

  val getRmse = (rdd: RDD[(Double, Double)]) => new RegressionMetrics(rdd).rootMeanSquaredError

  output.filter(_.nonEmpty).map(getRmse).reduce(_ + _)
}

The compiler's complaint about nonEmpty is actually an issue with DStream's filter method. Instead of operating on the RDDs in the DStream, filter is operating on the pairs of doubles (Double, Double) given by your DStream's type parameter.

I don't know enough about Spark to say it's a flaw, but it is very strange. filter and most other operations over collections are typically defined in terms of foreach, but DStream implements those functions without following the same convention; its deprecated method foreach and its current foreachRDD both operate over the stream's RDDs, but its other higher-order methods don't (see the DStream API docs: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.DStream).

So my method won't work. DStream probably has a good reason for being weird (performance related?). Here's a probably bad way to do it with foreachRDD:

def calculateRmse(ds: DStream[(Double, Double)]): Double = {

  var totalError: Double = 0

  def getRmse(rdd:RDD[(Double, Double)]): Double = new RegressionMetrics(rdd).rootMeanSquaredError

  ds.foreachRDD((rdd:RDD[(Double, Double)]) => if (!rdd.isEmpty) totalError += getRmse(rdd))

  totalError
}

But it works!
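One caveat, as far as I understand Spark Streaming: foreachRDD only registers an output operation that runs for each batch once the StreamingContext has been started, so totalError is most likely still 0 at the moment calculateRmse returns. For comparison, here is a minimal sketch of the intended computation on an ordinary Scala collection, where foreach returns Unit while map and sum return values. RmseSketch, pairs, ignored and rmse are hypothetical names for illustration and not part of Spark:

```scala
object RmseSketch {
  // Hypothetical (prediction, observation) pairs standing in for one batch.
  val pairs: Seq[(Double, Double)] = Seq((1.0, 4.0), (2.0, 5.0))

  // foreach is for side effects only: its result type is Unit,
  // which is what the compiler complained about in the question.
  val ignored: Unit = pairs.foreach { case (p, o) => (p - o) * (p - o) }

  // map returns a new collection, so the squared errors can be summed,
  // averaged, and passed to math.sqrt.
  def rmse(data: Seq[(Double, Double)]): Double = {
    require(data.nonEmpty, "RMSE is undefined for empty input")
    val squaredErrors = data.map { case (p, o) => (p - o) * (p - o) }
    math.sqrt(squaredErrors.sum / data.size)
  }
}
```

With the sample pairs above, both errors are 3, so RmseSketch.rmse(RmseSketch.pairs) evaluates to 3.0.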
