Spark - IllegalArgumentException in KMeans.train


Problem description

I am running into an exception inside KMeans.train(), as shown below:

java.lang.IllegalArgumentException: requirement failed
  at scala.Predef$.require(Predef.scala:212)
  at org.apache.spark.mllib.util.MLUtils$.fastSquaredDistance(MLUtils.scala:487)
  at org.apache.spark.mllib.clustering.KMeans$.fastSquaredDistance(KMeans.scala:589)
  at org.apache.spark.mllib.clustering.KMeans$$anonfun$runAlgorithm$3.apply(KMeans.scala:304)
  at org.apache.spark.mllib.clustering.KMeans$$anonfun$runAlgorithm$3.apply(KMeans.scala:301)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:301)
  at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:227)
  at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:209)
  at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:530)

This doesn't give me any clue as to where to start debugging.
I found an old post, but that issue was in KMeans.predict(), whereas this one happens during the training phase itself.
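
For reference, a minimal sketch of the kind of call that reaches this code path (hypothetical, since the original pipeline is not shown; the file name, k, and iteration count are placeholders, and sc is the usual spark-shell SparkContext):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical input: one comma-separated feature vector per line.
val data = sc.textFile("features.txt")
  .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
  .cache()

// The IllegalArgumentException is thrown from inside this call,
// while the algorithm computes point-to-center distances.
val model = KMeans.train(data, 5, 20)  // k = 5, maxIterations = 20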

Solution

Just take a look at the source code and it becomes clear:

  1. Your vectors have to have the same size.
  2. The norms of both vectors should be non-negative.

https://github.com/apache/spark/blob/17af727e38c3faaeab5b91a8cdab5f2181cf3fc4/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala#L500

private[mllib] def fastSquaredDistance(
    v1: Vector,
    norm1: Double,
    v2: Vector,
    norm2: Double,
    precision: Double = 1e-6): Double = {
  val n = v1.size
  require(v2.size == n)
  require(norm1 >= 0.0 && norm2 >= 0.0)
  ...
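
In practice, the first requirement fails when the input RDD mixes vectors of different lengths, and the second typically fails because a NaN crept into the features: the norm of a finite vector is never negative, but a NaN norm makes NaN >= 0.0 evaluate to false. A minimal pre-flight check along these lines can point at the offending rows before training (a sketch; validate and the RDD name data are placeholder names, not part of MLlib):

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Sketch of a pre-flight check; `data` stands in for the RDD passed to KMeans.train.
def validate(data: RDD[Vector]): Unit = {
  // 1. All vectors must have the same size.
  val sizes = data.map(_.size).distinct().collect()
  require(sizes.length == 1, s"Found vectors of differing sizes: ${sizes.mkString(", ")}")

  // 2. Norms must be non-negative; a NaN norm (from NaN features) fails this check.
  val badNorms = data.filter(v => !(Vectors.norm(v, 2.0) >= 0.0)).count()
  require(badNorms == 0, s"$badNorms vectors have NaN or invalid norms")
}

Running such a check before KMeans.train turns an opaque "requirement failed" deep inside MLlib into a message that names the bad input.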

