Spark - IllegalArgumentException in KMeans.train
Problem description
I am running into an exception inside KMeans.train(), as shown below:
```
java.lang.IllegalArgumentException: requirement failed
	at scala.Predef$.require(Predef.scala:212)
	at org.apache.spark.mllib.util.MLUtils$.fastSquaredDistance(MLUtils.scala:487)
	at org.apache.spark.mllib.clustering.KMeans$.fastSquaredDistance(KMeans.scala:589)
	at org.apache.spark.mllib.clustering.KMeans$$anonfun$runAlgorithm$3.apply(KMeans.scala:304)
	at org.apache.spark.mllib.clustering.KMeans$$anonfun$runAlgorithm$3.apply(KMeans.scala:301)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
	at org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:301)
	at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:227)
	at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:209)
	at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:530)
```
This gives me no clue about where to start debugging.
I found an old post, but that issue was in KMeans.predict(), whereas this one happens in the training phase itself.
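Answer
A look at the source of MLUtils.fastSquaredDistance (the topmost MLlib frame in the trace) makes the cause clear:
- All your vectors must have the same size.
- The norms of both vectors must be non-negative.
Both conditions are checked with require, which throws java.lang.IllegalArgumentException: requirement failed when its condition is false: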
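```scala
private[mllib] def fastSquaredDistance(
    v1: Vector,
    norm1: Double,
    v2: Vector,
    norm2: Double,
    precision: Double = 1e-6): Double = {
  val n = v1.size
  require(v2.size == n)                  // both vectors must have the same dimension
  require(norm1 >= 0.0 && norm2 >= 0.0)  // both norms must be non-negative (false if a norm is NaN)
  ...
```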
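Assuming the norms passed in are ordinary Euclidean (L2) norms computed from the vectors, they can never be negative, so the second check can in practice only fail when a norm is NaN. That happens as soon as any component of a vector is NaN, because `NaN >= 0.0` is false. The two things worth checking in the training data are therefore inconsistent dimensions and NaN values. The plain-Scala sketch below mirrors the two `require` checks on `Array[Double]` rows so they can be tried without Spark; `KMeansInputCheck`, `l2Norm`, `passesRequires` and `badRows` are hypothetical names, not Spark APIs.

```scala
// Hypothetical helper (not part of Spark): mirrors the two `require` checks
// from MLUtils.fastSquaredDistance on plain arrays, to locate bad input rows.
object KMeansInputCheck {

  // Euclidean (L2) norm: square root of the sum of squares.
  // Any NaN component makes the result NaN.
  def l2Norm(v: Array[Double]): Double = math.sqrt(v.map(x => x * x).sum)

  // Same conditions as the quoted source: equal sizes and non-negative norms.
  // NaN >= 0.0 is false, so a NaN anywhere in either vector fails this check.
  def passesRequires(v1: Array[Double], v2: Array[Double]): Boolean =
    v1.length == v2.length && l2Norm(v1) >= 0.0 && l2Norm(v2) >= 0.0

  // Indices of rows that would trip one of the checks: rows whose size
  // differs from the expected dimension, or rows containing a NaN value.
  def badRows(data: Seq[Array[Double]], dim: Int): Seq[Int] =
    data.zipWithIndex.collect {
      case (row, i) if row.length != dim || row.exists(_.isNaN) => i
    }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      Array(1.0, 2.0),
      Array(3.0, 4.0, 5.0),   // wrong size
      Array(Double.NaN, 1.0)  // contains NaN, so its norm is NaN
    )
    println(l2Norm(Array(3.0, 4.0)))           // 5.0
    println(passesRequires(data(0), data(1)))  // false (size mismatch)
    println(passesRequires(data(0), data(2)))  // false (NaN norm)
    println(badRows(data, dim = 2))            // List(1, 2)
  }
}
```

On real data the same checks can be run on the RDD before calling KMeans.train. For an `RDD[Vector]` named `data` (a hypothetical name), `data.map(_.size).distinct().collect()` should return a single value, and `data.filter(_.toArray.exists(_.isNaN)).count()` should be 0.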