What is shuffle read & shuffle write in Apache Spark
Problem description
In the screenshot below of the Spark admin UI running on port 8080:
The "Shuffle Read" & "Shuffle Write" values are always empty for this code:
import org.apache.spark.SparkContext

object first {
  println("Welcome to the Scala worksheet")

  val conf = new org.apache.spark.SparkConf()
    .setMaster("local")
    .setAppName("distances")
    .setSparkHome("C:\\spark-1.1.0-bin-hadoop2.4\\spark-1.1.0-bin-hadoop2.4")
    .set("spark.executor.memory", "2g")
  val sc = new SparkContext(conf)

  case class User(name: String, features: Vector[Double])

  // Euclidean distance between two users' feature vectors: sum the
  // squared per-feature differences, then take the square root.
  def euclDistance(userA: User, userB: User) = {
    val subElements = (userA.features zip userB.features) map {
      m => (m._1 - m._2) * (m._1 - m._2)
    }
    val summed = subElements.sum
    val sqRoot = Math.sqrt(summed)
    println("value is " + sqRoot)
    ((userA.name, userB.name), sqRoot)
  }

  // Parses a "name,f1,f2,..." line: the first field is the user's
  // name, the remaining fields are the numeric features.
  def createUser(data: String) = {
    val splitLine = data.split(",")
    val id = splitLine(0)
    val distanceVector = splitLine.drop(1).map(_.toDouble).toVector
    User(id, distanceVector)
  }

  val dataFile = sc.textFile("c:\\data\\example.txt")
  val users = dataFile.map(m => createUser(m))
  val cart = users.cartesian(users)
  val distances = cart.map(m => euclDistance(m._1, m._2))
  //> distances  : org.apache.spark.rdd.RDD[((String, String), Double)] = MappedR
  //| DD[4] at map at first.scala:46
  val d = distances.collect
  d.foreach(println) //> ((a,a),0.0)
  //| ((a,b),0.0)
  //| ((a,c),1.0)
  //| ((a,),0.0)
  //| ((b,a),0.0)
  //| ((b,b),0.0)
  //| ((b,c),1.0)
  //| ((b,),0.0)
  //| ((c,a),1.0)
  //| ((c,b),1.0)
  //| ((c,c),0.0)
  //| ((c,),0.0)
  //| ((,a),0.0)
  //| ((,b),0.0)
  //| ((,c),0.0)
  //| ((,),0.0)
}
Why are the "Shuffle Read" & "Shuffle Write" fields empty? Can the above code be tweaked so that these fields are populated, in order to understand how the shuffle works?
Answer
I believe you have to run your application in cluster/distributed mode to see any shuffle read or write values. Typically, a shuffle is triggered by a subset of Spark operations (e.g., groupByKey, reduceByKey, join). The code above uses only map, cartesian, and collect, none of which repartitions data by key, so no shuffle stage is produced and the columns stay empty.
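As a minimal sketch of such a tweak (not from the original answer; byName and totalPerUser are names introduced here for illustration): appending a key-based aggregation to the worksheet forces a shuffle stage, because reduceByKey has to repartition records by key. On Spark 1.x you also need the SparkContext._ import to get the pair-RDD operations.

import org.apache.spark.SparkContext._ // pair-RDD ops such as reduceByKey (Spark 1.x)

// Key each distance by the first user's name, then sum per user.
// The by-key repartitioning that reduceByKey performs is exactly what
// the UI reports as "Shuffle Read" / "Shuffle Write".
val byName = distances.map { case ((nameA, _), dist) => (nameA, dist) }
val totalPerUser = byName.reduceByKey(_ + _)
totalPerUser.collect().foreach(println)

Even with this shuffle in place, the metrics may only appear once the job runs in the cluster/distributed mode mentioned above, rather than with setMaster("local").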