How can I benchmark performance in Spark console?
Question
I have just started using Spark and my interactions with it revolve around spark-shell
at the moment. I would like to benchmark how long various commands take, but could not find how to get the time or run a benchmark. Ideally I would want to do something super-simple, such as:
val t = [current_time]
data.map(etc).distinct().reduceByKey(_ + _)
println([current time] - t)
UPDATE: Figured it out:
import org.joda.time._
val t_start = DateTime.now()
[[do stuff]]
val t_end = DateTime.now()
new Period(t_start, t_end).toStandardSeconds()
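A complete, runnable version of the snippet above, as a minimal sketch: it assumes org.joda.time is on the classpath (spark-shell distributions often bundle it, but that is an assumption), and the `[[do stuff]]` placeholder is replaced by a stand-in local computation. In spark-shell, make sure the timed block ends in an action such as count() or collect(), because transformations like map, distinct, and reduceByKey are lazy and return almost instantly without doing any work.

```scala
import org.joda.time.{DateTime, Period}

val t_start = DateTime.now()
// Placeholder for real work; in spark-shell this would end in an action,
// e.g. data.map(etc).distinct().reduceByKey(_ + _).count()
val result = (1 to 1000).map(_ * 2).sum
val t_end = DateTime.now()

// toStandardSeconds() converts the Period to whole seconds
println(new Period(t_start, t_end).toStandardSeconds().getSeconds + " seconds")
```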
Answer
I suggest you do the following:
def time[A](f: => A): A = {
  val s = System.nanoTime          // start timestamp
  val ret = f                      // force evaluation of the by-name argument
  println("time: " + (System.nanoTime - s) / 1e9 + " seconds")
  ret
}
You can pass any expression as the argument to time: it evaluates the expression, prints how long the evaluation took, and returns the result.
Let's consider a function foobar that takes data as an argument; then:
val test = time(foobar(data))
test will contain the result of foobar, and the time taken will be printed as well.
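A self-contained sketch of the pattern above, runnable in plain Scala: foobar here is a hypothetical stand-in that does a word-count-style aggregation on a local collection (in spark-shell it would typically be a job ending in an action such as count() or collect(), since transformations alone are lazy).

```scala
// The helper from the answer: evaluates f, prints the elapsed time, returns the result.
def time[A](f: => A): A = {
  val s = System.nanoTime
  val ret = f
  println("time: " + (System.nanoTime - s) / 1e9 + " seconds")
  ret
}

// Hypothetical foobar: sums values per key, mimicking reduceByKey(_ + _).
def foobar(data: Seq[(String, Int)]): Map[String, Int] =
  data.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

val test = time(foobar(Seq(("a", 1), ("b", 2), ("a", 3))))
// test == Map("a" -> 4, "b" -> 2); the elapsed time is printed by `time`
```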