Unexpected results in Spark MapReduce


Question

I'm new to Spark and want to understand how MapReduce gets done under the hood, to make sure I use it properly. This post provided a great answer, but my results don't seem to follow the logic it describes. I'm running the Spark Quick Start guide in Scala on the command line. When I add up the line lengths, everything comes out fine. The total line length is 1213:

scala> val textFile = sc.textFile("README.md")

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))

scala> val linesWithSparkLengths = linesWithSpark.map(s => s.length)

scala> linesWithSparkLengths.foreach(println)

Result:
14
78
73
42
68
17
62
45
76
64
54
74
84
29
136
77
77
73
70

scala> val totalLWSparkLength = linesWithSparkLengths.reduce((a,b) => a+b)
totalLWSparkLength: Int = 1213

When I tweak it slightly to use (a-b) instead of (a+b),

scala> val totalLWSparkTest = linesWithSparkLengths.reduce((a,b) => a-b)

I expected -1185, following the logic in that post:

List(14,78,73,42,68,17,62,45,76,64,54,74,84,29,136,77,77,73,70).reduce( (x,y) => x - y )
  Step 1 : op( 14, 78 ) will be the first evaluation. 
     x is 14 and y is 78. Result of x - y = -64.
  Step 2:  op( op( 14, 78 ), 73 )
     x is op(14,78) = -64 and y = 73. Result of x - y = -137
  Step 3:  op( op( op( 14, 78 ), 73 ), 42) 
     x is op( op( 14, 78 ), 73 ) = -137 and y is 42. Result is -179.
  ...
  Step 18:  op( (... ), 73), 70) will be the final evaluation.
     x is -1115 and y is 70. Result of x - y is -1185.

However, something strange happens:

scala> val totalLWSparkTest = linesWithSparkLengths.reduce((a,b) => a-b)
totalLWSparkTest: Int = 151

When I run it again...

scala> val totalLWSparkTest = linesWithSparkLengths.reduce((a,b) => a-b)
totalLWSparkTest: Int = -151

Can anyone tell me why the result is 151 (or -151) instead of -1185?

Recommended answer

It happens because subtraction is neither associative nor commutative. Let's start with associativity:

(- (- (- 14 78) 73) 42) 
(- (- -64 73) 42)
(- -137 42) 
-179

is not the same as

(- (- 14 78) (- 73 42))
(- -64 (- 73 42))
(- -64 31)
-95
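
The same grouping difference can be written in Scala infix notation. This is only a minimal sketch; the value names are made up for illustration.

```scala
// Same four numbers, same order, different grouping.
val leftNested = ((14 - 78) - 73) - 42   // -179
val regrouped  = (14 - 78) - (73 - 42)   // -95

// Addition is associative, so regrouping does not change the result.
val addLeftNested = ((14 + 78) + 73) + 42   // 207
val addRegrouped  = (14 + 78) + (73 + 42)   // 207
```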

Now it's time for commutativity:

(- (- (- 14 78) 73) 42) ;; From the previous example

is not the same as

(- (- (- 42 73) 78) 14)
(- (- -31 78) 14)
(- -109 14)
-123
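
The effect of operand order can also be checked on a plain Scala list. This is a sketch with illustrative names; `reduceLeft` is used so that the evaluation order is explicitly left to right.

```scala
val firstFour = List(14, 78, 73, 42)

// Left-to-right subtraction in the original order and in reverse order.
val forwardDiff  = firstFour.reduceLeft(_ - _)           // ((14 - 78) - 73) - 42 = -179
val reversedDiff = firstFour.reverse.reduceLeft(_ - _)   // ((42 - 73) - 78) - 14 = -123

// Addition is commutative, so the order of the elements does not matter.
val forwardSum  = firstFour.reduceLeft(_ + _)            // 207
val reversedSum = firstFour.reverse.reduceLeft(_ + _)    // 207
```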

Spark first applies reduce on the individual partitions and then merges the partial results in arbitrary order. If the function you use doesn't meet one or both of these criteria, the final result can be non-deterministic.
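
To make the partition-then-merge behaviour concrete, here is a plain-Scala sketch that imitates it without Spark. The split point (10) and the helper `partitionedReduce` are assumptions chosen purely for illustration. The real partition boundaries in the question are not known, so this does not try to reproduce 151 exactly.

```scala
// Line lengths from the question, in the order they were printed.
val lengths = List(14, 78, 73, 42, 68, 17, 62, 45, 76, 64,
                   54, 74, 84, 29, 136, 77, 77, 73, 70)

// Hypothetical helper: reduce each "partition" left to right,
// then merge the partial results in the order the partitions are given.
def partitionedReduce(parts: Seq[List[Int]], op: (Int, Int) => Int): Int =
  parts.map(_.reduceLeft(op)).reduceLeft(op)

// Assumed split into two partitions; Spark picks its own boundaries.
val (part1, part2) = lengths.splitAt(10)

val sequentialDiff = lengths.reduceLeft(_ - _)                       // -1185
val diffMerged12   = partitionedReduce(Seq(part1, part2), _ - _)     // 55
val diffMerged21   = partitionedReduce(Seq(part2, part1), _ - _)     // -55

// With an associative and commutative function the merge order is irrelevant.
val sumMerged12 = partitionedReduce(Seq(part1, part2), _ + _)        // 1213
val sumMerged21 = partitionedReduce(Seq(part2, part1), _ + _)        // 1213
```

With subtraction, swapping the merge order flips the sign (55 versus -55), which resembles the 151 / -151 pattern in the question, while addition gives 1213 either way. If a strict left-to-right subtraction is really what is wanted, one option, assuming the data fits on the driver, is to collect first and reduce locally, for example linesWithSparkLengths.collect().reduceLeft(_ - _).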
