Explain the aggregate functionality in Spark (with Python and Scala)


Problem description

I am looking for a better explanation of the aggregate functionality that is available via Spark in Python.

The example I have is as follows (using pyspark from Spark 1.2.0):

sc.parallelize([1,2,3,4]).aggregate(
  (0, 0),
  (lambda acc, value: (acc[0] + value, acc[1] + 1)),
  (lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])))

Output:

(10, 4)

I get the expected result (10,4), which is the sum 1+2+3+4 together with the count of 4 elements. If I change the initial value passed to the aggregate function from (0,0) to (1,0), I get the following result:

sc.parallelize([1,2,3,4]).aggregate(
    (1, 0),
    (lambda acc, value: (acc[0] + value, acc[1] + 1)),
    (lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])))

Output:

(19, 4)

The value increases by 9. If I change it to (2,0), the value goes to (28,4), and so on.

Can someone explain to me how this value is calculated? I expected the value to go up by 1, not by 9; I expected to see (11,4), but instead I am seeing (19,4).

Recommended answer

I don't have enough reputation points to comment on the previous answer by Maasg. Actually, the zero value should be 'neutral' towards the seqOp, meaning it doesn't interfere with the seqOp result, like 0 for addition or 1 for multiplication.

You should never try this with non-neutral values, because they may be applied an arbitrary number of times. This behavior is not tied only to the number of partitions.
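
As a quick illustration of that neutrality (a minimal sketch, assuming an existing SparkContext named sc, as in the question): with the neutral zero value (0, 0), aggregate returns the same (10, 4) no matter how many partitions the RDD is split into.

for n in (1, 2, 4):
    result = sc.parallelize([1, 2, 3, 4], n).aggregate(
        (0, 0),                                           # neutral zero value for (sum, count)
        lambda acc, value: (acc[0] + value, acc[1] + 1),  # seqOp: fold one element into a partition's accumulator
        lambda a, b: (a[0] + b[0], a[1] + b[1]))          # combOp: merge per-partition accumulators
    print(n, result)  # (10, 4) for every partition count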

I tried the same experiment as stated in the question. With 1 partition, the zero value was applied 3 times; with 2 partitions, 6 times; with 3 partitions, 9 times; and this continues.
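
A sketch of that experiment (again assuming an existing SparkContext sc; the exact counts are an implementation detail and may differ between Spark versions): with the non-neutral zero value (1, 0), the surplus of the first component over the true sum 10 shows how many times the zero value was folded in.

for n in (1, 2, 3):
    total, count = sc.parallelize([1, 2, 3, 4], n).aggregate(
        (1, 0),                                           # deliberately non-neutral zero value
        lambda acc, value: (acc[0] + value, acc[1] + 1),
        lambda a, b: (a[0] + b[0], a[1] + b[1]))
    # total - 10 = how many times the extra 1 from the zero value was added in
    print("partitions:", n, "-> zero value applied", total - 10, "times")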
