approxQuantile gives incorrect median in Spark (Scala)?

Views: 70  Published: 2020/9/4 2:19:04  Tags: scala apache-spark

Problem description

I have the following test data:

      val data = List(
        List(47.5335D),
        List(67.5335D),
        List(69.5335D),
        List(444.1235D),
        List(677.5335D)
      )

I expect the median to be 69.5335. But when I try to find the exact median with this code:

df.stat.approxQuantile(column, Array(0.5), 0)

it gives me 444.1235.

Why is this so, and how can it be fixed?

This is what I'm doing:

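      import org.apache.spark.sql.Row
      import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

      // sparkContext, sqlContext, sc and tableName come from the surrounding test setup (not shown)
      val data = List(
        List(47.5335D),
        List(67.5335D),
        List(69.5335D),
        List(444.1235D),
        List(677.5335D)
      )

      // Build a single-column DataFrame of non-nullable doubles
      val rdd = sparkContext.parallelize(data).map(Row.fromSeq(_))
      val schema = StructType(Array(
        StructField("value", DataTypes.DoubleType, false)
      ))

      val df = sqlContext.createDataFrame(rdd, schema)

      // Register a temp view, read it back, then request the 0.5 quantile with relativeError = 0
      df.createOrReplaceTempView(tableName)
      val df2 = sc.sql(s"SELECT value FROM $tableName")
      val median = df2.stat.approxQuantile("value", Array(0.5), 0)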

So I'm creating a temp table, querying it, and then calculating the result. It's just for testing.
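
As a sanity check that does not depend on Spark, the expected value can be recomputed by sorting the five test values and taking the middle one. The following is only a minimal sketch: `MedianCheck` and `exactMedian` are hypothetical names introduced here, not part of Spark, and returning the lower of the two middle values for an even-sized input is an assumption made for this example.

```scala
object MedianCheck {
  // Hypothetical helper (not a Spark API): exact median of a non-empty sequence.
  // Assumption: for an even number of elements, the lower middle value is returned.
  def exactMedian(values: Seq[Double]): Double = {
    require(values.nonEmpty, "median of an empty sequence is undefined")
    val sorted = values.sorted
    sorted((sorted.length - 1) / 2)
  }

  def main(args: Array[String]): Unit = {
    // Same five values as the test data in the question
    val sample = Seq(47.5335, 67.5335, 69.5335, 444.1235, 677.5335)
    println(exactMedian(sample))
  }
}
```

For the five values above, the middle element after sorting is 69.5335, which matches the expectation stated in the question, so the 444.1235 result is not explained by the input data itself.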
