How to aggregate data into ranges (bucketize)?
Question
I have a table:
+---+-----+
| id|value|
+---+-----+
|  1|118.0|
|  2|109.0|
|  3|113.0|
|  4| 82.0|
|  5| 60.0|
|  6|111.0|
|  7|107.0|
|  8| 84.0|
|  9| 91.0|
| 10|118.0|
+---+-----+
and would like to aggregate or bin the values into the ranges 0, 10, 20, 30, 40, ..., 80, 90, 100, 110, 120.
How can I perform this in SQL, or more specifically in Spark SQL?
Currently I have a lateral view join with the range, but this seems rather clumsy/inefficient.
Quantile discretization is not really what I want; rather, I want a CUT with this specified range.
https://github.com/collectivemedia/spark-ext/blob/master/sparkext-mllib/src/main/scala/org/apache/spark/ml/feature/Binning.scala would perform dynamic bins, but I need this specified range instead.
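For comparison, Spark ML also ships a built-in Bucketizer that takes an explicit splits array rather than computing bins dynamically, which is close to a CUT with fixed ranges. A minimal sketch, assuming df is the table above with a DoubleType value column and that the output column name bucket is a placeholder:

import org.apache.spark.ml.feature.Bucketizer

// Splits 0, 10, 20, ..., 120 define twelve buckets; bucket i covers [splits(i), splits(i+1)).
val splits = (0 to 120 by 10).map(_.toDouble).toArray

val bucketizer = new Bucketizer()
  .setInputCol("value")
  .setOutputCol("bucket")
  .setSplits(splits)

// Adds a "bucket" column holding the bucket index (0.0, 1.0, ...);
// multiply the index by 10 to recover the lower bound of the range.
val bucketed = bucketizer.transform(df)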
Answer
Try GROUP BY with this (note that Spark SQL's DIV operator does not accept DOUBLE operands, so divide with / and truncate with a CAST instead):
SELECT id, CAST(value / 10 AS INT) * 10 AS bucket FROM table_name;
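For example, to count rows per bucket, the same expression can be repeated in the GROUP BY clause. A sketch, assuming df holds the table above and that table_name, bucket, and cnt are placeholder names:

df.createOrReplaceTempView("table_name")

spark.sql("""
  SELECT CAST(value / 10 AS INT) * 10 AS bucket, COUNT(*) AS cnt
  FROM table_name
  GROUP BY CAST(value / 10 AS INT) * 10
  ORDER BY bucket
""").show()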
The following would be the equivalent using the Dataset API in Scala:
df.select((($"value" / 10).cast("int") * 10).as("bucket"))  // with import spark.implicits._ in scope
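Put together, a minimal end-to-end sketch (the session setup and the bucket column name are assumptions, not part of the original answer) that rebuilds the sample table and counts rows per bucket:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("bucketize")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Rebuild the sample table from the question.
val df = Seq(
  (1, 118.0), (2, 109.0), (3, 113.0), (4, 82.0), (5, 60.0),
  (6, 111.0), (7, 107.0), (8, 84.0), (9, 91.0), (10, 118.0)
).toDF("id", "value")

// Truncate each value down to the start of its 10-wide bucket, then count per bucket.
df.groupBy((($"value" / 10).cast("int") * 10).as("bucket"))
  .count()
  .orderBy("bucket")
  .show()

This yields one row per 10-wide bucket, e.g. 110 for all values in [110, 120).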