How to aggregate data into ranges (bucketize)?
Question
I have a table like
+---+-----+
| id|value|
+---+-----+
|  1|118.0|
|  2|109.0|
|  3|113.0|
|  4| 82.0|
|  5| 60.0|
|  6|111.0|
|  7|107.0|
|  8| 84.0|
|  9| 91.0|
| 10|118.0|
+---+-----+
and would like to aggregate or bin the values into the ranges 0, 10, 20, 30, 40, ..., 80, 90, 100, 110, 120.
How can I perform this in SQL, or more specifically in Spark SQL?
Currently I have a lateral view join with the range, but this seems rather clumsy/inefficient.
Quantile discretization is not really what I want; rather, a CUT with this specified range.
https://github.com/collectivemedia/spark-ext/blob/master/sparkext-mllib/src/main/scala/org/apache/spark/ml/feature/Binning.scala would perform dynamic binning, but I need this specified range instead.
Answer
Try GROUP BY with this:

SELECT id, CAST(value / 10 AS INT) * 10 AS bucket FROM table_name;
The following would be the equivalent using the Dataset API in Scala (requires import spark.implicits._ for the $ column syntax):

df.select(($"value" / 10).cast("int") * 10)
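To make the bucketing arithmetic concrete, here is a minimal pure-Scala sketch that applies the same expression (divide by 10, truncate to int, multiply by 10) to the sample values and counts them per bucket, the way a GROUP BY would. The object and function names are illustrative, and no SparkSession is needed:

```scala
// Illustrative sketch of the bucketing arithmetic from the answer above.
object BucketizeSketch {
  // Map a value to the lower edge of its 10-wide bucket, e.g. 118.0 -> 110.
  def bucket(v: Double): Int = (v / 10).toInt * 10

  def main(args: Array[String]): Unit = {
    // The sample values from the question's table.
    val values = Seq(118.0, 109.0, 113.0, 82.0, 60.0, 111.0, 107.0, 84.0, 91.0, 118.0)
    // Count values per bucket, mirroring GROUP BY bucket in SQL.
    val counts = values.groupBy(bucket).map { case (b, vs) => b -> vs.size }
    counts.toSeq.sortBy(_._1).foreach { case (b, n) => println(s"$b: $n") }
  }
}
```

Note that truncation toward zero only matches the intended bins for non-negative values; negative values would need floor instead.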