How to aggregate columns into a JSON array?


Problem description

How can I transform data like the sample below so that it can be stored in Elasticsearch?

Here is a dataset of beans that I want to aggregate by product into a JSON array.

// Bean is a plain Java bean with product, purchaser and amount properties.
List<Bean> data = new ArrayList<>();
data.add(new Bean("book","John",59));
data.add(new Bean("book","Björn",61));
data.add(new Bean("tv","Roger",36));
Dataset<Row> ds = spark.createDataFrame(data, Bean.class);

ds.show(false);

+------+-------+---------+
|amount|product|purchaser|
+------+-------+---------+
|59    |book   |John     |
|61    |book   |Björn    |
|36    |tv     |Roger    |
+------+-------+---------+


ds = ds.groupBy(col("product")).agg(collect_list(map(ds.col("purchaser"),ds.col("amount")).as("map")));
ds.show(false);

+-------+---------------------------------------------+
|product|collect_list(map(purchaser, amount) AS `map`)|
+-------+---------------------------------------------+
|tv     |[[Roger -> 36]]                              |
|book   |[[John -> 59], [Björn -> 61]]                |
+-------+---------------------------------------------+

This is what I want to transform it into:

+-------+------------------------------------------------------------------+
|product|json                                                              |
+-------+------------------------------------------------------------------+
|tv     |[{purchaser: "Roger", amount:36}]                                 |
|book   |[{purchaser: "John", amount:59}, {purchaser: "Björn", amount:61}] |
+-------+------------------------------------------------------------------+

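Recommended answer

Solution: pack purchaser and amount into a struct, serialize each struct with to_json, and collect the resulting JSON strings per product: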

// The alias goes on the aggregate itself so that the output column is named "json".
ds.groupBy(col("product"))
  .agg(collect_list(to_json(struct(col("purchaser"), col("amount")))).alias("json"));
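
To sanity-check the shape of the result without starting a Spark session, the same group-and-collect step can be sketched in plain Java. This is only an illustration under stated assumptions: `Purchase` is a hypothetical stand-in for the question's `Bean`, the `JsonArrayAggregation` class and its `aggregate` and `escape` helpers are invented for this sketch, and the compact JSON formatting is an assumption about how the serialized rows look. The sketch joins each group into a single JSON array string; as far as I know, the closest Spark counterpart in recent versions would be to_json(collect_list(struct(...))), whereas collecting individual to_json results yields an array of separate JSON strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonArrayAggregation {

    /** Hypothetical stand-in for the Bean class used in the question. */
    public static final class Purchase {
        final String product;
        final String purchaser;
        final int amount;

        public Purchase(String product, String purchaser, int amount) {
            this.product = product;
            this.purchaser = purchaser;
            this.amount = amount;
        }
    }

    /** Groups purchases by product and renders each group as one JSON array string. */
    public static Map<String, String> aggregate(List<Purchase> purchases) {
        // LinkedHashMap keeps products in first-seen order, which makes the output predictable.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Purchase p : purchases) {
            String json = "{\"purchaser\":\"" + escape(p.purchaser)
                    + "\",\"amount\":" + p.amount + "}";
            grouped.computeIfAbsent(p.product, k -> new ArrayList<>()).add(json);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            result.put(e.getKey(), "[" + String.join(",", e.getValue()) + "]");
        }
        return result;
    }

    /** Escapes only backslashes and double quotes; a real JSON library covers more cases. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        List<Purchase> data = Arrays.asList(
                new Purchase("book", "John", 59),
                new Purchase("book", "Bj\u00f6rn", 61),
                new Purchase("tv", "Roger", 36));
        // Prints one line per product: the product name followed by its JSON array.
        aggregate(data).forEach((product, json) -> System.out.println(product + " " + json));
    }
}
```

For anything beyond a quick check, a JSON library is the safer choice, since the hand-rolled escaping here ignores control characters.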
