How to cross join unnest a JSON array in Presto

Problem description

Given a table that contains a column of JSON like this:

{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]}
{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}

How can I write a Presto query to give me the average b value across all entries?

So far I think I need to use something like Hive's lateral view explode, whose equivalent is cross join unnest in Presto.

But I'm stuck on how to write the Presto query for cross join unnest.

How can I use cross join unnest to expand all array elements and select them?

Solution

Here's an example:

WITH example(message) AS (
    VALUES
        (json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
        (json '{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}')
)
SELECT
    n.type,
    avg(n.value)
FROM example
CROSS JOIN UNNEST(
    CAST(
        JSON_EXTRACT(message, '$.payload')
        AS ARRAY(ROW(type VARCHAR, value INTEGER))
    )
) AS x(n)
WHERE n.type = 'b'
GROUP BY n.type
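
A note on versions: as far as I know, newer Presto and Trino releases expand an ARRAY(ROW(...)) passed to UNNEST into one output column per field, so the single-column alias x(n) may be rejected there. The variant below is a sketch under that assumption; the alias list x(type, value) and the output name avg_value are my own choices, not part of the original answer.

```sql
WITH example(message) AS (
    VALUES
        (json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
        (json '{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}')
)
SELECT
    x.type,
    avg(x.value) AS avg_value   -- avg_value is an illustrative name
FROM example
CROSS JOIN UNNEST(
    CAST(JSON_EXTRACT(message, '$.payload') AS ARRAY(ROW(type VARCHAR, value INTEGER)))
) AS x(type, value)             -- one alias per ROW field (assumes per-field expansion)
WHERE x.type = 'b'
GROUP BY x.type
```

For the two sample rows the b values are 9 and 3, so the expected average is 6.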

WITH defines a common table expression (CTE) named example, with a single column aliased as message.

VALUES returns a literal, inline set of table rows.

UNNEST takes an array held in one column of a single row and returns the elements of that array as multiple rows.
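
To see UNNEST on its own, without any JSON involved, here is a minimal sketch. The inline table t and the names id, items and item are made up for illustration.

```sql
-- Each input row carries an array; CROSS JOIN UNNEST emits one output row per element.
SELECT t.id, u.item
FROM (
    VALUES
        (1, ARRAY['a', 'b']),
        (2, ARRAY['c'])
) AS t(id, items)
CROSS JOIN UNNEST(t.items) AS u(item)
-- Produces three rows: (1, 'a'), (1, 'b'), (2, 'c')
```

The other columns of the source row (id here) are repeated for every element, which is what makes the later filter and aggregate possible.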

CAST changes the JSON type into the ARRAY type that UNNEST requires. It could just as easily have been an ARRAY(MAP(...)), but I find ARRAY(ROW(...)) nicer because you can specify column names and use dot notation in the select clause.
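
For comparison, the ARRAY(MAP(...)) alternative might look like the sketch below. It assumes every key and value in the payload objects can be read as VARCHAR, so the value has to be cast to a number separately; avg_value is an illustrative output name.

```sql
WITH example(message) AS (
    VALUES
        (json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
        (json '{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}')
)
SELECT
    n['type'] AS type,
    avg(CAST(n['value'] AS INTEGER)) AS avg_value  -- map values are VARCHAR here
FROM example
CROSS JOIN UNNEST(
    CAST(JSON_EXTRACT(message, '$.payload') AS ARRAY(MAP(VARCHAR, VARCHAR)))
) AS x(n)                                          -- each element is a single MAP column
WHERE n['type'] = 'b'
GROUP BY n['type']
```

The trade-off is visible in the select list: subscript lookups and an extra cast instead of typed fields with dot notation.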

JSON_EXTRACT uses a JSONPath expression to return the array stored under the payload key.
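
A small sketch of the extraction step in isolation. The column alias j and the output names are illustrative; json_extract_scalar is included only to contrast a JSON result with a plain VARCHAR one.

```sql
SELECT
    json_extract(j, '$.payload')                AS payload_array,  -- still JSON: the whole array
    json_extract_scalar(j, '$.payload[0].type') AS first_type      -- VARCHAR: 'b'
FROM (
    VALUES (json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}')
) AS t(j)
```

Because json_extract returns a JSON value rather than an array, the CAST to an ARRAY type is still needed before UNNEST can use it.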

avg() and group by should be familiar SQL.
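
To simply expand every array element and select it, as the question asks, the filter and aggregate can be dropped. This sketch keeps the answer's x(n) alias, which assumes UNNEST returns each ROW as a single column; on engines that expand ROW fields into separate columns, the alias list would need one name per field instead.

```sql
WITH example(message) AS (
    VALUES
        (json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
        (json '{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}')
)
SELECT
    n.type,
    n.value
FROM example
CROSS JOIN UNNEST(
    CAST(JSON_EXTRACT(message, '$.payload') AS ARRAY(ROW(type VARCHAR, value INTEGER)))
) AS x(n)
-- One output row per array element: four rows for the two sample messages
```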
