Return elements of Redshift JSON array on separate rows

Published: 2021/11/27 10:31:55. Tags: json, amazon-redshift

Question

I have a Redshift table that looks like this:

 id | metadata
---------------------------------------------------------------------------
 1  | [{"pet":"dog"},{"country":"uk"}]
 2  | [{"pet":"cat"}]
 3  | []
 4  | [{"country":"germany"},{"education":"masters"},{"country":"belgium"}]

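All array elements have exactly one field.
There is no guarantee that a particular field will appear in any of an array's elements.
A field name can be repeated within an array.
The array elements can be in any order.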

I want to get back a table that looks like this:

 id |   field   |  value
------------------------
 1  | pet       | dog
 1  | country   | uk
 2  | pet       | cat
 4  | country   | germany
 4  | education | masters
 4  | country   | belgium

I can then combine this with my queries on the rest of the input table.

I have tried playing around with the Redshift JSON functions, but without being able to write functions, use loops, or declare variables in Redshift, I really can't see a way to do this!

Please let me know if I can clarify anything else.

Recommended answer

Thanks to this inspired blog post, I've been able to craft a solution. Here it is:

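Create a lookup table to effectively "iterate" over the elements of each array. The number of rows in this table must be equal to or greater than the maximum number of elements in any array. Let's assume this is 4 (it can be calculated using SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table):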

-- One row per array index, 0 through 3.
CREATE VIEW seq_0_to_3 AS
    SELECT 0 AS i UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3;
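
Since array elements at indexes beyond the view's range would be dropped without any error, it may be worth checking the sizing before relying on the view. A minimal sketch, using only the table, column, and view names already introduced above (the output column names longest_array and seq_rows are illustrative):

```sql
-- Compare the longest array in the data with the number of indexes available.
SELECT
    (SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table) AS longest_array,
    (SELECT COUNT(*) FROM seq_0_to_3) AS seq_rows;
```

If longest_array is greater than seq_rows, the view needs more rows. For the four sample rows in the question, the longest array has 3 elements, so a four-row view is sufficient.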

From this, we can create one row per JSON element:

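WITH exploded_array AS (
    -- Pair every row with every index in the lookup view, keeping only
    -- the indexes that fall inside that row's array.
    SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
    FROM input_table, seq_0_to_3 AS seq
    WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
SELECT *
FROM exploded_array;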

This produces:

 id | json
------------------------------
 1  | {"pet":"dog"}
 1  | {"country":"uk"}
 2  | {"pet":"cat"}
 4  | {"country":"germany"}
 4  | {"education":"masters"}
 4  | {"country":"belgium"}

However, I need to extract the field names and values. As I can't see any way to extract JSON field names using Redshift's limited JSON functions, I'll do this with a regular expression:

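WITH exploded_array AS (
    SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
    FROM input_table, seq_0_to_3 AS seq
    WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
-- Look up each value using the field name captured by the regular expression.
SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field) AS value
FROM (
    -- The pattern is meant to capture the single key of each element.
    SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\w+[^"]') AS field
    FROM exploded_array
) AS extracted;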
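
One limitation of the pattern above is that it only captures field names of three or more word characters, because `[^{"]`, `\w+`, and `[^"]` each consume at least one character of the name. A name such as "id", or one containing a hyphen, would not be extracted correctly. The following is a sketch of a variant, not part of the original answer. It assumes each element is stored as compact JSON with no whitespace between `{` and the key, and that field names contain no escaped quotes. The CTE name fields is illustrative.

```sql
WITH exploded_array AS (
    SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
    FROM input_table, seq_0_to_3 AS seq
    WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
),
fields AS (
    -- Take the first run of characters that are neither '{' nor '"'.
    -- For compact JSON such as {"pet":"dog"} that run is the key.
    SELECT id, json, REGEXP_SUBSTR(json, '[^{"]+') AS field
    FROM exploded_array
)
SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field) AS value
FROM fields;
```

This pattern contains no backslash escapes, so it also avoids any question of how the SQL string literal treats them.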
