Return elements of Redshift JSON array on separate rows
Question
I have a Redshift table that looks like this:
id | metadata
---------------------------------------------------------------------------
1 | [{"pet":"dog"},{"country":"uk"}]
2 | [{"pet":"cat"}]
3 | []
4 | [{"country":"germany"},{"education":"masters"},{"country":"belgium"}]
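- All array elements have exactly one field.
- There is no guarantee that a particular field will appear in any of an array's elements.
- A field name can be repeated within an array.
- The array elements can be in any order.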
I want to get back a table that looks like this:
id | field | value
------------------------
1 | pet | dog
1 | country | uk
2 | pet | cat
4 | country | germany
4 | education | masters
4 | country | belgium
I can then combine this with my queries on the rest of the input table.
I have tried playing around with the Redshift JSON functions, but without being able to write functions/use loops/have variables in Redshift, I really can't see a way to do this!
Please let me know if I can clarify anything else.
Recommended answer
Thanks to this inspired blog post, I've been able to craft a solution. This is:
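Create a look-up table to effectively "iterate" over the elements of each array. The number of rows in this table must be equal to or greater than the maximum number of elements in any array. Let's assume this is 4 (it can be calculated using
SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table
):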
CREATE VIEW seq_0_to_3 AS
SELECT 0 AS i UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3;
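If the longest array grows, the view above has to be rewritten by hand. As a minimal sketch, the statement could be generated for any length. Python is used here only to build the SQL text; the helper name `build_seq_view_sql` and the `seq_0_to_<n-1>` naming scheme are assumptions for illustration, not part of the original answer.

```python
def build_seq_view_sql(n: int) -> str:
    """Return a CREATE VIEW statement yielding the integers 0..n-1 in column i.

    Hypothetical helper: the view name simply follows the answer's
    seq_0_to_3 example.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # The first SELECT names the column; the remaining ones inherit the name.
    selects = ["SELECT 0 AS i"] + [f"SELECT {k}" for k in range(1, n)]
    body = " UNION ALL\n".join(f"    {s}" for s in selects)
    return f"CREATE VIEW seq_0_to_{n - 1} AS\n{body};"


if __name__ == "__main__":
    print(build_seq_view_sql(4))
```

Here `n` would be the result of the `MAX(JSON_ARRAY_LENGTH(metadata))` query, or anything larger.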
From this, we can create one row per JSON element:
WITH exploded_array AS (
SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
FROM input_table, seq_0_to_3 AS seq
WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
SELECT *
FROM exploded_array;
This produces:
id | json
------------------------------
1 | {"pet":"dog"}
1 | {"country":"uk"}
2 | {"pet":"cat"}
4 | {"country":"germany"}
4 | {"education":"masters"}
4 | {"country":"belgium"}
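To see why the cross join with the look-up table yields one row per element, the same logic can be emulated outside Redshift. The following is a rough Python sketch, not Redshift code: the list `seq_0_to_3` stands in for the view, `explode` is a hypothetical name, and the sample rows are copied from the question.

```python
import json

# Sample rows copied from the question's input table: (id, metadata).
input_table = [
    (1, '[{"pet":"dog"},{"country":"uk"}]'),
    (2, '[{"pet":"cat"}]'),
    (3, '[]'),
    (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]'),
]
seq_0_to_3 = [0, 1, 2, 3]  # stands in for the look-up view


def explode(rows, seq):
    """Emulate the cross join plus the `i < JSON_ARRAY_LENGTH` filter."""
    out = []
    for row_id, metadata in rows:
        elements = json.loads(metadata)
        for i in seq:
            if i < len(elements):
                # Compact separators mirror the JSON text shown in the answer.
                out.append((row_id, json.dumps(elements[i], separators=(",", ":"))))
    return out


exploded = explode(input_table, seq_0_to_3)
for row in exploded:
    print(row)
```

Row 3 disappears because its array is empty, so no index passes the filter. Passing a sequence that is too short, such as `[0]`, silently drops the later elements, which is why the look-up table needs at least as many rows as the longest array. Unlike this loop, the SQL query guarantees no row order unless an ORDER BY is added.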
However, I needed to extract the field names/values. As I can't see any way to extract JSON field names using Redshift's limited functions, I'll do this using a regular expression:
WITH exploded_array AS (
SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
FROM input_table, seq_0_to_3 AS seq
WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field) AS value
FROM (
SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\w+[^"]') AS field
FROM exploded_array
) AS extracted;
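The regular expression is worth checking against the data before relying on it. The sketch below runs the same pattern through Python's `re` module as a stand-in; that Redshift's regex engine behaves identically is an assumption, and the `{"id":"abc"}` element is a hypothetical input that does not appear in the question.

```python
import re

# Same pattern as in the query above. Python's `re` is only a stand-in;
# Redshift's regex engine may differ in edge cases.
FIELD_PATTERN = re.compile(r'[^{"]\w+[^"]')


def first_match(element: str):
    """Return the first substring matching the pattern, or None."""
    m = FIELD_PATTERN.search(element)
    return m.group(0) if m else None


samples = ['{"pet":"dog"}', '{"country":"uk"}', '{"education":"masters"}']
for s in samples:
    print(s, "->", first_match(s))

# Hypothetical element with a two-character field name.
print('{"id":"abc"}', "->", first_match('{"id":"abc"}'))
```

In Python the three sample elements give the field names `pet`, `country` and `education`. The pattern needs at least three consecutive non-quote characters, though, so for the two-character name `id` it skips the name and returns the value `abc` instead. If field names in the real data can be that short, or can contain characters outside `\w`, the pattern would need adjusting.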