Return elements of Redshift JSON array on separate rows
Question
I have a Redshift table that looks like this:
id | metadata
---------------------------------------------------------------------------
1 | [{"pet":"dog"},{"country":"uk"}]
2 | [{"pet":"cat"}]
3 | []
4 | [{"country":"germany"},{"education":"masters"},{"country":"belgium"}]
- All array elements have just one field.
- There is no guarantee that a particular field will feature in any of an array's elements.
- A field name can be repeated in an array
- The array elements can be in any order
I want to get back a table that looks like this:
id | field | value
------------------------
1 | pet | dog
1 | country | uk
2 | pet | cat
4 | country | germany
4 | education | masters
4 | country | belgium
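To pin down the exact semantics of the target table (array order kept, repeated field names kept, an empty array producing no rows), here is a plain Python reference sketch of the same transformation on the sample rows. It runs outside Redshift and is not a Redshift solution; the `explode` helper and `ROWS` list are hypothetical, and it assumes every element is an object with exactly one key, as stated above.

```python
import json

# Sample rows from the question: (id, metadata as JSON text).
ROWS = [
    (1, '[{"pet":"dog"},{"country":"uk"}]'),
    (2, '[{"pet":"cat"}]'),
    (3, '[]'),
    (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]'),
]


def explode(rows):
    """Return one (id, field, value) tuple per array element.

    Array order and repeated field names are kept; an empty array yields no rows.
    Assumes every element is an object with exactly one key.
    """
    result = []
    for row_id, metadata in rows:
        for element in json.loads(metadata):
            # Unpacking raises ValueError if an element has zero or several keys.
            [(field, value)] = element.items()
            result.append((row_id, field, value))
    return result


if __name__ == "__main__":
    for row in explode(ROWS):
        print(" | ".join(str(col) for col in row))
```

Row 3 contributes nothing because its array is empty, which matches the desired output.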
I can then combine this with my queries on the rest of the input table.
I have tried playing around with the Redshift JSON functions, but without being able to write functions/use loops/have variables in Redshift, I really can't see a way to do this!
Please let me know if I can clarify anything else.
Answer
Thanks to this inspired blog post, I've been able to craft a solution. It works as follows:
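- Create a lookup table to effectively "iterate" over the elements of each array. The number of rows in this table must be equal to or greater than the maximum number of elements in any array. That maximum can be calculated with:
SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table
Assuming it is 4, the lookup table holds the indexes 0 to 3: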
-- One row per array index; add rows if an array can have more than 4 elements.
CREATE VIEW seq_0_to_3 AS
SELECT 0 AS i UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3;
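Typing the UNION ALL list by hand gets tedious when the longest array is large. A small helper can generate the view text for any size n, for example the result of the MAX(JSON_ARRAY_LENGTH(metadata)) query. This is a sketch under that assumption; `seq_view_sql` is a hypothetical name, and the generated SQL is plain text you would still run against Redshift yourself.

```python
def seq_view_sql(n: int) -> str:
    """Build a CREATE VIEW statement whose column i holds the integers 0 .. n-1.

    n should be at least the result of
    SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # The first SELECT names the column; the others inherit it through UNION ALL.
    selects = ["SELECT 0 AS i"] + [f"SELECT {k}" for k in range(1, n)]
    body = " UNION ALL\n".join(selects)
    return f"CREATE VIEW seq_0_to_{n - 1} AS\n{body};"


if __name__ == "__main__":
    print(seq_view_sql(4))
```

With n = 4 it prints a four-row seq_0_to_3 view equivalent to the one above; with a larger n the view name changes accordingly, so the queries that reference it need the matching name.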
From this lookup table, we can create one row per JSON element:
WITH exploded_array AS (
SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
FROM input_table, seq_0_to_3 AS seq
WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
SELECT *
FROM exploded_array;
This produces:
id | json
------------------------------
1 | {"pet":"dog"}
1 | {"country":"uk"}
2 | {"pet":"cat"}
4 | {"country":"germany"}
4 | {"education":"masters"}
4 | {"country":"belgium"}
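The cross join plus WHERE filter is the whole trick: every row is paired with every index, and indexes past the end of that row's array are thrown away. The Python sketch below mimics the exploded_array CTE on the sample data to make that visible. It is only a model of the SQL; `exploded_array` and `ROWS` are hypothetical names, and re-serializing with `json.dumps` is an approximation of what JSON_EXTRACT_ARRAY_ELEMENT_TEXT returns.

```python
import json
from itertools import product

# Sample input_table rows from the question: (id, metadata as JSON text).
ROWS = [
    (1, '[{"pet":"dog"},{"country":"uk"}]'),
    (2, '[{"pet":"cat"}]'),
    (3, '[]'),
    (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]'),
]


def exploded_array(rows, seq):
    """Mimic the exploded_array CTE: cross join rows with indexes, then filter."""
    result = []
    # FROM input_table, seq_0_to_3: pair every row with every index i.
    for (row_id, metadata), i in product(rows, seq):
        elements = json.loads(metadata)
        # WHERE seq.i < JSON_ARRAY_LENGTH(metadata): drop indexes past the array's end.
        if i < len(elements):
            # Rough stand-in for JSON_EXTRACT_ARRAY_ELEMENT_TEXT: element i as compact JSON text.
            result.append((row_id, json.dumps(elements[i], separators=(",", ":"))))
    return result


if __name__ == "__main__":
    for row_id, element in exploded_array(ROWS, range(4)):
        print(row_id, "|", element)
```

Passing range(2) instead of range(4) returns only five rows: the third element of row 4 is silently lost, which is why the lookup table must be at least as long as the longest array. The Python version also returns rows in a fixed order, whereas the SQL query needs an ORDER BY to guarantee one.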
However, I needed to extract the field names and values. As I can't see any way to extract JSON field names using Redshift's limited JSON functions, I'll do this using a regular expression:
WITH exploded_array AS (
SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
FROM input_table, seq_0_to_3 AS seq
WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
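SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field) AS value
FROM (
-- Pull the key out of '{"key":"value"}'; only reliable for keys of 3+ word characters.
SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\\w+[^"]') AS field
FROM exploded_array
) AS elements;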
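The regular expression needs a character that is not { or ", then one or more word characters, then a character that is not ". That works for keys like pet or country, but a key shorter than three characters cannot match, so the search moves on and may match the value instead. The Python sketch below checks this on a few elements. It assumes Python's re engine behaves like Redshift's POSIX regular expressions for these short inputs, which should hold here but is not guaranteed in general. `KEY_PATTERN` and the helper functions are hypothetical; `value_of` only stands in for JSON_EXTRACT_PATH_TEXT.

```python
import json
import re

# The answer's pattern. In the Redshift string literal the backslash is doubled ('\\w');
# a Python raw string needs only one.
ANSWER_PATTERN = re.compile(r'[^{"]\w+[^"]')

# Hypothetical alternative: capture everything between the opening {" and the next quote.
KEY_PATTERN = re.compile(r'\{"([^"]+)"')


def field_by_answer_regex(element_json):
    """First match of the answer's pattern, or None when nothing matches."""
    match = ANSWER_PATTERN.search(element_json)
    return match.group(0) if match else None


def field_by_capture(element_json):
    """Key of a one-key object such as {"pet":"dog"}, read from a capture group."""
    match = KEY_PATTERN.match(element_json)
    return match.group(1) if match else None


def value_of(element_json, field):
    """Python stand-in for JSON_EXTRACT_PATH_TEXT(json, field); None if the key is absent."""
    return json.loads(element_json).get(field)


if __name__ == "__main__":
    for element in ['{"pet":"dog"}', '{"country":"uk"}', '{"id":"xyz"}']:
        field = field_by_answer_regex(element)
        print(element, field, field_by_capture(element), value_of(element, field))
```

For {"id":"xyz"} the answer's pattern returns xyz (the value), so the value lookup comes back empty, while the capture-group pattern returns id. If short keys can occur in your data, a capture-group pattern is the safer choice; Redshift's REGEXP_SUBSTR accepts an 'e' parameter for returning a subexpression, but check the documentation for your cluster before relying on it.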