Return elements of Redshift JSON array on separate rows


Problem description

I have a Redshift table that looks like this:

 id | metadata
---------------------------------------------------------------------------
 1  | [{"pet":"dog"},{"country":"uk"}]
 2  | [{"pet":"cat"}]
 3  | []
 4  | [{"country":"germany"},{"education":"masters"},{"country":"belgium"}]

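  • All array elements have only one field.
  • There is no guarantee that a particular field will appear in any of the array elements.
  • A field name can be repeated within an array.
  • The array elements can be in any order.

    I want to get back a table that looks like this: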

     id |   field   |  value
    ------------------------
     1  | pet       | dog
     1  | country   | uk
     2  | pet       | cat
     4  | country   | germany
     4  | education | masters
     4  | country   | belgium
    

    I can then combine this with my queries on the rest of the input table.

    I have tried playing around with the Redshift JSON functions, but without being able to write functions, use loops, or have variables in Redshift, I really can't see a way to do this!

    Please let me know if I can clarify anything else.

    Recommended answer

    Thanks to this inspired blog post, I've been able to craft a solution:

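    1. Create a look-up table to effectively "iterate" over the elements of each array. The number of rows in this table must be equal to or greater than the maximum number of elements in any array. Let's say this is 4 (it can be calculated using SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table):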

    -- One row per possible array index (0 to 3).
    CREATE VIEW seq_0_to_3 AS
        SELECT 0 AS i UNION ALL
        SELECT 1 UNION ALL
        SELECT 2 UNION ALL
        SELECT 3;
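    If the maximum array length changes, the view above has to be rewritten by hand. As a minimal sketch, the statement could be generated for any size; the helper name `build_seq_view_sql` is hypothetical and not part of the original answer, and the generated text still has to be run against Redshift separately.

```python
def build_seq_view_sql(n: int) -> str:
    """Return a CREATE VIEW statement yielding the integers 0..n-1 in column i."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The first SELECT names the column; the rest inherit that name.
    selects = ["SELECT 0 AS i"] + [f"SELECT {k}" for k in range(1, n)]
    body = " UNION ALL\n    ".join(selects)
    return f"CREATE VIEW seq_0_to_{n - 1} AS\n    {body};"


# n would come from SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table.
print(build_seq_view_sql(4))
```

    For n = 4 this prints the same four-row view as above.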
    

    2. From this, we can create one row per JSON element:

    WITH exploded_array AS (
        SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
        FROM input_table, seq_0_to_3 AS seq
        WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
    )
    SELECT *
    FROM exploded_array;
    

    Which produces:

     id | json
    ------------------------------
     1  | {"pet":"dog"}
     1  | {"country":"uk"}
     2  | {"pet":"cat"}
     4  | {"country":"germany"}
     4  | {"education":"masters"}
     4  | {"country":"belgium"}
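    To check the result offline, the cross join and its WHERE filter can be mimicked in a few lines of Python. This is only a reference sketch over the sample rows from the question, not Redshift code; the function name `explode` is an assumption made for illustration.

```python
import json

# Sample rows copied from the question: (id, metadata as JSON text).
input_table = [
    (1, '[{"pet":"dog"},{"country":"uk"}]'),
    (2, '[{"pet":"cat"}]'),
    (3, '[]'),
    (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]'),
]


def explode(rows, seq):
    """Pair every row with every index in seq, keeping only indexes inside the array."""
    out = []
    for row_id, metadata in rows:
        elements = json.loads(metadata)
        for i in seq:
            if i < len(elements):  # WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
                # Compact separators reproduce the element text as stored.
                out.append((row_id, json.dumps(elements[i], separators=(",", ":"))))
    return out


for row in explode(input_table, range(4)):  # range(4) plays the role of seq_0_to_3
    print(row)
```

    Passing a shorter sequence, such as range(2), silently drops the third element of row 4, which is why the look-up table must have at least as many rows as the longest array.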
    

    3. However, I needed to extract the field names/values. As I can't see any way to extract JSON field names using Redshift's limited functions, I'll do this using a regular expression:

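    WITH exploded_array AS (
        SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
        FROM input_table, seq_0_to_3 AS seq
        WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
    )
    -- Look up each element's value by the field name found via the regex.
    SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field) AS value
    FROM (
        -- Depending on how string literals are escaped, \w may need to be written \\w.
        SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\w+[^"]') AS field
        FROM exploded_array
    ) AS fields;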
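    The regular expression is worth testing on sample elements before relying on it. The sketch below applies the same pattern with Python's `re` module, on the assumption that it behaves like Redshift's REGEXP_SUBSTR for this simple pattern; `extract_field` is a hypothetical helper name.

```python
import re

# Same pattern as in the query: one character that is neither { nor ",
# then one or more word characters, then one character that is not ".
FIELD_PATTERN = re.compile(r'[^{"]\w+[^"]')


def extract_field(element):
    """Return the first match of the pattern in a JSON element, or None."""
    match = FIELD_PATTERN.search(element)
    return match.group(0) if match else None


for element in ['{"pet":"dog"}', '{"country":"uk"}', '{"id":"7"}']:
    print(element, "->", extract_field(element))
```

    The pattern needs at least three characters to match, so the last element, whose field name `id` has only two, yields no field name here. Field names of one or two characters would need a different pattern.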
    
