How to relationalize JSON containing arrays


Problem description

I am using AWS Glue to read a data file on S3 that contains JSON. In this file the data is held in an array. I tried the relationalize() function, but it does not seem to work on the array. It does work on nested JSON objects, but that is not the format of my input.

Is there a way to relationalize JSON that has arrays in it?

Input data:

{
    "ID":"1234",
    "territory":"US",
    "imgList":[
        {
            "type":"box",
            "locale":"en-US",
            "url":"boxart/url.jpg"
        },
        {
            "type":"square",
            "locale":"en-US",
            "url":"square/url.jpg"
        }
    ]
}

Code:

dfc = Relationalize.apply(frame = datasource0, staging_path = glue_temp_storage, name = "root", transformation_ctx = "dfc")
dfc.select('root').toDF().show()

Output:

+----+----------+--------+
|ID  |territory |imgList |
+----+----------+--------+
|1234|       US |       1|
+----+----------+--------+

Desired output:

+----+----------+-------------+---------------+---------------+
|ID  |territory |imgList.type |imgList.locale |imgList.url    |
+----+----------+-------------+---------------+---------------+
|1234|       US |       box   |         en-US |boxart/url.jpg |
+----+----------+-------------+---------------+---------------+
|1234|       US |       square|         en-US |square/url.jpg |
+----+----------+-------------+---------------+---------------+

Recommended answer

Relationalize creates a separate DynamicFrame for each array in the JSON document. The imgList column in the root table holds a key that points to the rows of that array's frame, so you just need to select that frame and join it with the root table:

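from awsglue.transforms import Join, Relationalize

# Relationalize returns a DynamicFrameCollection: "root" plus one frame per array
dfc = Relationalize.apply(frame = datasource0, staging_path = glue_temp_storage, name = "root", transformation_ctx = "dfc")
root_df = dfc.select('root')
imgList_df = dfc.select('root_imgList')

# Join the root table's imgList key against the id column of the array table
df = Join.apply(root_df, imgList_df, 'imgList', 'id')
df.toDF().show()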
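
To make the join above easier to follow, here is a minimal plain-Python sketch of the same idea applied to the sample record. It does not use Glue at all: split_arrays and join_rows are hypothetical helpers written only for this illustration, and the child column names (id, index, imgList.val.type and so on) are an assumption about how Relationalize names the columns of an array table, so check the actual schema of root_imgList in your job.

```python
import json

# Sample record from the question (with valid JSON commas).
record = json.loads("""
{
    "ID": "1234",
    "territory": "US",
    "imgList": [
        {"type": "box", "locale": "en-US", "url": "boxart/url.jpg"},
        {"type": "square", "locale": "en-US", "url": "square/url.jpg"}
    ]
}
""")


def split_arrays(rec, array_key, array_id):
    """Hypothetical helper: split one record into a root row and child rows.

    The root row keeps only a reference (array_id) in place of the array,
    and every array element becomes a child row carrying that same id.
    """
    root = {k: v for k, v in rec.items() if k != array_key}
    root[array_key] = array_id
    children = []
    for index, item in enumerate(rec[array_key]):
        child = {"id": array_id, "index": index}
        # Assumed naming convention: <array>.val.<field>
        child.update({f"{array_key}.val.{k}": v for k, v in item.items()})
        children.append(child)
    return root, children


def join_rows(root_rows, child_rows, root_key, child_key):
    """Inner join: pair each root row with the child rows it references."""
    return [
        {**r, **c}
        for r in root_rows
        for c in child_rows
        if r[root_key] == c[child_key]
    ]


root, children = split_arrays(record, "imgList", 1)
joined = join_rows([root], children, "imgList", "id")

for row in joined:
    print(row["ID"], row["territory"], row["imgList.val.type"],
          row["imgList.val.locale"], row["imgList.val.url"])
```

The root row here carries imgList = 1, which mirrors the output shown in the question: that value is a join key, not the array contents. Joining on it yields one row per array element, which is the shape of the desired output.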
