Copying JSON objects with multiple layouts from S3 into Redshift


Question

I have an S3 bucket with many files containing newline-delimited ("\n") JSON objects. These JSON objects can have a few different layouts. A standard set of keys is common to all the layouts. Most layouts differ only by a few extra keys, but some contain nested JSON objects. A single file can contain any or all of these layouts.

I have managed to define a single, basic table in Redshift and copy the data into that table, but any keys that are not columns in my table are lost.

I would like to create one table for each layout and have each JSON object copied into the appropriate table. The layouts with nested JSON objects could probably keep the nested part in a single string column as JSON, since Redshift is able to parse JSON in a query.
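One way to picture the "one table per layout" idea is a small pre-processing step that groups the objects by layout before anything is loaded. The following is only a sketch under stated assumptions: the common key names (`id`, `timestamp`, `type`) and the helper names are hypothetical, and a layout is identified simply by the set of extra top-level keys. Nested objects and arrays are re-serialized to JSON strings so that each fits in a single string column.

```python
import json
from collections import defaultdict

# Hypothetical keys shared by every layout; replace with the real common set.
COMMON_KEYS = frozenset({"id", "timestamp", "type"})


def layout_of(obj, common_keys=COMMON_KEYS):
    """Identify a layout by the sorted tuple of its extra top-level keys."""
    return tuple(sorted(set(obj) - common_keys))


def split_by_layout(lines, common_keys=COMMON_KEYS):
    """Group newline-delimited JSON objects by layout.

    Nested dicts and lists are turned back into JSON strings so that
    each one can be stored in a single string column.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between objects
        obj = json.loads(line)
        flat = {
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in obj.items()
        }
        groups[layout_of(obj, common_keys)].append(flat)
    return dict(groups)
```

Each group could then be written back to its own S3 prefix and loaded into its own table.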

I am new to AWS, so any help would be appreciated. Also, feel free to suggest non-Redshift services that might work as well.

Thanks!

Recommended answer

You'll need to run a separate COPY for each table that you want to load. However, you may have trouble with nested objects (as of right now).
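As a rough illustration of "one COPY per table", the sketch below builds a jsonpaths document and a COPY statement for a given table. The table name, column names, S3 URIs and IAM role in the usage lines are all hypothetical placeholders, and the helper functions are made up for this example. It assumes the usual `COPY ... FORMAT AS JSON '<jsonpaths file>'` form, with one path per table column in column order. COPY on its own does not route objects by layout, so this also assumes the input files have already been separated per layout, or that rows with missing keys are acceptable.

```python
import json


def build_jsonpaths(columns):
    """Return the body of a jsonpaths file: one path per column, in table order."""
    return json.dumps({"jsonpaths": ["$['%s']" % col for col in columns]}, indent=2)


def build_copy_sql(table, data_uri, jsonpaths_uri, iam_role):
    """Build a COPY statement for one table.

    Plain string formatting is used for readability; only pass trusted values.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{data_uri}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS JSON '{jsonpaths_uri}';"
    )


# Hypothetical usage: one statement per layout table.
jsonpaths_body = build_jsonpaths(["id", "timestamp", "type", "payload"])
copy_sql = build_copy_sql(
    "events_with_payload",
    "s3://my-bucket/layouts/with_payload/",
    "s3://my-bucket/jsonpaths/with_payload.json",
    "arn:aws:iam::123456789012:role/MyRedshiftRole",
)
```

The jsonpaths body would be uploaded to the URI named in the COPY statement, and the statement repeated with different arguments for every layout table.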

We gave up on direct JSON loads because they cannot load an arbitrary number of nested objects. Each nested object has to be referred to by its index (e.g. 'nest[0]') in order to load it, which is not ideal when there could be many thousands of objects.
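To make the indexing problem concrete, here is a small sketch that generates the per-element paths a jsonpaths file would need for a nested array. The key name `nest` comes from the example above; the field names `name` and `value` and the helper itself are hypothetical.

```python
def indexed_paths(array_key, fields, count):
    """Generate one JSONPath per field for each of the first `count` array elements."""
    return [
        f"$['{array_key}'][{i}]['{field}']"
        for i in range(count)
        for field in fields
    ]


paths = indexed_paths("nest", ["name", "value"], 3)
# paths[0] is "$['nest'][0]['name']"; three elements with two fields give six paths.
```

The number of paths, and of matching table columns, grows with the number of array elements, and it has to be fixed in advance, which is why this approach breaks down when the arrays are large or of varying length.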
