Copying JSON objects with multiple layouts from S3 into Redshift
Problem description
I have an S3 bucket with many files containing newline ("\n") delimited JSON objects. These JSON objects can have a few different layouts. There is a standard set of keys that is common to all the layouts. Most layouts differ only by a few extra keys, but some have nested JSON objects. A single file can contain any or all of these layouts.
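As a first step, it can help to find out which layouts actually occur in the files. The sketch below assumes each non-blank line is one JSON object, treats each distinct set of top-level keys as one layout, and reports which keys each layout adds on top of the keys shared by all of them. The sample records and key names (`id`, `ts`, `event`, `x`, `y`, `cart`) are made up for illustration.

```python
import json
from collections import Counter


def layout_inventory(lines):
    """Count how many objects share each distinct set of top-level keys.

    `lines` is any iterable of strings with one JSON object per line
    (newline-delimited JSON). Blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        counts[frozenset(obj)] += 1  # a dict iterates over its keys
    return counts


def common_keys(inventory):
    """Return the keys present in every layout seen (the 'standard set')."""
    layouts = list(inventory)
    if not layouts:
        return frozenset()
    return frozenset.intersection(*layouts)


if __name__ == "__main__":
    # Hypothetical sample records, one JSON object per line.
    sample = [
        '{"id": 1, "ts": "2015-01-01", "event": "view"}',
        '{"id": 2, "ts": "2015-01-01", "event": "click", "x": 10, "y": 20}',
        '{"id": 3, "ts": "2015-01-02", "event": "buy", "cart": {"items": [1, 2]}}',
        '{"id": 4, "ts": "2015-01-02", "event": "view"}',
    ]
    inv = layout_inventory(sample)
    base = common_keys(inv)
    for keys, n in inv.items():
        # Object count per layout, plus the keys it adds to the common set.
        print(n, sorted(keys - base))
```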
I have managed to define a single, basic table in Redshift and copy the data into it, but any keys that don't have a matching column in my table are lost.
I would like to create one table for each layout and have each JSON object copied into the appropriate table. The layouts with nested JSON objects could probably keep the nested part in a single string column as JSON, since Redshift is able to parse JSON in a query.
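One way to get there is to split the files before loading. A minimal sketch, assuming a layout can be recognised by the presence of a few distinguishing keys: each object is routed to a per-table group by the hypothetical `RULES` list, and any nested object or array is re-serialised as a JSON string so it fits in one VARCHAR column and can be parsed later in SQL (for example with Redshift's `JSON_EXTRACT_PATH_TEXT`). The table names and keys are placeholders.

```python
import json
from collections import defaultdict

# Hypothetical routing rules: the first rule whose keys are all present
# in an object decides its table. Table names and keys are placeholders.
RULES = [
    ("events_purchase", {"cart"}),
    ("events_click", {"x", "y"}),
]
DEFAULT_TABLE = "events_base"  # objects with only the common keys


def pick_table(obj, rules=RULES, default=DEFAULT_TABLE):
    """Return the table name for one decoded JSON object."""
    for table, keys in rules:
        if keys.issubset(obj):  # membership test against the dict's keys
            return table
    return default


def flatten_nested(obj):
    """Replace nested objects/arrays with their compact JSON text, so each
    one lands in a single string column instead of being dropped."""
    return {
        k: json.dumps(v, separators=(",", ":")) if isinstance(v, (dict, list)) else v
        for k, v in obj.items()
    }


def split_by_layout(lines):
    """Group newline-delimited JSON objects into per-table NDJSON lines."""
    out = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        out[pick_table(obj)].append(json.dumps(flatten_nested(obj)))
    return dict(out)


if __name__ == "__main__":
    sample = [
        '{"id": 1, "event": "view"}',
        '{"id": 2, "event": "click", "x": 10, "y": 20}',
        '{"id": 3, "event": "buy", "cart": {"items": [1, 2]}}',
    ]
    for table, rows in split_by_layout(sample).items():
        print(table, rows)
```

In practice each group would be written to its own S3 prefix so that every table can be loaded from its own location. The order of `RULES` matters: a more specific layout should come before any layout whose keys it also contains.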
I am new to AWS, so any help would be appreciated. Also, feel free to suggest non-Redshift services that might work as well.
Thanks!
Recommended answer
You'll need to run a separate COPY for each table that you want to load. However, you may have trouble with nested objects (as of right now).
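If the objects have already been split into one S3 prefix per table, the per-table COPY statements can be generated mechanically. The sketch below only builds the SQL text; the bucket, prefix, table names and IAM role ARN are hypothetical. With `FORMAT AS JSON 'auto'`, COPY matches top-level JSON keys to column names and ignores keys that have no matching column, which is why extra keys disappear when everything is loaded into one basic table.

```python
def copy_statements(tables, bucket, prefix, iam_role):
    """Build one Redshift COPY statement per table.

    Assumes (hypothetically) that the rows for each table live under
    s3://<bucket>/<prefix>/<table>/ as newline-delimited JSON.
    """
    stmts = []
    for table in tables:
        stmts.append(
            f"COPY {table}\n"
            f"FROM 's3://{bucket}/{prefix}/{table}/'\n"
            f"IAM_ROLE '{iam_role}'\n"
            "FORMAT AS JSON 'auto';"
        )
    return stmts


if __name__ == "__main__":
    # All names below are placeholders.
    for sql in copy_statements(
        ["events_base", "events_click", "events_purchase"],
        bucket="my-bucket",
        prefix="split",
        iam_role="arn:aws:iam::123456789012:role/MyRedshiftRole",
    ):
        print(sql)
        print()
```

Each statement can then be run through any SQL client connected to the cluster.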
We gave up on direct JSON loads because COPY cannot load an arbitrary number of nested objects. Each nested object has to be referred to by its index (e.g. 'nest[0]') in order to load it, which is not practical when there could be many thousands of objects.
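The limitation comes from JSONPaths files: each expression maps one JSON element to one target column, in order, so array elements have to be listed one index at a time. The sketch below builds such a document for a hypothetical array key `nest`; anything past the last listed index is never referenced and therefore not loaded, and the target table needs one column per path. The file would then be uploaded to S3 and referenced from COPY with something like `FORMAT AS JSON 's3://my-bucket/jsonpaths.json'` (location hypothetical).

```python
import json


def jsonpaths_document(scalar_keys, array_key, max_items):
    """Build a Redshift JSONPaths document mapping each scalar key, then the
    first `max_items` elements of `array_key`, to columns in that order."""
    paths = [f"$['{k}']" for k in scalar_keys]
    # Every array element needs its own explicit index; there is no
    # wildcard that expands to "all elements" here.
    paths += [f"$['{array_key}'][{i}]" for i in range(max_items)]
    return json.dumps({"jsonpaths": paths}, indent=2)


if __name__ == "__main__":
    # Hypothetical keys; a table loaded with this file needs 2 + 3 columns.
    print(jsonpaths_document(["id", "event"], "nest", 3))
```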