How can I import nested json data into multiple connected redshift subtables?

Problem Description

I have server log data that looks something like this:

2014-04-16 00:01:31-0400,583 {"Items": [
  {"UsageInfo"=>"P-1008366", "Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0},
  {"Role"=>"Text", "ProjectCode"=>"", "PublicationCode"=>"", "RetailPrice"=>2},  
  {"Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0, "ParentItemId"=>"396487"}
]}

What I'd like is a relational database that connects two tables - a UsageLog table and a UsageLogItems table, connected by a primary key id.

You can see that the UsageLog table would have fields like:

UsageLogId
Date
Time

and the UsageLogItems table would have fields like

UsageLogId
UsageInfo
Role
RetailPrice
...
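
For concreteness, here is a minimal sketch of what the two tables might look like in Redshift. The column names, types, and sizes are assumptions based on the fields listed above, not part of the original question; also note that Redshift accepts PRIMARY KEY / FOREIGN KEY declarations but does not enforce them, so the relationship has to be maintained by the loading process.

    CREATE TABLE usage_logs (
        usage_log_id  BIGINT NOT NULL,     -- id assigned by the loading process
        log_date      DATE,
        log_time      TIMESTAMP,
        PRIMARY KEY (usage_log_id)
    );

    CREATE TABLE usage_log_items (
        usage_log_id     BIGINT NOT NULL,  -- relates back to usage_logs.usage_log_id
        usage_info       VARCHAR(32),
        item_role        VARCHAR(32),      -- the "Role" field from the JSON
        project_code     VARCHAR(32),
        publication_code VARCHAR(32),
        retail_price     DECIMAL(10,2),
        effective_price  DECIMAL(10,2),
        parent_item_id   VARCHAR(32),
        FOREIGN KEY (usage_log_id) REFERENCES usage_logs (usage_log_id)
    );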

However, I am having trouble writing these into Redshift and being able to associate each record with unique and related ids as keys.

What I am currently doing: a Ruby script reads each line of the log file, parses out the UsageLog info (such as date and time) and writes it to the database (writing single rows to Redshift is VERY slow), then builds a CSV from the UsageLogItems information and imports that into Redshift via S3, querying the largest id in the UsageLogs table and using that number to relate the two. This is also slow, because many UsageLogs contain no items at all, so I frequently load 0 records from the CSV files.

This currently does work, but it is far too painfully slow to be effective at all. Is there a better way to handle this?

Recommended Answer

Amazon Redshift supports JSON ingestion using JSONPaths via the COPY command.

http://docs.aws.amazon.com/redshift/latest/dg/copy-usage_notes-copy-from-json.html
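
As a rough sketch of how that could apply here (the bucket names, file paths, and IAM role below are made-up placeholders, and this is not the linked documentation's exact example): the "=>" in the log sample is Ruby hash syntax, so the items would first need to be written out as valid JSON, one object per line, each carrying its parent's UsageLogId, for example

    {"UsageLogId": 1001, "UsageInfo": "P-1008366", "Role": "Abstract", "RetailPrice": 2, "EffectivePrice": 0}

A JSONPaths file then maps each JSON field to a table column by position:

    {
      "jsonpaths": [
        "$['UsageLogId']",
        "$['UsageInfo']",
        "$['Role']",
        "$['ProjectCode']",
        "$['PublicationCode']",
        "$['RetailPrice']",
        "$['EffectivePrice']",
        "$['ParentItemId']"
      ]
    }

and a single COPY bulk-loads the whole file from S3 instead of inserting rows one at a time:

    COPY usage_log_items
    FROM 's3://my-bucket/usage-logs/2014-04-16/items.json'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
    JSON 's3://my-bucket/jsonpaths/usage_log_items.jsonpaths';

Fields referenced in the JSONPaths file that are missing from a given object (ParentItemId, for instance) load as NULL. The UsageLogs rows can be loaded the same way from their own file, which can remove both the single-row INSERTs and the per-file max-id lookups, since the ids are assigned while the files are being written.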
