Loading JSON data to AWS Redshift results in NULL values


Problem description

I am trying to perform a load/copy operation to import data from JSON files in an S3 bucket directly into Redshift. The COPY operation succeeds, and afterwards the table has the correct number of rows, but every column of every row is NULL!

The load takes the expected amount of time, the COPY command returns OK, and the Redshift console reports success with no errors. But if I run a simple query against the table, it returns only NULL values.

The JSON is very simple and flat, and is formatted correctly (according to the examples I found here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html)

Basically, it is one JSON object per line, formatted like:

{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }

I have tried things like rewriting the schema based on the values and data types found in the JSON objects, and also copying from uncompressed files. I thought perhaps the JSON was not being parsed correctly on load, but I would expect an error to be raised if the objects could not be parsed.

My COPY command looks like this:

copy events from 's3://mybucket/json/prefix' 
with credentials 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
json 'auto' gzip;

Any guidance would be appreciated! Thanks.

Recommended answer

So I have discovered the cause. It would not have been evident from the description I provided in my original post.

When you create a table in Redshift, the column names are converted to lowercase. When you perform a COPY with json 'auto', however, the keys in the JSON are matched against those column names case-sensitively.

The input data I have been trying to load uses camelCase for its key names, so when I perform the COPY, the keys do not match the columns of the defined schema (which now has all-lowercase column names).
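To make the mismatch concrete, here is a minimal Python sketch that reports which keys in one line of input would fail to match a lowercased schema. The function name `unmatched_keys` and the sample keys (`userId`, `eventType`, `ts`) are hypothetical and not taken from my actual data; the check simply mirrors the case-sensitive comparison described above.

```python
import json


def unmatched_keys(json_line, table_columns):
    """Return the JSON keys that have no exact, case-sensitive match
    among the table's column names as Redshift stores them."""
    # Redshift converts column names to lowercase at table creation.
    stored_columns = {name.lower() for name in table_columns}
    record = json.loads(json_line)
    return sorted(key for key in record if key not in stored_columns)


# Hypothetical sample row with camelCase keys.
line = '{"userId": "u1", "eventType": "click", "ts": "2015-01-01"}'
print(unmatched_keys(line, ["userId", "eventType", "ts"]))
```

Any key this reports would end up as NULL in its column after the COPY; a key that is already lowercase, like `ts` here, matches and loads normally.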

The operation does not raise an error, though. It just leaves NULLs in every column that did not match (in this case, all of them).
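One possible workaround is to rewrite the input so that its keys are lowercase before uploading it to S3. The sketch below assumes flat, one-object-per-line JSON as in the question; `lowercase_keys` is a hypothetical helper name, and this is only one option. The Redshift documentation also describes mapping keys to columns explicitly with a JSONPaths file, and newer releases document an 'auto ignorecase' setting, either of which may be a better fit.

```python
import json


def lowercase_keys(json_line):
    """Re-serialize one flat JSON object with all of its keys lowercased.

    Assumes no two keys differ only by case; if they did, the later
    key would silently overwrite the earlier one.
    """
    record = json.loads(json_line)
    return json.dumps({key.lower(): value for key, value in record.items()})


# Hypothetical sample row with camelCase keys.
sample = '{"userId": "u1", "eventType": "click"}'
print(lowercase_keys(sample))
```

Applied to each line of a file before upload, this produces keys that match the lowercase column names, so the original COPY command can stay unchanged.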

Hope this helps somebody avoid the same confusion!
