将 JSON 数据加载到 AWS Redshift 导致 NULL 值 [英] Loading JSON data to AWS Redshift results in NULL values

查看:27
本文介绍了将 JSON 数据加载到 AWS Redshift 导致 NULL 值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试执行加载/复制操作以将数据从 S3 存储桶中的 JSON 文件直接导入到 Redshift.COPY操作成功,COPY后,表的行数/记录数正确,但每条记录都是NULL!

I am trying to perform a load/copy operation to import data from JSON files in an S3 bucket directly to Redshift. The COPY operation succeeds, and after the COPY, the table has the correct number of rows/records, but every record is NULL !

加载需要预期的时间,COPY 命令返回 OK,Redshift 控制台报告成功且没有错误...但如果我从表中执行简单查询,它只返回 NULL 值.

It takes the expected amount of time for the load, the COPY command returns OK, the Redshift console reports successful and no errors... but if I perform a simple query from the table, it returns only NULL values.

JSON 非常简单 + 扁平,并且格式正确(根据我在此处找到的示例:http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html)

The JSON is very simple + flat, and formatted correctly (according to examples I found here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html)

基本上,每行一行,格式如下:

Basically, it is one row per line, formatted like:

{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }

我尝试过根据在 JSON 对象中找到的值和数据类型重写架构以及从未压缩的文件中复制等方法.我认为 JSON 可能在加载时没有被正确解析,但如果无法解析对象,它可能会引发错误.

I have tried things like rewriting the schema based on values and data types found in the JSON objects and also copying from uncompressed files. I thought perhaps the JSON was not being parsed correctly upon load, but it should presumably raise an error if the objects cannot be parsed.

我的 COPY 命令如下所示:

My COPY command looks like this:

copy events from 's3://mybucket/json/prefix' 
with credentials 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
json 'auto' gzip;

任何指导将不胜感激!谢谢.

Any guidance would be appreciated! Thanks.

推荐答案

所以我已经找到了原因 - 这在我在原始帖子中提供的描述中是不明显的.

So I have discovered the cause - This would not have been evident from the description I provided in my original post.

在 Redshift 中创建表时,列名将转换为小写.执行 COPY 操作时,列名区分大小写.

When you create a table in Redshift, the column names are converted to lowercase. When you perform a COPY operation, the column names are case sensitive.

我一直在尝试加载的输入数据使用驼峰命名法作为列名,因此当我执行 COPY 时,列与定义的模式不匹配(现在使用所有小写的列名)

The input data that I have been trying to load is using camelCase for column names, and so when I perform the COPY, the columns do not match up with the defined schema (which now uses all lowercase column names)

不过,该操作不会引发错误.它只是在所有不匹配的列中留下 NULL(在这种情况下,所有列)

The operation does not raise an error, though. It just leaves NULLs in all the columns that did not match (in this case, all of them)

希望这有助于有人避免同样的困惑!

Hope this helps somebody to avoid the same confusion!

这篇关于将 JSON 数据加载到 AWS Redshift 导致 NULL 值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆