Error while loading a Parquet file into Amazon Redshift using the COPY command and a manifest file
Problem description
I'm trying to load a Parquet file using a manifest file, and I get the error below:
```
query: 124138 failed due to an internal error. File 'https://s3.amazonaws.com/sbredshift-east/data/000002_0 has an invalid version number: )
```
Here is my COPY command:
```sql
COPY supplier
FROM 's3://sbredshift-east/manifest/supplier.manifest'
IAM_ROLE 'arn:aws:iam::123456789:role/MyRedshiftRole123'
FORMAT AS PARQUET
MANIFEST;
```
Here is my manifest file:
```json
{
  "entries": [
    {
      "url": "s3://sbredshift-east/data/000002_0",
      "mandatory": true,
      "meta": {
        "content_length": 1000
      }
    }
  ]
}
```
I can load the same file with the COPY command when I specify the file name directly:
```sql
COPY supplier
FROM 's3://sbredshift-east/data/000002_0'
IAM_ROLE 'arn:aws:iam::123456789:role/MyRedshiftRole123'
FORMAT AS PARQUET;
```
```
INFO: Load into table 'supplier' completed, 800000 record(s) loaded successfully.
COPY
```
What could be wrong with my COPY statement?
Recommended answer
The only way I've gotten a Parquet COPY to work with a manifest file is to add the `meta` key with `content_length`.
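The fix can be sketched as a small Python script that writes the manifest with a `meta.content_length` for every entry. This is a minimal sketch, assuming you already know each object's size in bytes. Per the Redshift manifest documentation, `content_length` should be the file's actual size, so the value `1000` in the question's manifest would likely not match an 800,000-row file. `build_manifest` is a hypothetical helper, not part of any AWS SDK.

```python
import json


def build_manifest(sizes_by_url: dict[str, int], mandatory: bool = True) -> str:
    """Build a Redshift COPY manifest (as a JSON string) for Parquet files.

    sizes_by_url maps each S3 URL to the object's actual size in bytes.
    """
    entries = []
    for url, size in sizes_by_url.items():
        # A zero or negative size cannot be the real length of a Parquet file.
        if size <= 0:
            raise ValueError(f"content_length for {url} must be positive, got {size}")
        entries.append(
            {
                "url": url,
                "mandatory": mandatory,
                # Assumption: this must equal the object's real size in bytes.
                "meta": {"content_length": size},
            }
        )
    return json.dumps({"entries": entries}, indent=2)


if __name__ == "__main__":
    # Hypothetical size; replace it with the real byte count of the object.
    print(build_manifest({"s3://sbredshift-east/data/000002_0": 123456}))
```

The real size could come from `aws s3 ls` output or from boto3's `head_object(Bucket=..., Key=...)["ContentLength"]`. The resulting JSON would then be uploaded as `supplier.manifest`.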
From what I can gather from my error logs, COPY for Parquet (with a manifest) may first read the files through Redshift Spectrum as an external table. If that's the case, this hidden step does require `content_length`, which contradicts AWS's initial statement about COPY commands.
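The answer does not say why a wrong `content_length` would surface as `invalid version number`. One plausible explanation, offered as an assumption rather than something the logs confirm: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so a reader that trusts the manifest's length looks for that trailer at the wrong offset and then misparses whatever bytes it finds. The hypothetical `read_footer_length` below mimics such a reader on an in-memory byte string.

```python
import struct

MAGIC = b"PAR1"


def read_footer_length(data: bytes, content_length: int) -> int:
    """Locate the Parquet trailer using a *claimed* file length.

    Mimics a reader that trusts the manifest's content_length instead of
    the real object size. Raises ValueError if the trailer is not where
    the claimed length says it should be.
    """
    # Minimum Parquet file: leading magic + footer length + trailing magic.
    if content_length < 12 or content_length > len(data):
        raise ValueError(f"claimed length {content_length} does not fit the file")
    trailer = data[content_length - 8:content_length]
    if trailer[4:] != MAGIC:
        raise ValueError("no PAR1 magic at the claimed end of file")
    # The 4 bytes before the trailing magic hold the footer (metadata) length.
    (footer_len,) = struct.unpack("<I", trailer[:4])
    return footer_len
```

With the real file length the trailer is found; with a shorter claimed length, the check fails. That is consistent with, but does not prove, the behavior described in the answer.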
https://docs.amazonaws.cn/zh_CN/redshift/latest/dg/loading-data-files-using-manifest.html