Incorporate File Name in Redshift COPY
Question
I need to load ~2 million CSV files from an S3 bucket to a Redshift table. This would be easy (just use a COPY with a wildcard or a manifest file), except that I need to incorporate the name of each file into the resulting table. Suppose file1.csv and file2.csv both contain:
a,b,c
d,e,f
I want my table to contain:
file1 a b c
file1 d e f
file2 a b c
file2 d e f
Is there a way this can be accomplished with a single COPY statement? Or will I need to iterate through the list of files and load/insert them one at a time?
I suspect the latter option would be a massive performance hit...
Recommended answer
This is not currently possible.
Here is a brief thread on the AWS forum, with a response from AWS that they have "created a feature request" but "cannot provide an ETA on this": https://forums.aws.amazon.com/thread.jspa?messageID=590722
A similar question already on Stack Exchange: Redshift add column when importing with COPY
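The fallback raised in the question (iterating through the files and loading them one at a time) could be sketched roughly as below. This is only an illustration under stated assumptions, not a tested loader: the table names `staging` and `target`, the bucket name, and the IAM role ARN are all hypothetical, and the code only builds SQL strings rather than connecting to a cluster. It assumes `staging` has the same columns as the CSV files and `target` has one extra leading column for the file name.

```python
import posixpath


def file_label(key: str) -> str:
    """Return the file name without directory or extension, e.g. 'file1'."""
    base = posixpath.basename(key)
    return posixpath.splitext(base)[0]


def statements_for_file(bucket, key, iam_role,
                        staging="staging", target="target"):
    """Build the SQL for loading one S3 object and tagging its rows.

    All identifiers passed in here are hypothetical placeholders.
    """
    # Double any single quotes so the label is a valid SQL string literal.
    label = file_label(key).replace("'", "''")
    return [
        # Empty the staging table before each file.
        f"TRUNCATE {staging};",
        # Load exactly one file into the staging table.
        f"COPY {staging} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' CSV;",
        # Copy the rows across, prepending the file name as a constant.
        f"INSERT INTO {target} SELECT '{label}', * FROM {staging};",
    ]


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/example-role"  # hypothetical
    for key in ["data/file1.csv", "data/file2.csv"]:
        for stmt in statements_for_file("example-bucket", key, role):
            print(stmt)
```

With roughly 2 million files this issues roughly 2 million separate COPY statements, which is exactly the performance concern raised in the question.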