Incorporate File Name in Redshift COPY

Question

I need to load ~2 million CSV files from an S3 bucket to a Redshift table. This would be easy (just use a COPY with a wildcard or a manifest file), except that I need to incorporate the name of each file into the resulting table. Suppose file1.csv and file2.csv both contain:

a,b,c
d,e,f

I would like my table to contain:

file1 a b c
file1 d e f
file2 a b c
file2 d e f

Is there a way this can be accomplished with a single COPY statement? Or will I need to iterate through the list of files and load/insert them one at a time?

I suspect the latter option would be a massive performance hit...
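To make the one-file-at-a-time option concrete, here is a minimal sketch of what it might look like. It only builds the SQL text for each file: load the file into a staging table, then copy the rows into the target table with the file name prepended. The table names (`staging_table`, `target_table`), the bucket path, and the IAM role ARN are all hypothetical placeholders, and the statements would still need to be run through whatever SQL client is in use.

```python
import posixpath


def statements_for_file(s3_uri, iam_role,
                        target="target_table", staging="staging_table"):
    """Build the SQL needed to load one CSV file and tag its rows.

    All table names and the IAM role are assumptions for illustration.
    """
    # "s3://bucket/path/file1.csv" -> "file1"
    file_name = posixpath.splitext(posixpath.basename(s3_uri))[0]
    # Double any single quotes so the name is a valid SQL string literal.
    literal = file_name.replace("'", "''")
    return [
        # Empty the staging table left over from the previous file.
        f"TRUNCATE {staging};",
        # Load exactly one file into the staging table.
        f"COPY {staging} FROM '{s3_uri}' IAM_ROLE '{iam_role}' CSV;",
        # Move the rows across, adding the file name as the first column.
        f"INSERT INTO {target} SELECT '{literal}', * FROM {staging};",
    ]


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/MyRedshiftRole"  # hypothetical
    for uri in ["s3://my-bucket/data/file1.csv",
                "s3://my-bucket/data/file2.csv"]:
        for stmt in statements_for_file(uri, role):
            print(stmt)
```

With ~2 million files this means roughly three statements per file, which is exactly the overhead I am worried about.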

Recommended answer

This is not currently possible.

Here is a brief thread on the AWS forum, with a response from AWS saying that they have "created a feature request" but "cannot provide an ETA on this": https://forums.aws.amazon.com/thread.jspa?messageID=590722&#590722

A similar question already exists on Stack Exchange: "Redshift add column when importing with COPY"
