Copy data from Amazon S3 to Redshift and avoid duplicate rows


Problem description

I am copying data from Amazon S3 to Redshift. During this process, I need to avoid loading the same files again. I don't have any unique constraints on my Redshift table. Is there a way to implement this using the COPY command?

http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html

I tried adding a unique constraint and setting a column as the primary key, with no luck. Redshift does not seem to enforce unique/primary key constraints.
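For reference, a plain COPY from S3 (in the spirit of the linked examples) looks roughly like the sketch below; it loads whatever files sit under the prefix and does not check whether those rows were already loaded. The table name, S3 prefix, and IAM role ARN are placeholders, not values from the question.

COPY t_data
FROM 's3://my-bucket/daily-snapshots/2023-01-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV
IGNOREHEADER 1;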

Recommended answer

My solution is to run a DELETE command on the table before the COPY. In my use case, each run copies the records of a daily snapshot into the Redshift table, so I can use the following DELETE command to make sure any previously loaded records for that snapshot are removed, and then run the COPY command.

DELETE from t_data where snapshot_day = 'xxxx-xx-xx';
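Putting the two steps together, one daily load could look like the sketch below, wrapped in a transaction so the table is never left half-emptied if the COPY fails. The snapshot date, S3 prefix, and IAM role ARN are again placeholders.

BEGIN;

-- Remove any rows previously loaded for this snapshot day
DELETE FROM t_data WHERE snapshot_day = '2023-01-01';

-- Re-load the day's files from S3
COPY t_data
FROM 's3://my-bucket/daily-snapshots/2023-01-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV;

END;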

