Copy data from Amazon S3 to Redshift and avoid duplicate rows

Question
I am copying data from Amazon S3 to Redshift. During this process, I need to avoid loading the same files again. I don't have any unique constraints on my Redshift table. Is there a way to implement this using the COPY command?
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html
I tried adding a unique constraint and setting a column as the primary key, with no luck. Redshift accepts unique and primary key constraint definitions, but it does not enforce them.
Recommended answer
My solution is to run a DELETE command on the table before the COPY. In my use case, I need to load the records of a daily snapshot into the Redshift table each day, so I can use the following DELETE command to remove any previously loaded records for that snapshot, then run the COPY command:
DELETE FROM t_data WHERE snapshot_day = 'xxxx-xx-xx';
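To keep a re-run from leaving the table in a half-loaded state, the delete-then-load can be wrapped in a single transaction, so either both statements take effect or neither does. A minimal sketch follows; the table name t_data and column snapshot_day come from the answer above, while the S3 path, IAM role ARN, and file format are placeholder assumptions you would replace with your own:

```sql
BEGIN;

-- Remove any rows already loaded for this snapshot day,
-- so re-running the job does not produce duplicates.
DELETE FROM t_data WHERE snapshot_day = 'xxxx-xx-xx';

-- Reload the snapshot from S3 (path, role, and format are placeholders).
COPY t_data
FROM 's3://my-bucket/snapshots/xxxx-xx-xx/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV;

END;
```

Because Redshift runs each transaction atomically, a COPY failure rolls back the DELETE as well, so the previous day's data is never lost to a partial run.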