Copy data from Amazon S3 to Redshift and avoid duplicate rows
Question
I am copying data from Amazon S3 to Redshift. During this process, I need to avoid loading the same files again. I don't have any unique constraints on my Redshift table. Is there a way to implement this using the COPY command?
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html
I tried adding a unique constraint and setting a column as the primary key, with no luck. Redshift does not seem to enforce unique/primary key constraints.
Accepted answer
My solution is to run a 'delete' command on the table before the 'copy'. In my use case, each run copies the records of a daily snapshot into the Redshift table, so I can use the following 'delete' command to ensure that day's records are removed first, then run the 'copy' command.
DELETE from t_data where snapshot_day = 'xxxx-xx-xx';
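To make the reload idempotent even if the job dies between the two statements, the delete and the copy can be wrapped in a single transaction. A minimal sketch of that pattern follows; the S3 path, IAM role ARN, and file format are placeholders, not values from the original question:

```sql
BEGIN;

-- Remove any previously loaded rows for the snapshot day being reloaded
DELETE FROM t_data WHERE snapshot_day = 'xxxx-xx-xx';

-- Reload that day's files from S3 (path, role, and format are placeholders)
COPY t_data
FROM 's3://my-bucket/snapshots/xxxx-xx-xx/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV;

COMMIT;
```

If the COPY fails, the transaction rolls back and the previously loaded rows for that day remain untouched, so the job can simply be rerun.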