Redshift UPDATE prohibitively slow


Question

I have a table in a Redshift cluster with ~1 billion rows. I have a job that tries to update some column values based on a filter. Updating anything at all in this table is incredibly slow. Here's an example:

SELECT col1, col2, col3
FROM SOMETABLE
WHERE col1 = 'a value of col1'
  AND col2 = 12;

The above query returns in less than a second, because I have sort keys on col1 and col2. Only one row meets these criteria, so the result set is a single row. However, if I run:

UPDATE SOMETABLE
SET col3 = 20
WHERE col1 = 'a value of col1'
  AND col2 = 12;

This query takes an unknown amount of time (I stopped it after 20 minutes). Again, it should be updating one column value of one row.

I have also tried to follow the documentation here: http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html , which talks about creating a temporary staging table to update the main table, but got the same results.
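For reference, the staging-table approach mentioned above looks roughly like the following. This is only a sketch: the staging table name `stage` is made up for illustration, and it assumes that col1, col2 and col3 are the only columns that need values and that any other columns in SOMETABLE are nullable.

```sql
-- Sketch of a staging-table update; "stage" is a hypothetical name.
BEGIN;

-- Temporary table with the same column definitions as the target.
CREATE TEMP TABLE stage (LIKE sometable);

-- Load the new values (assumes the remaining columns may be NULL).
INSERT INTO stage (col1, col2, col3)
VALUES ('a value of col1', 12, 20);

-- Update the target by joining it to the staging table.
UPDATE sometable
SET col3 = stage.col3
FROM stage
WHERE sometable.col1 = stage.col1
  AND sometable.col2 = stage.col2;

COMMIT;

DROP TABLE stage;
```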

Any ideas?

Recommended answer

You didn't mention what percentage of the table you're updating, but it's important to note that an UPDATE in Redshift is a two-step process:

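  1. Each row that will be changed must first be marked for deletion.
  2. A new version of the data must then be written for each column in the table.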
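One way to picture these two steps is to spell them out by hand as a DELETE followed by an INSERT of the whole row. This is a conceptual illustration only, not what Redshift literally executes; the temporary table name `changed_rows` is made up, and the sketch assumes the table has just the three columns from the question.

```sql
-- Conceptual equivalent of the single-row UPDATE; "changed_rows" is hypothetical.
BEGIN;

-- Capture the new version of every affected row (all columns, not just col3).
CREATE TEMP TABLE changed_rows AS
SELECT col1, col2, 20 AS col3
FROM sometable
WHERE col1 = 'a value of col1'
  AND col2 = 12;

-- Step 1: the old row versions are marked for deletion.
DELETE FROM sometable
WHERE col1 = 'a value of col1'
  AND col2 = 12;

-- Step 2: the new row versions are written, one value per column.
INSERT INTO sometable (col1, col2, col3)
SELECT col1, col2, col3
FROM changed_rows;

COMMIT;
```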

If the table has a large number of columns and/or you are updating a large number of rows, this process can be very expensive for the database.

You could experiment with using a CREATE TABLE AS statement to create a new "updated" version of the table, then dropping the existing table and renaming the new one. This has the added benefit of leaving you with a fully sorted table.
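A minimal sketch of that approach, assuming the table has only the three columns from the question and a sort key on (col1, col2). The names `sometable_new` and `sometable_old` are made up. As a small variation on the description above, the old table is renamed rather than dropped right away, so it can be checked before it is removed.

```sql
-- Build an updated copy; the CASE expression applies the change while copying.
CREATE TABLE sometable_new
SORTKEY (col1, col2)
AS
SELECT col1,
       col2,
       CASE
           WHEN col1 = 'a value of col1' AND col2 = 12 THEN 20
           ELSE col3
       END AS col3
FROM sometable;

-- Swap the tables.
BEGIN;
ALTER TABLE sometable RENAME TO sometable_old;
ALTER TABLE sometable_new RENAME TO sometable;
COMMIT;

-- Drop the old copy once the new table has been verified.
DROP TABLE sometable_old;
```

Note that a table created with CREATE TABLE AS does not inherit constraints or column defaults from the source table, so anything like that would need to be recreated separately.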
