Redshift UPDATE 非常慢 [英] Redshift UPDATE prohibitively slow
问题描述
我在 Redshift 集群中有一个大约 10 亿行的表.我有一份工作试图根据一些过滤器更新一些列值.更新此表中的任何内容都非常缓慢.这是一个例子:
I have a table in a Redshift cluster with ~1 billion rows. I have a job that tries to update some column values based on some filter. Updating anything at all in this table is incredibly slow. Here's an example:
SELECT col1, col2, col3
FROM SOMETABLE
WHERE col1 = 'a value of col1'
AND col2 = 12;
上面的查询在不到一秒的时间内返回,因为我在 col1
和 col2
上有排序键.只有一行满足此条件,因此结果集只有一行.但是,如果我运行:
The above query returns in less than a second, because I have sortkeys on col1
and col2
. There is only one row that meets this criteria, so the result set is just one row. However, if I run:
UPDATE SOMETABLE
SET col3 = 20
WHERE col1 = 'a value of col1'
AND col2 = 12;
这个查询花费了未知的时间(我在 20 分钟后停止了它).同样,它应该更新一行的一列值.
This query takes an unknown amount of time (I stopped it after 20 minutes). Again, it should be updating one column value of one row.
我也尝试遵循此处的文档:http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html,它谈到了创建一个临时登台表来更新主表,但得到了相同的结果.
I have also tried to follow the documentation here: http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html, which talks about creating a temporary staging table to update the main table, but got the same results.
知道这里发生了什么吗?
Any idea what is going on here?
推荐答案
你没有提到你正在更新的表的百分比,但重要的是要注意 Redshift 中的 UPDATE
是两步流程:
You didn't mention what percentage of the table you're updating but it's important to note that an UPDATE
in Redshift is a 2 step process:
- 必须首先将要更改的每一行标记为删除
- 然后必须为表中的每一列写入新版本的数据
如果您有大量列和/或正在更新大量行,那么这个过程对于数据库来说可能是非常耗费人力的.
If you have a large number of columns and/or are updating a large number of rows then this process can be very labor intensive for the database.
您可以尝试使用 CREATE TABLE AS
语句来创建表的新更新"版本,然后删除现有表并重命名新表.这还有一个额外的好处,那就是为您留下一个完全排序的表格.
You could experiment with using a CREATE TABLE AS
statement to create a new "updated" version of the table and then dropping the existing table and renaming the new table. This has the added benefit of leaving you with a fully sorted table.
这篇关于Redshift UPDATE 非常慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!