Redshift UPDATE 非常慢 [英] Redshift UPDATE prohibitively slow

查看:80
本文介绍了Redshift UPDATE 非常慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 Redshift 集群中有一个大约 10 亿行的表.我有一份工作试图根据一些过滤器更新一些列值.更新此表中的任何内容都非常缓慢.这是一个例子:

I have a table in a Redshift cluster with ~1 billion rows. I have a job that tries to update some column values based on some filter. Updating anything at all in this table is incredibly slow. Here's an example:

SELECT col1, col2, col3
FROM SOMETABLE
WHERE col1 = 'a value of col1'
  AND col2 = 12;

上面的查询在不到一秒的时间内返回,因为我在 col1col2 上有排序键.只有一行满足此条件,因此结果集只有一行.但是,如果我运行:

The above query returns in less than a second, because I have sortkeys on col1 and col2. There is only one row that meets this criteria, so the result set is just one row. However, if I run:

UPDATE SOMETABLE
SET col3 = 20
WHERE col1 = 'a value of col1'
  AND col2 = 12;

这个查询花费了未知的时间(我在 20 分钟后停止了它).同样,它应该更新一行的一列值.

This query takes an unknown amount of time (I stopped it after 20 minutes). Again, it should be updating one column value of one row.

我也尝试遵循此处的文档:http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html,它谈到了创建一个临时登台表来更新主表,但得到了相同的结果.

I have also tried to follow the documentation here: http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html, which talks about creating a temporary staging table to update the main table, but got the same results.

知道这里发生了什么吗?

Any idea what is going on here?

推荐答案

你没有提到你正在更新的表的百分比,但重要的是要注意 Redshift 中的 UPDATE 是两步流程:

You didn't mention what percentage of the table you're updating but it's important to note that an UPDATE in Redshift is a 2 step process:

  1. 必须首先将要更改的每一行标记为删除
  2. 然后必须为表中的每一列写入新版本的数据

如果您有大量列和/或正在更新大量行,那么这个过程对于数据库来说可能是非常耗费人力的.

If you have a large number of columns and/or are updating a large number of rows then this process can be very labor intensive for the database.

您可以尝试使用 CREATE TABLE AS 语句来创建表的新更新"版本,然后删除现有表并重命名新表.这还有一个额外的好处,那就是为您留下一个完全排序的表格.

You could experiment with using a CREATE TABLE AS statement to create a new "updated" version of the table and then dropping the existing table and renaming the new table. This has the added benefit of leaving you with a fully sorted table.

这篇关于Redshift UPDATE 非常慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆