Identity Column not respected on Insert into() (Amazon Redshift)

Problem description

I initially ran into this when selecting from one table with an identity column, primary key and sort key into another table with its own identity column, primary key and sort key. Instead of respecting the (1,1) identity as defined, it was doing (1,8) (sometimes (3,8)). I think it might be because the original table was sorted? In trying to figure out what was going on, I built a much simpler query and data set and found an example that reproduces across multiple Redshift clusters. Take this test example:

drop table if exists test;
create temp table test (id int identity(1,1) not null
                    , value varchar(16)
                    , primary key (id))
                    diststyle all
                    sortkey (id);
insert into test (value) select 'a';
insert into test (value) select 'b';
insert into test (value) select 'c' union select 'd';
insert into test (value) values ('e'), ('f'), ('g');

select * from test;

The output I get is:

id  value
1   a
2   b
9   c
10  d
3   e
4   f
5   g

You'll notice the identity column is not incrementing correctly. I had friends on other clusters try this; they got 20, 27 and 65, 60 for the c and d rows, while the other rows were in order. Please note that the output is still "sorted" correctly by the sortkey/order of input, even though the id column isn't physically in order.

The only similarity I can think of between the weird original results I got when first finding this and the test query is that unions are sorted and my table had a sortkey on it.

Other thoughts as to why this is happening and how to fix it are welcome.

Recommended answer

Redshift identity columns are not guaranteed to increment by the step given in the identity definition. It is guaranteed, however, that the values will never collide (i.e. they will always be unique).

The skips come from Redshift's distributed architecture. Each node reserves some values on the number line (n mod x, where x is the number of nodes in the cluster). So if the nodes do not all receive an equal number of rows, you will see skips in the identity values.
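
To make the "n mod x" idea concrete, here is a small Python sketch of a toy allocator. It is only an illustration of the reservation scheme described above, not Redshift's actual implementation: the function name `assign_identities` and the way rows are split across nodes are assumptions made for this example, and it will not reproduce the exact ids from the question.

```python
def assign_identities(rows_per_node, seed=1, step=1):
    """Toy model (an assumption, not Redshift internals).

    With x nodes, node k only hands out the values
    seed + (k + j * x) * step for j = 0, 1, 2, ...
    so no two nodes can ever produce the same id.
    """
    node_count = len(rows_per_node)
    assigned = []
    for node, row_count in enumerate(rows_per_node):
        for j in range(row_count):
            assigned.append(seed + (node + j * node_count) * step)
    return sorted(assigned)


# Rows spread evenly over 4 nodes: the ids come out contiguous.
print(assign_identities([2, 2, 2, 2]))  # [1, 2, 3, 4, 5, 6, 7, 8]

# All 4 rows land on one node: the ids are unique but skip by 4.
print(assign_identities([4, 0, 0, 0]))  # [1, 5, 9, 13]
```

In this model the ids are always unique, and gaps appear exactly when the nodes receive unequal numbers of rows, which matches the behaviour described in the answer.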
