Will UUID as primary key in PostgreSQL give bad index performance?


Question



I have created an app in Rails on Heroku using a PostgreSQL database.

It has a couple of tables designed to sync with mobile devices, where data can be created in different places. Therefore I have a uuid field, a string storing a GUID, in addition to an auto-increment primary key. The uuid is the one that is communicated between the server and the clients.

I realised after implementing the sync engine on the server side that this leads to performance issues when needing to map between uuid<->id all the time (when writing objects, I need to query for the uuid to get the id before saving and the opposite when sending back data).

I'm now thinking about switching to only using UUID as primary key making the writing and reading much simpler and faster.

I've read that UUID as primary key can sometimes give bad index performance (index fragmentation) when using clustered primary key index. Does PostgreSQL suffer from this problem or is it OK to use UUID as primary key?

I already have a UUID column today, so storage-wise it will be better because I drop the regular id column.

Answer

(I work on Heroku Postgres)

We use UUIDs as primary keys on a few systems and it works great.

I recommend you use the uuid-ossp extension, and even have postgres generate UUIDs for you:

heroku pg:psql
psql (9.1.4, server 9.1.6)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.

dcvgo3fvfmbl44=> CREATE EXTENSION "uuid-ossp"; 
CREATE EXTENSION  
dcvgo3fvfmbl44=> CREATE TABLE test (id uuid primary key default uuid_generate_v4(), name text);  
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE  
dcvgo3fvfmbl44=> \d test
         Table "public.test"
 Column | Type |              Modifiers
--------+------+-------------------------------------
 id     | uuid | not null default uuid_generate_v4()
 name   | text |
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

dcvgo3fvfmbl44=> insert into test (name) values ('hgmnz'); 
INSERT 0 1 
dcvgo3fvfmbl44=> select * from test;
                  id                  | name  
--------------------------------------+-------   
 e535d271-91be-4291-832f-f7883a2d374f | hgmnz  
(1 row)
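
On newer servers the extension may not be needed. As far as I know, PostgreSQL 13 and later ship gen_random_uuid() in core, so a roughly equivalent setup could look like the sketch below. The table name test_builtin is hypothetical and only there to avoid clashing with the test table above; the session shown earlier ran on 9.1, where this function is not available without an extension.

```sql
-- Assumes PostgreSQL 13 or later, where gen_random_uuid() is built in.
-- test_builtin is a hypothetical table name used only for illustration.
CREATE TABLE test_builtin (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name text
);

-- RETURNING hands the generated key straight back to the client,
-- so no second lookup is needed after the insert.
INSERT INTO test_builtin (name) VALUES ('hgmnz') RETURNING id;
```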

EDIT: performance implications

It will always depend on your workload.

The integer primary key has the advantage of locality, where like data sits closer together. This can help with, for example, range queries such as WHERE id between 1 and 10000, although lock contention is worse.

If your read workload is totally random in that you always make primary key lookups, there shouldn't be any measurable performance degradation: you only pay for the larger data type.
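
To put a rough number on "the larger data type": a uuid value is stored in 16 bytes, against 4 for integer and 8 for bigint. A GUID kept in a text column is larger still, since it holds 36 characters plus a length header. A quick way to check this on your own server, as a sketch using the built-in pg_column_size function:

```sql
-- Compare the storage size of a single key value in each candidate type.
SELECT pg_column_size(1::integer) AS integer_bytes,
       pg_column_size(1::bigint)  AS bigint_bytes,
       pg_column_size('e535d271-91be-4291-832f-f7883a2d374f'::uuid) AS uuid_bytes,
       pg_column_size('e535d271-91be-4291-832f-f7883a2d374f'::text) AS text_bytes;
```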

Do you write a lot to this table, and is this table very big? It's possible, although I haven't measured this, that there are implications in maintaining that index. For lots of datasets UUIDs are just fine though, and using UUIDs as identifiers has some nice properties.
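
Since I haven't measured this myself, the most reliable answer is probably to measure it against your own data. The sketch below is one way to start, reusing the test table from the session above. The row count of 100000 is an arbitrary assumption; pick something closer to your real table size.

```sql
-- Load some rows; each one gets a random UUID from the column default.
INSERT INTO test (name)
SELECT 'row ' || g::text
FROM generate_series(1, 100000) AS g;

-- Size of the primary key index on its own.
SELECT pg_size_pretty(pg_relation_size('test_pkey')) AS pkey_index_size;

-- Size of the table including its indexes.
SELECT pg_size_pretty(pg_total_relation_size('test')) AS total_size;
```

Running the same load against a copy of the table with an integer key, and comparing the index sizes and insert timings, would show whether the difference matters for your workload.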

Finally, I may not be the most qualified person to discuss or advise on this, as I have never run a table with a UUID PK large enough for it to become a problem. YMMV. (Having said that, I'd love to hear from people who have run into problems with the approach!)
