PostgreSQL gapless sequences

Question

I'm moving from MySql to Postgres, and I noticed that when you delete rows from MySql, the unique ids for those rows are re-used when you make new ones. With Postgres, if you create rows, and delete them, the unique ids are not used again.

Is there a reason for this behaviour in Postgres? Can I make it act more like MySql in this case?

Solution

Sequences have gaps to permit concurrent inserts. Attempting to avoid gaps or to re-use deleted IDs creates horrible performance problems. See the PostgreSQL wiki FAQ.

PostgreSQL SEQUENCEs are used to allocate IDs. These only ever increase, and they're exempt from the usual transaction rollback rules to permit multiple transactions to grab new IDs at the same time. This means that if a transaction rolls back, those IDs are "thrown away"; there's no list of "free" IDs kept, just the current ID counter. Sequences are also usually incremented if the database shuts down uncleanly.
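
A minimal sketch of how a rollback leaves a gap, assuming a fresh session with autocommit on (as in psql). The sequence name demo_seq is made up for illustration:

```sql
-- Hypothetical sequence, used only for this demonstration.
CREATE SEQUENCE demo_seq;

BEGIN;
SELECT nextval('demo_seq');   -- returns 1
ROLLBACK;                     -- the transaction is undone, the sequence advance is not

SELECT nextval('demo_seq');   -- returns 2; the value 1 is never handed out again
```
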

Synthetic keys (IDs) are meaningless anyway. Their order is not significant; their only significant property is uniqueness. You can't meaningfully measure how "far apart" two IDs are, nor can you meaningfully say whether one is greater or less than another. All you can do is say "equal" or "not equal". Anything else is unsafe. You shouldn't care about gaps.

If you need a gapless sequence that re-uses deleted IDs, you can have one, but you have to give up a huge amount of performance for it. In particular, you cannot have any concurrency on INSERTs at all, because you have to scan the table for the lowest free ID while holding a write lock on the table, so that no other transaction can claim the same ID. Try searching for "postgresql gapless sequence".
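
To make that cost concrete, here is a rough sketch of the lock-and-scan approach. It is not a recommendation. The table definition for dummy and the function name insert_with_lowest_free_id are assumptions made up for this example:

```sql
-- Assumed table shape, for illustration only.
CREATE TABLE IF NOT EXISTS dummy ( id integer PRIMARY KEY, blah integer );

-- Hypothetical helper: inserts a row using the lowest unused id.
CREATE OR REPLACE FUNCTION insert_with_lowest_free_id(p_blah integer) RETURNS integer AS $$
DECLARE
    new_id integer;
BEGIN
    -- Blocks every other writer on the table until this transaction ends.
    LOCK TABLE dummy IN EXCLUSIVE MODE;

    -- Scan 1 .. max(id)+1 for the first value not present in the table.
    SELECT s.id INTO new_id
    FROM generate_series(1, (SELECT COALESCE(max(id), 0) + 1 FROM dummy)) AS s(id)
    WHERE NOT EXISTS (SELECT 1 FROM dummy d WHERE d.id = s.id)
    ORDER BY s.id
    LIMIT 1;

    INSERT INTO dummy (id, blah) VALUES (new_id, p_blah);
    RETURN new_id;
END;
$$ LANGUAGE plpgsql;
```

Every call serializes behind the table lock and scans for a hole, which is exactly the performance problem described above.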

The simplest approach is to use a counter table and a function that gets the next ID. Here's a generalized version that uses a counter table to generate consecutive gapless IDs; it doesn't re-use IDs, though.

CREATE TABLE thetable_id_counter ( last_id integer not null );
INSERT INTO thetable_id_counter VALUES (0);

CREATE OR REPLACE FUNCTION get_next_id(countertable regclass, countercolumn text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;

COMMENT ON FUNCTION get_next_id(countertable regclass, countercolumn text) IS 'Increment and return value from integer column $2 in table $1';

Usage:

INSERT INTO dummy(id, blah) 
VALUES ( get_next_id('thetable_id_counter','last_id'), 42 );

Note that when one open transaction has obtained an ID, all other transactions that try to call get_next_id will block until the first transaction commits or rolls back. This is unavoidable for gapless IDs and is by design.

If you want to store multiple counters for different purposes in a table, just add a parameter to the above function, add a column to the counter table, and add a WHERE clause to the UPDATE that matches the parameter to the added column. That way you can have multiple independently-locked counter rows. Do not just add extra columns for new counters.
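
A sketch of that multi-counter variant, assuming a table named id_counters keyed by counter_name and a function named get_next_named_id. All three names are invented for this example, as are the 'invoices' and 'orders' counters:

```sql
-- One row per counter, so each counter is locked independently.
CREATE TABLE id_counters (
    counter_name text PRIMARY KEY,
    last_id      integer NOT NULL DEFAULT 0
);
INSERT INTO id_counters (counter_name) VALUES ('invoices'), ('orders');

CREATE OR REPLACE FUNCTION get_next_named_id(p_counter_name text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    -- The row lock taken by UPDATE is held until the calling transaction ends.
    UPDATE id_counters
       SET last_id = last_id + 1
     WHERE counter_name = p_counter_name
    RETURNING last_id INTO next_value;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'unknown counter: %', p_counter_name;
    END IF;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;
```

A transaction that calls get_next_named_id('invoices') blocks only other callers asking for 'invoices'; callers asking for 'orders' proceed independently.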

This function does not re-use deleted IDs; it just avoids introducing gaps.

To re-use IDs I advise ... not re-using IDs.

If you really must, you can do so by adding an ON INSERT OR UPDATE OR DELETE trigger on the table of interest that adds deleted IDs to a free-list side table, and removes them from the free-list table when they're INSERTed. Treat an UPDATE as a DELETE followed by an INSERT. Now modify the ID generation function above so that it does a SELECT free_id INTO next_value FROM free_ids FOR UPDATE LIMIT 1 and, if a row is found, DELETEs that row. If no free ID is found, it gets a new ID from the counter table as normal. Here's an untested extension of the prior function to support re-use:

CREATE OR REPLACE FUNCTION get_next_id_reuse(countertable regclass, countercolumn text, freelisttable regclass, freelistcolumn text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    EXECUTE format('SELECT %I FROM %s FOR UPDATE LIMIT 1', freelistcolumn, freelisttable) INTO next_value;
    IF next_value IS NOT NULL THEN
        EXECUTE format('DELETE FROM %s WHERE %I = %L', freelisttable, freelistcolumn, next_value);
    ELSE
        EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
    END IF;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;
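
The free-list trigger described above is not shown in the answer. The following is one possible sketch, equally untested under concurrency. It assumes the dummy table has an integer id column and that the free list is a table free_ids with a single free_id column. The function and trigger names are made up, and ON CONFLICT requires PostgreSQL 9.5 or later:

```sql
-- Assumed table shapes, for illustration only.
CREATE TABLE IF NOT EXISTS dummy ( id integer PRIMARY KEY, blah integer );
CREATE TABLE IF NOT EXISTS free_ids ( free_id integer PRIMARY KEY );

CREATE OR REPLACE FUNCTION dummy_track_free_ids() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        -- The deleted row's id becomes available for re-use.
        INSERT INTO free_ids (free_id) VALUES (OLD.id) ON CONFLICT DO NOTHING;
    ELSIF TG_OP = 'INSERT' THEN
        -- The id is in use again, so take it off the free list.
        DELETE FROM free_ids WHERE free_id = NEW.id;
    ELSIF OLD.id IS DISTINCT FROM NEW.id THEN
        -- UPDATE that changes the id: treat as DELETE followed by INSERT.
        INSERT INTO free_ids (free_id) VALUES (OLD.id) ON CONFLICT DO NOTHING;
        DELETE FROM free_ids WHERE free_id = NEW.id;
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER dummy_free_ids
AFTER INSERT OR UPDATE OR DELETE ON dummy
FOR EACH ROW EXECUTE PROCEDURE dummy_track_free_ids();
```

With this in place, a call such as get_next_id_reuse('thetable_id_counter', 'last_id', 'free_ids', 'free_id') would draw from the free list before touching the counter.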
