How To Minimize Likelihood of Database Contention During Update


Problem Description


I have written some PostgreSQL database client code to update a central database with a table of IP addresses and host names from multiple clients. There are two tables: one to hold mappings between IP addresses and host names, and one to hold a queue of IP addresses that have not yet been resolved to host names.

Here is the IP-address-to-host-name mapping table:

CREATE TABLE g_hostmap(
    appliance_id     INTEGER,
    ip               INET,
    fqdn             TEXT,
    resolve_time     TIMESTAMP, 
    expire_time      TIMESTAMP,
    UNIQUE(appliance_id, ip))

Here is the work queue table:

CREATE TABLE g_hostmap_work(
    ip               INET,
    input_table      TEXT)
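
For illustration, a work item would be queued with a row like this (the address and table name are placeholder values taken from the examples further down; they are not part of the original schema description):

INSERT INTO g_hostmap_work (ip, input_table)
VALUES ('192.168.54.133', 'g_hostmap_20131230');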

The database clients each pull requests from a single work queue table. Each request contains a private IPv4 address for which a host name is requested.

The work flow is as follows: each client periodically queries the central database work queue for a list of IP addresses for which host names are needed, performs a reverse DNS look-up on the addresses, and then updates the host name table with the (IP address, host name) pairs, one at a time. I wish to minimize the likelihood that multiple clients will duplicate effort by attempting to resolve the same IP addresses simultaneously.

I limit each batch of updates to the larger of 10 rows or 10% of the size of the work queue in rows. The timing of the clients is somewhat independent. How can I further minimize contention for the DNS name server and host name table during the update process? My customer is concerned that there will be much duplication of effort.
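
As a sketch, that batch-size rule could be computed directly in SQL against the work queue (whether the 10% should round up or down is a detail not specified above):

-- batch size = the larger of 10 rows or 10% of the queued rows
SELECT GREATEST(10, CEIL(COUNT(*) * 0.10))::int AS batch_size
FROM g_hostmap_work;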

Here is the initial query for a count of items in the work queue:

SELECT COUNT(*)
       FROM g_hostmap_work queued
       LEFT JOIN g_hostmap cached
            ON queued.ip = cached.ip
            AND now() < cached.expire_time

Here is the query to return a subset of items in the work queue:

SELECT queued.ip, queued.input_table, cached.expire_time
       FROM g_hostmap_work queued
       LEFT JOIN g_hostmap cached
            ON queued.ip = cached.ip
            AND now() < cached.expire_time
       LIMIT 10

Here is an example of a single INSERT statement to update the database with a new IP address/host name mapping:

INSERT INTO g_hostmap_20131230 VALUES
(NULL, '192.168.54.133', 'powwow.site', now(), now() + 900 * INTERVAL '1 SECOND')
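
For clarity, here is the same statement with an explicit column list, matching the g_hostmap column order shown above (treating g_hostmap_20131230 as a table with that same layout is an assumption):

INSERT INTO g_hostmap_20131230 (appliance_id, ip, fqdn, resolve_time, expire_time)
VALUES (NULL, '192.168.54.133', 'powwow.site', now(), now() + INTERVAL '900 seconds');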

Solution

I'm going to make a kind of odd-sounding suggestion. Add an auto-incrementing bigint to the source table, and create a set of 10 partial indexes using modulo division. Here's a simple test case example:

create table queue (id bigserial, input text);
create index q0 on queue (id) where id%10=0;
create index q1 on queue (id) where id%10=1;
create index q2 on queue (id) where id%10=2;
create index q3 on queue (id) where id%10=3;
create index q4 on queue (id) where id%10=4;
create index q5 on queue (id) where id%10=5;
create index q6 on queue (id) where id%10=6;
create index q7 on queue (id) where id%10=7;
create index q8 on queue (id) where id%10=8;
create index q9 on queue (id) where id%10=9;
insert into queue select generate_series(1,50000),'this';

What we've done here is create a set of partial indexes, each covering 1/10th of the table. Next, we'll select a chunk of one of those ranges to work on:

begin;
select * from queue where id%10=0 limit 100 for update;
  id  | input 
------+-------
   10 | this
   20 | this
   30 | this
-- do work here --
commit;
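
To confirm that a bucket query like the one above is able to use its partial index, a quick check might look like this (the plan actually chosen will depend on table size and statistics):

EXPLAIN
SELECT * FROM queue WHERE id % 10 = 0 LIMIT 100 FOR UPDATE;
-- look for a scan on the partial index q0; on a small table the planner may still prefer a seq scan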

Now the interesting part. If you have more than 10 workers with this setup, you just cycle them through the bucket numbers; any extra worker that hits a bucket already locked by the SELECT ... FOR UPDATE above will simply wait, but workers on any of the other buckets (1 through 9) can still proceed.

begin;
select * from queue where id%10=1 limit 100 for update;
 id  | input 
-----+-------
   1 | this
  11 | this
  21 | this
  31 | this
-- do work here
commit;

This way all the work is divided into 10 buckets. Want more buckets? Change the number after the % and increase the number of indexes to match.
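
If you do raise the bucket count, writing each partial index by hand gets tedious; a small DO block can generate them instead (20 buckets here is just an arbitrary example, and the q0 through q9 indexes above would need to be dropped or renamed first to avoid name collisions):

DO $$
DECLARE
    buckets int := 20;   -- desired bucket count (arbitrary example)
BEGIN
    FOR i IN 0 .. buckets - 1 LOOP
        -- %% is a literal % inside format(), i.e. the modulo operator
        EXECUTE format('CREATE INDEX q%s ON queue (id) WHERE id %% %s = %s',
                       i, buckets, i);
    END LOOP;
END $$;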
