PostgreSQL 如何在多个 CPU 之间拆分查询 [英] PostgreSQL how to split a query between multiple CPU

查看:95
本文介绍了PostgreSQL 如何在多个 CPU 之间拆分查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个存储过程

I have a store procedure

DO_STUFF(obj rowFromMyTable) 

这需要 obj 并处理一些数据并将结果保存在一个独立的表中.所以我处理对象的顺序并不重要.

This take obj and process some data and save the result in an independent table. So the order i process the objects isn't important.

DO_STUFF(objA); DO_STUFF(objB); < == >  DO_STUFF(objB); DO_STUFF(objA);

事情是要创建一个存储过程来处理所有对象,但这仅使用单个 CPU.

The thing is want create a store procedure to process all object, but this use only a single CPU.

for each obj in (SELECT obj from tblSOURCE)
loop
    DO_STUFF(obj);
end loop;

我想将进程拆分到多个 CPU 中,以便更快地完成.
我唯一想到的是使用 2 个 pgAdmin 窗口并在每个窗口中运行两个不同的存储过程.

I want to split the process in multiple CPU so things finish faster.
The only thing i think of was using 2 pgAdmin window and run two different store procedure in each one.

--one window run using the filter
(SELECT obj from tblSOURCE where id between 1 and 100000)

--and the other use
(SELECT obj from tblSOURCE where id between 100001 and 200000)

关于如何在单个存储过程中执行此操作的任何想法?

Any ideas of how should i do this in a single store procedure?

推荐答案

有两种方法可以做到这一点(适用于任何 Windows/Linux/Mac):

Two ways to do this (works in any of Windows / Linux / Mac):

  • PostgreSQL 9.6+ 现在应该能够(自动)在一定程度上并行化您的查询,然后您可能想看看是否需要亲自拆分查询.

  • PostgreSQL 9.6+ should now be able to (automatically) parallelize your queries to some extent and then you may want to see whether you need to take the pain to split the queries yourself at all.

使用 dblink 并通过多个回调连接到数据库.DBLink 最好的部分是这些可以是 fire-n-forget(即异步)调用,因此可以快速连续调用,然后最终等待它们全部完成(尽管您需要编织等待结果自己逻辑).但是,缺点(与同步调用一样)是,除非您跟踪进程失败/超时等情况.您可能会错误地认为,由于调用(成功)处理了所有数据,实际上可能有一些调用失败(异步).

Use dblink and connect to the database via multiple callbacks. The best part about DBLink is that these can be fire-n-forget (i.e. asynchronous) calls and so can be called in quick succession and then eventually wait till they all complete (although you'd need to weave the wait-for-result logic yourself). However, the drawback (as is with synchronous calls) is that unless you keep track of things like process failures / timeouts etc. you may wrongly assume that since the calls went through (successfully) all data was processed, where actually its possible that some calls failed (asynchronously).

SELECT * FROM dblink_send_query('testconn', 'SELECT do_stuff_wrapper(0, 5000)') AS t1;
SELECT dblink_is_busy('testconn');
SELECT * FROM dblink_get_result('testconn') AS t1(c1 TEXT, c2 TEXT, ....);

更新:举例说明使用 dblink 的异步函数.

Update: Exemplify using dblink's asynchronous functions.

这篇关于PostgreSQL 如何在多个 CPU 之间拆分查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆