PostgreSQL: how to split a query between multiple CPUs


Problem description

I have a stored procedure:

DO_STUFF(obj rowFromMyTable) 

It takes obj, processes some data, and saves the result in an independent table, so the order in which I process the objects doesn't matter.

DO_STUFF(objA); DO_STUFF(objB);  <==>  DO_STUFF(objB); DO_STUFF(objA);

The thing is, I want to create a stored procedure that processes all the objects, but this uses only a single CPU.

for each obj in (SELECT obj from tblSOURCE)
loop
    DO_STUFF(obj);
end loop;

I want to split the processing across multiple CPUs so that it finishes faster. The only thing I could think of was to open two pgAdmin windows and run a different stored procedure in each one.

-- one window runs using this filter
(SELECT obj from tblSOURCE where id between 1 and 100000)

-- and the other uses
(SELECT obj from tblSOURCE where id between 100001 and 200000)

Any ideas on how I should do this in a single stored procedure?

Recommended answer

A technique I like to use to get quick parallelism for queries is to combine psql with GNU Parallel (http://www.gnu.org/software/parallel/parallel_tutorial.html), which lets multiple psql commands run at once. Each psql command is a separate process with its own database connection, so the server can work on them on different CPUs.

If you create a wrapper stored procedure that contains the loop and takes an offset and a limit as arguments, you can then write a quick bash script (or Python, Perl, etc.) to generate the series of psql commands that are needed.

The file containing the commands can be piped into parallel, which will either use all the CPUs available or a number you choose (I often like to use 4 CPUs, so as to also keep a lid on I/O on the box, but it depends on the hardware you have).

Let's say the wrapper is called do_stuff_wrapper(_offset, _limit). The offset and limit would apply to the select:

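select obj from tblSOURCE order by id offset _offset limit _limit

(The order by is needed here, assuming id is a unique column as in the question: without a fixed ordering, PostgreSQL does not guarantee that separate queries see the rows in the same order, so batches could overlap or miss rows.)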

Your generated psql command file (let's call it parallel.dat) might look something like this:


psql -X -h HOST -U user database -c "select do_stuff_wrapper(0, 5000);"
psql -X -h HOST -U user database -c "select do_stuff_wrapper(5000, 5000);"
psql -X -h HOST -U user database -c "select do_stuff_wrapper(10000, 5000);"


and so on.
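
A file like this does not have to be written by hand. The following is a minimal sketch of a generator, not a definitive implementation: the function name generate_commands and its two inputs (the total row count and the batch size) are hypothetical, and HOST, user and database are the same placeholders used in the commands above. The offsets advance by exactly the batch size (0, 5000, 10000, ...) so that no row is skipped between batches.

```shell
#!/bin/sh
# Sketch: print one psql command per batch of rows.
# Assumptions: do_stuff_wrapper(_offset, _limit) exists as described above;
# total_rows and batch_size are values you supply yourself
# (for example, total_rows from "select count(*) from tblSOURCE").
generate_commands() {
    total_rows=$1
    batch_size=$2
    offset=0
    while [ "$offset" -lt "$total_rows" ]; do
        printf 'psql -X -h HOST -U user database -c "select do_stuff_wrapper(%d, %d);"\n' \
            "$offset" "$batch_size"
        # Step by the batch size so consecutive batches neither overlap nor leave gaps.
        offset=$((offset + batch_size))
    done
}

# Example usage (writes the command file for 200000 rows in batches of 5000):
#   generate_commands 200000 5000 > parallel.dat
```

The last batch may ask for more rows than remain; that is harmless, because limit simply returns whatever rows are left.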

Then you can run the commands like this:


cat parallel.dat | parallel -j 4 {}


This gets multiple psql commands running concurrently. Parallel also groups each command's output (if any, such as NOTICE messages) so that output from different commands is not interleaved. If you also need the output to appear in the same order as the commands in the file, add the -k (--keep-order) option.

Edit: If you're running on Windows, you could perhaps install Cygwin and use parallel from there. Another, pure-Windows option would be to look into PowerShell to accomplish something akin to parallel (see "Can Powershell Run Commands in Parallel?").
