Read from database, write to another database, simultaneously


Problem description

I am working on a simple script to read from one database (oracle) and
write to another (postgresql). I retrieve the data from oracle in
chunks and drop the data to postgresql continuously. The author of one
of the python database clients mentioned that using one thread to
retrieve the data from the oracle database and another to insert the
data into postgresql with something like a pipe between the two threads
might make sense, keeping both IO streams busy. Any hints on how to
get started?

Thanks,
Sean

Solution

Sean Davis wrote:

> The author of one of the python database clients mentioned that
> using one thread to retrieve the data from the oracle database and
> another to insert the data into postgresql with something like a
> pipe between the two threads might make sense, keeping both IO
> streams busy.

IMHO he's wrong. Network interaction is quite slow compared with CPU
performance, so there's no gain (maybe even overhead due to thread
management and locking stuff). That's true even on multiprocessor
machines, because there's almost nothing to compute, only IO
traffic. CMIIW.

Using multiplexing, you'll get good results with simple code without
the danger of deadlocks. Have a look at asyncore (standard library)
or the Twisted framework -- personally, I prefer the latter.

Regards,
Björn

--
BOFH excuse #194:

We only support a 1200 bps connection.


Bjoern Schliessmann wrote:

> Sean Davis wrote:
>
>> The author of one of the python database clients mentioned that
>> using one thread to retrieve the data from the oracle database and
>> another to insert the data into postgresql with something like a
>> pipe between the two threads might make sense, keeping both IO
>> streams busy.
>
> IMHO he's wrong. Network interaction is quite slow compared with CPU
> performance, so there's no gain (maybe even overhead due to thread
> management and locking stuff). That's true even on multiprocessor
> machines, not only because there's almost nothing to compute but
> only IO traffic. CMIIW.
>
> Using multiplexing, you'll get good results with simple code without
> the danger of deadlocks. Have a look at asyncore (standard library)
> or the Twisted framework -- personally, I prefer the latter.
>
> Regards,
> Björn

Sean, you can't win - everyone has a different idea! You need to explain
that oracle has millions of records and that it's possible to keep a pipe
open to feed the Postgres side.

One thing I didn't get - is this a one-time transfer or something that is
going to happen often?

If it's a one-time transfer, just live with the time issue.

Johnf


Bjoern Schliessmann wrote:

> Sean Davis wrote:
>
>> The author of one of the python database clients mentioned that
>> using one thread to retrieve the data from the oracle database and
>> another to insert the data into postgresql with something like a
>> pipe between the two threads might make sense, keeping both IO
>> streams busy.
>
> IMHO he's wrong. Network interaction is quite slow compared with CPU
> performance, so there's no gain (maybe even overhead due to thread
> management and locking stuff). That's true even on multiprocessor
> machines, not only because there's almost nothing to compute but
> only IO traffic. CMIIW.


Not so sure. CPU use in the Python script is low, but there may be
CPU+disk activity on the database sides [with cache management and other
optimizations on disk access].
So, with a reader thread and a writer thread, he can have a select on one
database performed in parallel with an insert on the other database.
After that, he needs to know whether the two databases use the same disks,
same controller, same host... or not.

But, if it's only a do-once job, maybe the optimization is not really
necessary.
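
For anyone who wants to try the reader-thread/writer-thread idea from this thread, here is a minimal sketch, not a tested Oracle-to-PostgreSQL migration. It assumes both drivers follow the Python DB-API (fetchmany, executemany, commit) and uses a bounded queue.Queue as the "pipe". The names reader, writer and copy_table, and the chunk_size and max_chunks parameters, are made up for illustration. The insert statement is passed in by the caller because the placeholder style differs between drivers.

```python
import queue
import threading

# Sentinel object the reader puts on the queue to say "no more chunks".
_DONE = object()


def reader(src_cursor, out_queue, chunk_size):
    """Fetch rows in chunks from an already-executed cursor and queue them."""
    try:
        while True:
            rows = src_cursor.fetchmany(chunk_size)
            if not rows:
                break
            # Blocks when the queue is full, so the reader cannot run
            # arbitrarily far ahead of the writer.
            out_queue.put(rows)
    finally:
        # Always signal the end, even if fetching failed part-way.
        out_queue.put(_DONE)


def writer(dst_conn, insert_sql, in_queue):
    """Take chunks off the queue and insert them; return the row count."""
    cursor = dst_conn.cursor()
    total = 0
    while True:
        rows = in_queue.get()
        if rows is _DONE:
            break
        cursor.executemany(insert_sql, rows)
        total += len(rows)
    dst_conn.commit()
    return total


def copy_table(src_cursor, dst_conn, insert_sql, chunk_size=1000, max_chunks=4):
    """Run the reader in a background thread and the writer in this thread."""
    pipe = queue.Queue(maxsize=max_chunks)
    # daemon=True so a failing writer does not leave the program hanging
    # on a reader that is blocked on a full queue.
    thread = threading.Thread(
        target=reader, args=(src_cursor, pipe, chunk_size), daemon=True
    )
    thread.start()
    total = writer(dst_conn, insert_sql, pipe)
    thread.join()
    return total
```

To use it, execute the SELECT on the source cursor first, then call copy_table(src_cursor, dst_conn, insert_sql). Whether this beats a plain single-threaded fetchmany/executemany loop depends on the points raised above (where the time actually goes, whether the two databases share disks or a host), so it is worth timing both before committing to the threaded version.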

