How to load data faster with Talend and SQL Server


Problem description

I use Talend to load data into a SQL Server database.

The weakest point of my job appears to be not the data processing but the actual load into the database, which never runs faster than 17 rows/sec.

The odd thing is that I can launch 5 jobs at the same time, and each of them loads at 17 rows/sec.

What could explain this slowness, and how can I improve the speed?

Thanks

New information:

The transfer speed between my desktop and the server is about 1 MByte

My job commits every 10,000 rows

I use SQL Server 2008 R2

The schema I use for my jobs looks like this:

Solution

Database INSERT OR UPDATE operations are very costly, because the database cannot batch the whole commit and execute it in one go; it has to process the records line by line (ACID transactions force this: if an attempted insert fails, all of the other records in that commit fail too).

Instead, for large bulk operations it is best to determine up front whether each record is an insert or an update, before anything is handed to the database, and then send two transactions to the database: one containing the inserts and one containing the updates.

A typical job that needs this functionality assembles the data to be inserted or updated and then queries the database table for the existing primary keys. If a record's primary key already exists, the record is sent as an UPDATE; otherwise it is an INSERT. This logic is easy to implement in a tMap component.
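Outside Talend, the same decision can be sketched in a few lines of Python. This is only an illustration of the logic, not the tMap configuration itself; the function name `split_inserts_updates`, the `id` key column, and the sample rows are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Row = Dict[str, object]


def split_inserts_updates(
    rows: Iterable[Row], existing_keys: Set[Hashable], key: str = "id"
) -> Tuple[List[Row], List[Row]]:
    """Return (inserts, updates): a row is an update if its key already exists."""
    inserts: List[Row] = []
    updates: List[Row] = []
    for row in rows:
        if row[key] in existing_keys:
            updates.append(row)
        else:
            inserts.append(row)
    return inserts, updates


# Hypothetical incoming data and keys already present in the target table.
incoming = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
    {"id": 3, "name": "gamma"},
]
existing = {2, 3}

inserts, updates = split_inserts_updates(incoming, existing)
```

Holding the existing keys in a set keeps each membership check cheap, so the split itself should not become the bottleneck even for large inputs.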

In this job we have some data that we wish to INSERT OR UPDATE into a database table that contains some pre-existing data:

And we wish to add the following data to it:

The job works by writing the new data to a tHashOutput component so that it can be used several times within the same job (the component simply holds the data in memory or, for large volumes, can cache it to disk).

Following on from this, one copy of the data is read out of a tHashInput component and passed directly into a tMap. Another tHashInput component is used to run a parameterised query against the table:

You may find this guide to Talend and parameterised queries useful. From here, the returned records (that is, only those already in the database) are used as a lookup for the tMap.

The lookup is then configured as an INNER JOIN to find the records that need to be updated; the rejects from the INNER JOIN are the records to be inserted:

These outputs then flow to separate tMySQLOutput components, one performing the UPDATE and the other the INSERT (this example uses MySQL; for SQL Server the corresponding component would be tMSSqlOutput). Finally, once the main subjob is complete, we commit the changes.
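As a rough illustration of what such a parameterised lookup does, the sketch below fetches only the keys that already exist in the table. It uses Python's built-in `sqlite3` module purely as a stand-in for SQL Server; the table name `target`, the column `id`, and the helper `fetch_existing_keys` are assumptions made for the example, not part of the original job.

```python
import sqlite3
from typing import Iterable, List, Set


def fetch_existing_keys(
    conn: sqlite3.Connection, keys: Iterable[int], chunk_size: int = 500
) -> Set[int]:
    """Return the subset of `keys` already present in the (hypothetical) target table."""
    key_list: List[int] = list(keys)
    found: Set[int] = set()
    # Query in chunks so the number of bound parameters per statement stays small.
    for start in range(0, len(key_list), chunk_size):
        chunk = key_list[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"SELECT id FROM target WHERE id IN ({placeholders})"
        found.update(row[0] for row in conn.execute(sql, chunk))
    return found


# In-memory SQLite database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO target (id, name) VALUES (?, ?)",
    [(2, "old beta"), (3, "old gamma")],
)

existing = fetch_existing_keys(conn, [1, 2, 3, 4], chunk_size=2)
```

Only the placeholders are built dynamically; the key values themselves are always passed as bound parameters, never concatenated into the SQL text.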
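The overall flow (look up existing keys, split the data, send one batch of inserts and one batch of updates, commit once at the end) might look like the following sketch. Again, `sqlite3` is only a stand-in for the real database, and the table `target` and its columns are hypothetical; with SQL Server the same pattern would go through whichever driver you use.

```python
import sqlite3

# In-memory SQLite database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO target (id, name) VALUES (?, ?)",
    [(2, "old beta"), (3, "old gamma")],
)
conn.commit()

# Hypothetical new data: id 1 is new, ids 2 and 3 already exist.
incoming = [(1, "alpha"), (2, "beta"), (3, "gamma")]

# Step 1: read the primary keys that are already in the table.
existing = {row[0] for row in conn.execute("SELECT id FROM target")}

# Step 2: decide per record, before touching the database again.
to_insert = [(i, n) for i, n in incoming if i not in existing]
to_update = [(n, i) for i, n in incoming if i in existing]  # (name, id) order for the UPDATE

# Step 3: send each kind of statement as one batch, then commit once.
conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", to_insert)
conn.executemany("UPDATE target SET name = ? WHERE id = ?", to_update)
conn.commit()

final_rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
```

The point of the design is that the database only ever sees plain INSERT and plain UPDATE statements, each of which can be batched, rather than a per-row insert-or-update decision.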
