Data loading is slow while using the "Insert/Update" step in Pentaho

Problem description

Data loading is slow while using the "Insert/Update" step in Pentaho 4.4.0.

I am using Pentaho 4.4.0. When I use the "Insert/Update" step in Kettle, the data load is too slow compared to MySQL. This step scans through all the records in the table before inserting; if the record exists, it does an update. What can be done to optimize performance when using "Insert/Update"? The processing speed is 4 r/s, and I have more than 1 lakh (100,000) records in total. The whole process takes two and a half hours to complete.

Recommended answer

Based on your comments, it sounds like you want the Merge rows (diff) step followed by a Synchronize after merge step. Check the Pentaho wiki to see how these steps work.
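To make the idea concrete, here is a minimal Python sketch of what the two steps do conceptually. It is not Pentaho code: the function names, the sample rows, and the in-memory dictionary standing in for the target table are all hypothetical. The real Merge rows (diff) step works on two sorted row streams, while this sketch simplifies that to dictionary lookups. The flag values (identical, changed, new, deleted) are the ones the step emits.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def merge_rows_diff(reference: List[Row], compare: List[Row],
                    key_fields: List[str],
                    value_fields: List[str]) -> List[Tuple[str, Row]]:
    """Flag every key as identical, changed, new, or deleted.

    'reference' plays the role of the rows already in the target table,
    'compare' the rows coming from the source.
    """
    ref_by_key = {tuple(r[k] for k in key_fields): r for r in reference}
    cmp_by_key = {tuple(r[k] for k in key_fields): r for r in compare}
    flagged = []
    for key in sorted(ref_by_key.keys() | cmp_by_key.keys()):
        old, new = ref_by_key.get(key), cmp_by_key.get(key)
        if old is None:
            flagged.append(("new", new))
        elif new is None:
            flagged.append(("deleted", old))
        elif any(old[f] != new[f] for f in value_fields):
            flagged.append(("changed", new))
        else:
            flagged.append(("identical", old))
    return flagged


def synchronize_after_merge(table: Dict[tuple, Row],
                            flagged: List[Tuple[str, Row]],
                            key_fields: List[str]) -> int:
    """Apply only the rows that need a write; return the write count."""
    writes = 0
    for flag, row in flagged:
        key = tuple(row[k] for k in key_fields)
        if flag in ("new", "changed"):
            table[key] = row          # insert or update
            writes += 1
        elif flag == "deleted":
            table.pop(key, None)      # delete
            writes += 1
        # "identical" rows are skipped: no database round trip at all
    return writes


# Hypothetical sample data
target_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"},
               {"id": 3, "name": "Cy"}]
source_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bobby"},
               {"id": 4, "name": "Dee"}]

flagged = merge_rows_diff(target_rows, source_rows, ["id"], ["name"])
table = {(r["id"],): r for r in target_rows}
writes = synchronize_after_merge(table, flagged, ["id"])
```

The point of the split is that unchanged rows never reach the database, so the fewer rows that actually change, the more this saves over a per-row lookup.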

Another thing that makes a big difference is how many of the rows result in an upsert versus how many rows there are in total. If the number of rows resulting in writes is more than roughly 40%, @carexcer's last comment may be a better approach. If it's less, definitely try the Merge rows (diff) step.

4 to 25 rows per second sounds far too slow. Be sure the fields you marked as keys are indexed, whichever step you choose.
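As a rough illustration of why the index matters, the sketch below runs the kind of key lookup that precedes each upsert, first without and then with an index on the key column. It uses Python's built-in sqlite3 module as a stand-in, so the query plan text is SQLite's, not MySQL's; the table name, column names, and index name are hypothetical. A `CREATE INDEX ... ON table (column)` statement of this form is also accepted by MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table; customer_id is the field marked as the key.
conn.execute("CREATE TABLE customers (customer_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(f"C{i:05d}", f"name-{i}") for i in range(1000)],
)


def lookup_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's plan for a single lookup by key."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT name FROM customers WHERE customer_id = ?",
        ("C00042",),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_without_index = lookup_plan(conn)   # full table scan per lookup

conn.execute(
    "CREATE INDEX idx_customers_customer_id ON customers (customer_id)"
)
plan_with_index = lookup_plan(conn)      # index search per lookup

print(plan_without_index)
print(plan_with_index)
```

Without the index, every incoming row triggers a scan of the whole table, which is consistent with the behaviour described in the question; with it, each lookup becomes an index search.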

If most of the rows result in an upsert, you may be better off with a full refresh. If that's the case, check out the MySQL bulk loaders. Pentaho has both a batch and a streaming bulk loader, though I don't know how good they are.
