Why is ExecuteSQLRecord taking a long time to start outputting flow files on large tables?


Problem description

I am using the ExecuteSQLRecord processor to dump the contents of a large table (100 GB) with 100+ million records.

I have set up the properties as below. However, what I am noticing is that it takes a good 45 minutes before I see any flow files coming out of this processor.

What am I missing?

I am on NiFi 1.9.1.

Thanks.

Recommended answer

An alternative to ExecuteSQL(Record), or even GenerateTableFetch -> ExecuteSQL(Record), is to use QueryDatabaseTable without a Max-Value Column. It has a Fetch Size property, which attempts to set the number of rows returned on each pull from the database. Oracle's default fetch size is 10, for example, so with 10,000 rows per flow file, ExecuteSQL has to make 1,000 trips to the DB, fetching 10 rows at a time. As a general rule, I recommend setting Fetch Size equal to Max Rows Per Flow File, so that one fetch is made per outgoing flow file.
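To illustrate why the fetch size matters, here is a minimal plain-JDBC sketch (not NiFi's internal code; the connection URL, credentials, and table name are hypothetical) showing how the driver-side fetch size controls how many rows come back per round trip to the database:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical JDBC URL and credentials; replace with your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
             Statement stmt = conn.createStatement()) {

            // With Oracle's default fetch size of 10, reading 10,000 rows
            // takes roughly 1,000 network round trips to the database.
            // Raising the fetch size to match the batch size (10,000 here)
            // lets the driver pull the same rows in a single round trip.
            stmt.setFetchSize(10_000);

            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                long count = 0;
                while (rs.next()) {
                    count++; // process each row here
                }
                System.out.println("Rows read: " + count);
            }
        }
    }
}
```

The same idea applies to QueryDatabaseTable's Fetch Size property: it is a hint passed down to the JDBC driver, so fewer, larger fetches mean less time spent on network round trips before flow files start appearing.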

The Fetch Size property should be available to the ExecuteSQL processors as well; I wrote up Apache Jira NIFI-6865 to cover this improvement.

