Mule batch commit and records failures


Problem Description

My current situation:

I have 10000 records as input to the batch. As per my understanding, batch is only for record-by-record processing. Hence, I am transforming each record using a DataWeave component inside a batch step (note: I have not used any batch commit) and writing each record to a file. The reason for doing record-by-record processing is that if any particular record contains invalid data, only that record fails, and the rest are processed fine.

But in many of the blogs I see, they use a batch commit (with streaming) with a DataWeave component. As per my understanding, all the records will then be given to DataWeave in one shot, and if one record has invalid data, all 10000 records will fail (at DataWeave). The point of record-by-record processing is then lost. Is this assumption correct, or am I thinking the wrong way?

That is the reason I am not using batch commit.

Now, as I said, I am sending each record to a file. Actually, I have the requirement of sending each record to 5 different CSV files. So currently I am using a Scatter-Gather component inside my batch step to send it to five different routes.

As you can see in the image, the input phase gives a collection of 10000 records. Each record is sent to 5 routes using Scatter-Gather.
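For reference, the arrangement described above might look roughly like this in Mule 3 XML. This is only an illustrative sketch: the file paths and output patterns are assumptions, not the asker's actual configuration.

```xml
<batch:step name="Batch_Step">
    <!-- Fan each record out to five CSV writers in parallel.
         Paths and file names below are hypothetical. -->
    <scatter-gather doc:name="Scatter-Gather">
        <file:outbound-endpoint path="/out/csv1" outputPattern="records1.csv" doc:name="File 1"/>
        <file:outbound-endpoint path="/out/csv2" outputPattern="records2.csv" doc:name="File 2"/>
        <file:outbound-endpoint path="/out/csv3" outputPattern="records3.csv" doc:name="File 3"/>
        <file:outbound-endpoint path="/out/csv4" outputPattern="records4.csv" doc:name="File 4"/>
        <file:outbound-endpoint path="/out/csv5" outputPattern="records5.csv" doc:name="File 5"/>
    </scatter-gather>
</batch:step>
```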

Is the approach I am using fine, or is there a better design to follow?

Also, I have created a second batch step, to capture ONLY the FAILED RECORDS. But with the current design, I am not able to capture the failed records.

Answer

Short answers

Is the above assumption correct, or am I thinking the wrong way?

In short, yes, you are thinking the wrong way. Read my long explanation with an example to understand why; I hope you will appreciate it.

Also, I have created a second batch step, to capture ONLY the FAILED RECORDS. But with the current design, I am not able to capture the failed records.

You probably forgot to set max-failed-records="-1" (unlimited) on the batch job. The default is 0: on the first failed record, the batch will return and not execute the subsequent steps.

Is the approach I am using fine, or is there a better design to follow?

I think it makes sense if performance is essential for you and you can't cope with the overhead created by doing this operation in sequence. If instead you can slow down a bit, it could make sense to do this operation in 5 different steps: you will lose parallelism, but you get better control over failing records, especially if using batch commit.
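The sequential alternative suggested here could be sketched as one batch step (each with its own commit) per target CSV. This is a hypothetical outline, assuming the Mule 3 file connector; the step names, commit size, and paths are all made up for illustration:

```xml
<!-- Sketch: one step + commit per CSV, instead of one Scatter-Gather step. -->
<batch:step name="Write_CSV1_Step">
    <batch:commit size="100" doc:name="Batch Commit">
        <file:outbound-endpoint path="/out" outputPattern="file1.csv" doc:name="File"/>
    </batch:commit>
</batch:step>
<batch:step name="Write_CSV2_Step">
    <batch:commit size="100" doc:name="Batch Commit">
        <file:outbound-endpoint path="/out" outputPattern="file2.csv" doc:name="File"/>
    </batch:commit>
</batch:step>
<!-- ...and likewise for the remaining three CSV files. -->
```

With this layout, a failure in one step does not abort the writes for the other files, and each commit batches records for its own destination.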

I think the best way to explain how it works is through an example.

Consider the following case: you have a batch job configured with max-failed-records="-1" (no limit).

<batch:job name="batch_testBatch" max-failed-records="-1">

In this process we input a collection composed of 6 strings.

 <batch:input>
            <set-payload value="#[['record1','record2','record3','record4','record5','record6']]" doc:name="Set Payload"/>
 </batch:input>

The processing is composed of 3 steps. The first step just logs the processing; the second step also logs, and throws an exception on record3 to simulate a failure.

<batch:step name="Batch_Step">
        <logger message="-- processing #[payload] in step 1 --" level="INFO" doc:name="Logger"/>
 </batch:step>
 <batch:step name="Batch_Step2">
     <logger message="-- processing #[payload] in step 2 --" level="INFO" doc:name="Logger"/>
     <scripting:transformer doc:name="Groovy">
         <scripting:script engine="Groovy"><![CDATA[
         if(payload=="record3"){
             throw new java.lang.Exception();
         }
         payload;
         ]]>
         </scripting:script>
     </scripting:transformer>
</batch:step>

The third step contains just the commit, with a commit size of 2.

<batch:step name="Batch_Step3">
    <batch:commit size="2" doc:name="Batch Commit">
        <logger message="-- committing #[payload] --" level="INFO" doc:name="Logger"/>
    </batch:commit>
</batch:step>

Now you can follow me through the execution of this batch job:

On start, all 6 records are processed by the first step, and the console log looks like this:

 -- processing record1 in step 1 --
 -- processing record2 in step 1 --
 -- processing record3 in step 1 --
 -- processing record4 in step 1 --
 -- processing record5 in step 1 --
 -- processing record6 in step 1 --
Step Batch_Step finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch

Now things get more interesting in step 2: record3 will fail because we explicitly throw an exception, but despite this the step continues processing the other records. Here is how the log looks:

-- processing record1 in step 2 --
-- processing record2 in step 2 --
-- processing record3 in step 2 --
com.mulesoft.module.batch.DefaultBatchStep: Found exception processing record on step ...
Stacktrace
....
-- processing record4 in step 2 --
-- processing record5 in step 2 --
-- processing record6 in step 2 --
Step Batch_Step2 finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch

At this point, despite a failed record in this step, batch processing continues, because the parameter max-failed-records is set to -1 (unlimited) and not to the default value of 0.

All the successful records are then passed to step3. This is because, by default, the accept-policy parameter of a step is set to NO_FAILURES. (Other possible values are ALL and ONLY_FAILURES.)
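As a concrete illustration, a step that receives only the failed records (which is what the second batch step in the question was meant to do) can be declared with this policy. The step name and logger message here are assumptions:

```xml
<batch:step name="Failed_Records_Step" accept-policy="ONLY_FAILURES">
    <logger message="-- record #[payload] failed --" level="WARN" doc:name="Logger"/>
</batch:step>
```

Such a step never sees successful records, so it is a natural place to collect or report failures.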

Now step3, which contains the commit phase with a size equal to 2, commits the records two by two:

-- committing [record1, record2] --
-- committing [record4, record5] --
Step: Step Batch_Step3 finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch
-- committing [record6] --

As you can see, this confirms that record3, which failed, was not passed to the next step and therefore was not committed.

Starting from this example, I think you can imagine and test more complex scenarios. For example, after the commit you could have another step that processes only the failed records, to make an administrator aware of the failures by mail. After that, you can always use external storage to store more advanced info about your records, as you can read in my answer to another question.

Hope this helps.
