Spring Batch: Repeat a step for each item in a data list
Question
This is a tough one, but I am sure it is not unheard of.
I have two datasets, Countries and Demographics. The Countries dataset contains the name of each country and an ID pointing to its demographic data.
The Demographics dataset is hierarchical, running from the country down to the suburb.
Both of these datasets are pulled from a 3rd party on a weekly basis.
I need to split the demographics out into files, one for each country.
So far, the steps I have are:
1) Pull Countries
2) Pull Demographics
3) (this is the part I need) Loop over the country dataset, calling a "Write Country Demographics to File" step for each country
Is it possible to somehow repeat a step passing the current country id?
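As a point of reference, here is step 3 written as an ordinary loop in plain Java, with no Spring Batch involved: a per-country "write" routine called once for each country ID. The `DemographicRow` type, the method names and the in-memory lists are hypothetical simplifications of the two datasets. The point is only to show the shape of the repetition the job needs to express.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no Spring Batch): step 3 as a loop that "calls a step" once per country.
// The data shapes are hypothetical simplifications of the Countries/Demographics datasets.
class CountryLoopSketch {

    // One demographic row: the country it belongs to, plus a label (state, suburb, ...).
    static final class DemographicRow {
        final String countryId;
        final String label;

        DemographicRow(String countryId, String label) {
            this.countryId = countryId;
            this.label = label;
        }
    }

    // Stand-in for the "Write Country Demographics to File" step, parameterized by country ID.
    // Returns the lines that would go into that country's file.
    static List<String> writeCountryDemographics(String countryId, List<DemographicRow> all) {
        List<String> lines = new ArrayList<>();
        for (DemographicRow row : all) {
            if (row.countryId.equals(countryId)) {
                lines.add(row.label);
            }
        }
        return lines;
    }

    // Step 3: repeat the per-country step for every country ID, keeping input order.
    static Map<String, List<String>> runForEachCountry(List<String> countryIds, List<DemographicRow> all) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String countryId : countryIds) {
            files.put(countryId, writeCountryDemographics(countryId, all));
        }
        return files;
    }
}
```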
Edit: added a link to a PartitionHandler example
Thanks, JBristow. The link below shows how to override the PartitionHandler to pass parameters using the addArgument method of a JavaTask object, but it looks like a lot of heavy lifting for the developer and is not very "business problem specific", which is the goal of Spring Batch. http://www.activeeon.com/blog/all/integration/distribute-a-spring-batch-job-on-the-proactive-scheduler
I also saw section 7.4.3, "Binding Input Data to Steps", in your original link. It sits in the context of section 7.4.2, "Partitioner", and looks very promising:
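<bean id="itemReader" scope="step"
      class="org.spr...MultiResourceItemReader">
    <!-- step scope defers resolving #{stepExecutionContext[fileName]} until each partition's step runs -->
    <property name="resources" value="#{stepExecutionContext[fileName]}/*"/>
</bean>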
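To make the late-binding idea concrete, the following plain-Java simulation (not Spring's SpEL evaluator) shows what step scope buys you: each partition carries its own step execution context, so a reader created for that partition resolves `#{stepExecutionContext[fileName]}/*` against its own `fileName`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (not SpEL): each partition has its own step execution context,
// so a step-scoped bean built for that partition sees its own "fileName".
class LateBindingSketch {

    // Mirrors value="#{stepExecutionContext[fileName]}/*" by looking the key up in this partition's context.
    static String resolveResourcePattern(Map<String, Object> stepExecutionContext) {
        Object fileName = stepExecutionContext.get("fileName");
        if (fileName == null) {
            throw new IllegalStateException("no fileName in step execution context");
        }
        return fileName + "/*";
    }

    // One "reader instance" per partition, as step scope creates a fresh bean per step execution.
    static List<String> patternsPerPartition(Map<String, Map<String, Object>> partitions) {
        List<String> patterns = new ArrayList<>();
        for (Map<String, Object> context : partitions.values()) {
            patterns.add(resolveResourcePattern(context));
        }
        return patterns;
    }
}
```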
I don't suppose anyone has some sample XML config of this in action?
- Partitioner
- Passing dynamic values to a step in a partition
Thanks.
Answer
Yes, check out the partitioning feature of spring-batch! http://static.springsource.org/spring-batch/reference/html-single/index.html#partitioning
Basically, it allows you to use a "partitioner" to create new execution contexts to pass to a handler that then does something with that information.
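The partitioner/handler split can be modeled in a few lines of plain Java. This is an analogy built from `Map`s and a `Function`, not the Spring Batch `Partitioner` or `PartitionHandler` types: the "partitioner" builds one named context per country ID, and the "handler" runs the same inner step once per context. The `countryId` key is an assumption; the `innerStep` name prefix mirrors the answer's own partitioner.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java model of partitioning (hypothetical types, not Spring Batch classes).
class PartitionModelSketch {

    // Like partition(gridSize): one named context per item. gridSize is ignored here,
    // so the number of partitions is decided by the data (one per country).
    static Map<String, Map<String, Object>> partition(List<String> countryIds, int gridSize) {
        Map<String, Map<String, Object>> out = new TreeMap<>();
        for (String id : countryIds) {
            Map<String, Object> context = new HashMap<>();
            context.put("countryId", id);
            out.put("innerStep" + id, context);
        }
        return out;
    }

    // Like a partition handler: execute the inner step once per context, here sequentially.
    static List<String> handle(Map<String, Map<String, Object>> partitions,
                               Function<Map<String, Object>, String> innerStep) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : partitions.entrySet()) {
            results.add(e.getKey() + " -> " + innerStep.apply(e.getValue()));
        }
        return results;
    }
}
```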
While partitioning was made for parallelization, its default concurrency is 1, so you can start small and ratchet it up to match the hardware at your disposal. Since I assume that each country's data is not dependent on the others (at least in the download demographics step), your job could make use of basic parallelization.
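One plain-Java way to picture "start at 1 and ratchet it up" is a fixed-size thread pool whose size acts as the concurrency limit, with one task per country. This illustration uses `java.util.concurrent`, not Spring Batch's partition handler; the method name and the file-name format are hypothetical. Because the tasks share no state, the output is the same whatever the pool size; only how many tasks run at once changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: a fixed pool's size stands in for the partition concurrency limit.
class ConcurrencySketch {

    // Runs one task per country on `concurrency` threads and records in maxObserved
    // the largest number of tasks seen running at the same time.
    static List<String> runPartitions(List<String> countryIds, int concurrency, AtomicInteger maxObserved) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        AtomicInteger active = new AtomicInteger();
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String id : countryIds) {
                Callable<String> task = () -> {
                    maxObserved.accumulateAndGet(active.incrementAndGet(), Math::max);
                    try {
                        // Stand-in for writing one country's file; safe in parallel because
                        // no country depends on another country's data.
                        return "demographics-" + id + ".csv";
                    } finally {
                        active.decrementAndGet();
                    }
                };
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get()); // collected in submission order, not completion order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```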
Edit: added an example.
Here's what I do (more or less): First, the XML:
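<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/batch
           http://www.springframework.org/schema/batch/spring-batch.xsd">
    <!-- jdbcTemplate, someReader, someProcessor and someWriter are assumed to be defined elsewhere -->
    <batch:job id="jobName">
        <!-- Master step: asks myPartitioner for one execution context per item, then runs innerStep for each -->
        <batch:step id="innerStep.master">
            <batch:partition partitioner="myPartitioner" step="innerStep"/>
        </batch:step>
    </batch:job>
    <bean id="myPartitioner" class="org.lapseda.MyPartitioner" scope="step">
        <property name="jdbcTemplate" ref="jdbcTemplate"/>
        <property name="runDate" value="#{jobExecutionContext['runDate']}"/>
        <property name="recurrenceId" value="D"/>
    </bean>
    <!-- Worker step, run once per partition; its id must match the partition's step attribute -->
    <batch:step id="innerStep">
        <batch:tasklet>
            <batch:chunk reader="someReader" processor="someProcessor" writer="someWriter" commit-interval="10"/>
        </batch:tasklet>
    </batch:step>
</beans>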
Now some Java:
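import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.jdbc.core.JdbcTemplate;
public class MyPartitioner implements Partitioner {
    private JdbcTemplate jdbcTemplate; // set via the XML <property> elements, like the fields below
    private String runDate, recurrenceId; // runDate is assumed to be stored as a String
    public void setJdbcTemplate(JdbcTemplate jdbcTemplate) { this.jdbcTemplate = jdbcTemplate; }
    public void setRunDate(String runDate) { this.runDate = runDate; }
    public void setRecurrenceId(String recurrenceId) { this.recurrenceId = recurrenceId; }
    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        List<String> list = getValuesToRunOver();
        /* I use TreeMap because my partitions are ordered; HashMap should work if order isn't important */
        Map<String, ExecutionContext> out = new TreeMap<String, ExecutionContext>();
        for (String item : list) {
            ExecutionContext context = new ExecutionContext(); // one context per partition
            context.putString("item", item); // add your own stuff, e.g. the current country ID
            out.put("innerStep" + item, context);
        }
        return out;
    }
    private List<String> getValuesToRunOver() { // hypothetical query: selects the items to partition over
        return jdbcTemplate.queryForList("SELECT id FROM country", String.class);
    }
}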
Then, inside your step, you just read from the partition's step execution context, as you would from the context of a normal step or job.
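For the original question, "read from the context" boils down to something like the following plain-Java stand-in for a step-scoped writer: its only per-partition input is the partition's context, from which it takes the country ID to choose the output file. In a real job the lookup would go through late binding such as `#{stepExecutionContext['countryId']}` on a step-scoped bean. The `countryId` key, the `demographics-<id>.csv` naming and the method signature here are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Plain-Java analogue of a step-scoped writer. "countryId" is a hypothetical key that
// must match whatever the partitioner put into each context.
class PartitionWriterSketch {

    static Path writeForPartition(Map<String, Object> stepContext,
                                  Map<String, List<String>> demographicsByCountry,
                                  Path outputDir) {
        String countryId = (String) stepContext.get("countryId");
        if (countryId == null) {
            throw new IllegalArgumentException("partition context has no countryId");
        }
        // One output file per country, named after the ID taken from the context.
        Path file = outputDir.resolve("demographics-" + countryId + ".csv");
        List<String> lines = demographicsByCountry.getOrDefault(countryId, Collections.emptyList());
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }
}
```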