Reading a file from HDFS using Spring Batch
Question
I have to write a Spring Batch job that reads a file from HDFS and updates data in a MySQL database.
The source file in HDFS contains report data in CSV format.
Can someone point me to an example of reading a file from HDFS?
Thanks.
Recommended answer
The FlatFileItemReader in Spring Batch works with any Spring Framework Resource implementation:
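@Bean
public FlatFileItemReader<String> itemReader(Resource resource) { // Spring injects the Resource bean
    return new FlatFileItemReaderBuilder<String>()
            .name("itemReader") // a name is required while the reader saves its state
            .resource(resource)
            .lineMapper(new PassThroughLineMapper()) // returns each line as a raw String
            // set other reader properties
            .build();
}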
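Because the source file in the question is CSV, the reader would usually map each line to an object rather than return raw strings. The following is a minimal sketch under stated assumptions: the `ReportRow` class, the `name` and `amount` columns, the header row, and the bean names are all hypothetical, since the question does not describe the report layout. It also assumes a `Resource` bean is available for injection and that Spring Batch is on the classpath.

```java
import java.math.BigDecimal;

import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;

@Configuration
public class ReportReaderConfig {

    // Hypothetical item type; the real fields depend on the report file.
    public static class ReportRow {
        private final String name;
        private final BigDecimal amount;

        public ReportRow(String name, BigDecimal amount) {
            this.name = name;
            this.amount = amount;
        }

        public String getName() {
            return name;
        }

        public BigDecimal getAmount() {
            return amount;
        }
    }

    @Bean
    public FlatFileItemReader<ReportRow> reportReader(Resource reportResource) {
        return new FlatFileItemReaderBuilder<ReportRow>()
                .name("reportReader")          // needed because the reader saves its state
                .resource(reportResource)      // any Resource works, including one backed by HDFS
                .linesToSkip(1)                // assumption: the first line is a header
                .delimited()                   // comma is the default delimiter
                .names(new String[] {"name", "amount"}) // hypothetical column names
                .fieldSetMapper(fieldSet -> new ReportRow(
                        fieldSet.readString("name"),
                        fieldSet.readBigDecimal("amount")))
                .build();
    }
}
```

The reader does not care where the bytes come from, so the same bean definition can be exercised against a local or in-memory `Resource` during development and pointed at HDFS later.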
So if you manage to get a Resource handle pointing to an HDFS file, you are done.
To obtain an HDFS resource, you can:
- Use Spring for Hadoop. Once the HDFS file system is configured, you can get the resource from the application context with
  applicationContext.getResource("hdfs:data.csv");
- Implement your own Resource using the Hadoop APIs (as shown in the answer by Michael Simons). I see that some folks have already done this.
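The second option can be sketched roughly as follows. This is an illustrative sketch, not the implementation from the answer referenced above: the class name `HdfsFileResource` is hypothetical, and the code assumes the Hadoop client libraries and Spring Framework are on the classpath. It extends Spring's `AbstractResource` and delegates to Hadoop's `FileSystem` to open the file.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.core.io.AbstractResource;

// Hypothetical read-only Resource backed by a file in a Hadoop FileSystem.
public class HdfsFileResource extends AbstractResource {

    private final FileSystem fileSystem;
    private final Path path;

    public HdfsFileResource(FileSystem fileSystem, Path path) {
        this.fileSystem = fileSystem;
        this.path = path;
    }

    @Override
    public String getDescription() {
        return "HDFS file [" + path + "]";
    }

    @Override
    public InputStream getInputStream() throws IOException {
        // FSDataInputStream is an InputStream, so it can be returned directly.
        return fileSystem.open(path);
    }

    @Override
    public boolean exists() {
        try {
            return fileSystem.exists(path);
        } catch (IOException e) {
            // Treat an unreachable file system as "resource not available".
            return false;
        }
    }

    @Override
    public String getFilename() {
        return path.getName();
    }
}
```

An instance could then be passed to the reader's `resource(...)` call, for example one built from `FileSystem.get(uri, configuration)` and a `Path` to the CSV file, where the URI and path are specific to your cluster.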
Hope this helps.