Reading a file from HDFS using Spring Batch

Question

I have to write a Spring Batch job that reads a file from HDFS and updates the data in a MySQL database.

The source file in HDFS contains report data in CSV format.

Can someone point me to an example of reading a file from HDFS?

Thanks.

Recommended answer

The FlatFileItemReader in Spring Batch works with any Spring Framework Resource implementation:

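import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.Resource;

@Bean
public FlatFileItemReader<String> itemReader(Resource resource) { // get (or autowire) the resource
    return new FlatFileItemReaderBuilder<String>()
            .name("itemReader")                       // required while saveState is enabled (the default)
            .resource(resource)
            .lineMapper(new PassThroughLineMapper())  // hand each line through as a plain String
            // set other reader properties
            .build();
}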

So if you manage to get a Resource handle pointing to an HDFS file, you are done.

To obtain an HDFS resource, you can:

  • Use Spring for Hadoop. Once the HDFS file system is configured, you can get the resource from the application context with applicationContext.getResource("hdfs:data.csv");
  • Implement your own Resource using the Hadoop APIs (as shown in the answer by Michael Simons). I see that some folks have already done this.
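
The second option can be sketched roughly as follows. This is a minimal, read-only illustration and not the implementation from the answer referenced above: the class name HdfsFileResource, its of(...) factory, and the example URI are assumptions made for this sketch. It relies only on Spring's AbstractResource and Hadoop's FileSystem.get, exists, and open calls, and it assumes the Hadoop client and Spring Core libraries are on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.core.io.AbstractResource;

/** Hypothetical read-only Spring Resource backed by a path on a Hadoop FileSystem. */
final class HdfsFileResource extends AbstractResource {

    private final FileSystem fileSystem;
    private final Path path;

    HdfsFileResource(FileSystem fileSystem, Path path) {
        this.fileSystem = fileSystem;
        this.path = path;
    }

    /**
     * Convenience factory (an assumption of this sketch). The URI is a placeholder,
     * e.g. "hdfs://namenode:8020/reports/data.csv".
     */
    static HdfsFileResource of(String uri, Configuration conf) throws IOException {
        URI parsed = URI.create(uri);
        return new HdfsFileResource(FileSystem.get(parsed, conf), new Path(parsed));
    }

    @Override
    public boolean exists() {
        try {
            return fileSystem.exists(path);
        } catch (IOException e) {
            return false; // treat an unreachable file system as "not there"
        }
    }

    @Override
    public InputStream getInputStream() throws IOException {
        // FSDataInputStream extends InputStream, so it can be returned directly.
        return fileSystem.open(path);
    }

    @Override
    public String getFilename() {
        return path.getName();
    }

    @Override
    public String getDescription() {
        return "HDFS resource [" + path + "]";
    }
}
```

An instance created with HdfsFileResource.of(...) could then be passed to the reader's resource(...) call shown earlier. Restart behaviour, security settings, and closing the FileSystem are left out of this sketch and would need attention in a real job.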

Hope this helps.
