How to read the first line of a Hadoop (HDFS) file efficiently using Java?
Question
I have a large CSV file on my Hadoop cluster. The first line of the file is a 'header' line, which consists of field names. I want to do an operation on this header line, but I do not want to process the whole file. Also, my program is written in Java and using Spark.
What is an efficient way to read just the first line of a large CSV file on a Hadoop cluster?
Answer
You can access HDFS files with the FileSystem class (see the Hadoop API docs at /docs/r1.1.1/api/org/apache/hadoop/fs/FileSystem.html) and friends:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// FileSystem.get picks the right implementation (here DistributedFileSystem) for the URI
FileSystem fileSystem = FileSystem.get(new URI("hdfs://namenode-host:54310"), conf);

// Open the file and read only its first line; try-with-resources closes the stream
try (FSDataInputStream input = fileSystem.open(new Path("/path/to/file.csv"));
     BufferedReader reader = new BufferedReader(new InputStreamReader(input))) {
    System.out.println(reader.readLine());
}
This code doesn't use MapReduce and will run at a reasonable speed, since only the first line of the file is actually read.
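Since FSDataInputStream is just a java.io.InputStream, the header-reading logic can be factored into a small helper and tested locally on an in-memory stream before pointing it at the cluster. A minimal sketch (the class and method names are illustrative, and the naive comma split does not handle quoted CSV fields):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HeaderReader {
    // Read only the first line of any stream and split it into field names.
    // On a cluster, pass the stream returned by fileSystem.open(path).
    static String[] readHeaderFields(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String header = reader.readLine();           // stops at the first newline
        return header == null ? new String[0] : header.split(",");
    }

    public static void main(String[] args) throws IOException {
        // Demonstrated on an in-memory CSV instead of HDFS
        InputStream csv = new ByteArrayInputStream(
                "id,name,price\n1,apple,0.5\n2,pear,0.6\n"
                        .getBytes(StandardCharsets.UTF_8));
        String[] fields = readHeaderFields(csv);
        System.out.println(String.join("|", fields)); // prints "id|name|price"
    }
}
```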