How to read multiple line elements in Spark?

Views: 230 | Published: 2016/5/22 16:30:49 | Tags: hadoop apache-spark hdfs

Question

When you read a file in Spark using sc.textFile, you get an RDD in which each element is a single line. However, I want each element to consist of N lines. I can't split on a delimiter either, because the file doesn't contain one. So how can I make Spark give me multi-line elements?

I'm interested in doing this with the NLineInputFormat class. Is that possible in Spark? I can find examples for MapReduce, but I have no idea how they would translate to Spark.

Recommended answer

Yes, if you are reading the files from Hadoop. You should be able to do it like this:

val records = sc.newAPIHadoopRDD(hadoopConf, classOf[NLineInputFormat], classOf[LongWritable], classOf[Text])

Here is the API documentation: https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#newAPIHadoopRDD(org.apache.hadoop.conf.Configuration,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class)
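
For anyone wanting a fuller picture, here is a minimal sketch of how that one-liner might be wired up. Several things in it are assumptions, not part of the answer above: the input path is hypothetical, N = 3 is picked only for illustration, and the two property keys are the Hadoop 2+ names for the new-API FileInputFormat and NLineInputFormat. Also note that, as far as I understand it, NLineInputFormat puts N lines into each input split but still emits one record per line, so the sketch adds a mapPartitions step (my addition) to join the lines of each split into a single element.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object NLineExample {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("NLineExample"))

    // Copy Spark's Hadoop configuration so the shared one is left untouched.
    val hadoopConf = new Configuration(sc.hadoopConfiguration)
    // Hypothetical input path; replace with a real file.
    hadoopConf.set("mapreduce.input.fileinputformat.inputdir", "hdfs:///path/to/input.txt")
    // N lines per split; 3 is an arbitrary value for illustration.
    hadoopConf.setInt("mapreduce.input.lineinputformat.linespermap", 3)

    // Each record is (byte offset, one line); each split holds up to N records.
    val records = sc.newAPIHadoopRDD(
      hadoopConf,
      classOf[NLineInputFormat],
      classOf[LongWritable],
      classOf[Text])

    // Join the lines of each split into one multi-line element.
    // Text objects are reused by the record reader, so convert to String right away.
    val chunks = records.mapPartitions { lines =>
      if (lines.hasNext) Iterator(lines.map { case (_, text) => text.toString }.mkString("\n"))
      else Iterator.empty
    }

    chunks.take(5).foreach(println)
    sc.stop()
  }
}
```

Because every split becomes its own Spark partition, a small N on a large file produces a very large number of partitions, which may be worth keeping in mind before using this on big inputs.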
