Spark map/Filter throws java.io.IOException: Too many bytes before newline: 2147483648
Problem description
I have a simple 7 GB file in which each line contains two columns separated by |. I created an RDD from this file, but when I apply a map or filter transformation on this RDD I get a "too many bytes" exception.
Below is sample data from my file:
116010100000000007|33448
116010100000000014|13520
116010100000000021|97132
116010100000000049|82891
116010100000000049|82890
116010100000000056|93014
116010100000000063|43434
116010100000000063|43434
Here is the code:
val input = sparkContext.textFile("hdfsfilePath")
input.filter(x => x.split("|")(1).toInt > 15000).saveAsTextFile("hdfs://output file path")
Below is the exception I am getting:
java.io.IOException: Too many bytes before newline: 2147483648
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:249)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:136)
The issue was with my Scala code when splitting each line on the pipe delimiter. I changed the code and now it is working. Below is the changed code:
val input = sparkContext.textFile("hdfsfilePath")
input.filter(x => x.split('|')(1).toInt > 15000).saveAsTextFile("hdfs://output file path")
Instead of "|" I needed to use either '|' or "\\|" in the split method. String.split takes a regular expression, and | is the regex alternation operator, so split("|") matches the empty string at every position and breaks the line into single characters instead of two columns.
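A minimal sketch of the difference, runnable in the Scala REPL (using a sample line from the data above):

// split(String) treats its argument as a regex; "|" matches the empty
// string at every position, so every character becomes its own element.
"116010100000000007|33448".split("|")(1)   // "1"
// split(Char) splits on the literal character.
"116010100000000007|33448".split('|')(1)   // "33448"
// Escaping the pipe makes the regex match the literal character too.
"116010100000000007|33448".split("\\|")(1) // "33448"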