Convert a text file to sequence format in Spark Java


Problem Description

In Spark Java, how do I convert a text file to a sequence file? The following is my code:

    SparkConf sparkConf = new SparkConf().setAppName("txt2seq");
    sparkConf.setMaster("local").set("spark.executor.memory", "1g");
    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);

    JavaPairRDD<String, String> infile = ctx.wholeTextFiles("input_txt");
    infile.saveAsNewAPIHadoopFile("outfile.seq", String.class, String.class, SequenceFileOutputFormat.class);


And I got the error below.

14/12/07 23:43:33 ERROR Executor: Exception in task ID 0
java.io.IOException: Could not find a serializer for the Key class: 'java.lang.String'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1176)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1091)

Does anyone have any idea? Thank you!

Answer

Change this:

    JavaPairRDD<String, String> infile = ctx.wholeTextFiles("input_txt");
    infile.saveAsNewAPIHadoopFile("outfile.seq", String.class, String.class, SequenceFileOutputFormat.class);

to this:

    JavaPairRDD<String, String> infile = ctx.wholeTextFiles("input_txt");
    // Wrap each String in a Text so the SequenceFile writer can serialize the key and value
    JavaPairRDD<Text, Text> resultRDD = infile.mapToPair(f -> new Tuple2<>(new Text(f._1()), new Text(f._2())));
    resultRDD.saveAsNewAPIHadoopFile("outfile.seq", Text.class, Text.class, SequenceFileOutputFormat.class);

With the default Hadoop serializers, java.lang.String cannot be written to a SequenceFile, which is exactly what the "Could not find a serializer for the Key class" error reports. org.apache.hadoop.io.Text is the Writable counterpart to use for both the key and the value (this requires importing org.apache.hadoop.io.Text and scala.Tuple2).
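
If you want to confirm the conversion worked, here is a minimal sketch (not part of the original answer) of reading the SequenceFile back with JavaSparkContext.sequenceFile; it assumes the same ctx and "outfile.seq" path as above, plus the same Text and Tuple2 imports:

    // Read the SequenceFile back as Text/Text pairs to confirm the write succeeded.
    JavaPairRDD<Text, Text> seq = ctx.sequenceFile("outfile.seq", Text.class, Text.class);
    // Hadoop reuses Writable instances while reading, so convert to plain Strings before collecting.
    JavaPairRDD<String, String> readBack =
            seq.mapToPair(p -> new Tuple2<>(p._1().toString(), p._2().toString()));
    readBack.keys().take(5).forEach(System.out::println);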
