Apache Spark JavaSchemaRDD is empty even though input RDD to it has data
Problem description
Hi, I have a large number of tab-delimited files with over 40 columns each. I want to select only a few of the columns and apply aggregations to them. I think Apache Spark is the best candidate for this, since my files are stored in Hadoop. I have the following program:
public class MyPOJO {
    int field1;
    String field2; // etc.
}

JavaSparkContext sc;
JavaRDD<String> data = sc.textFile("path/input.csv");
JavaSQLContext sqlContext = new JavaSQLContext(sc);
// Parse each input line into a MyPOJO record.
JavaRDD<MyPOJO> rdd_records = data.map(
    new Function<String, MyPOJO>() {
        public MyPOJO call(String line) throws Exception {
            String[] fields = line.split(",");
            MyPOJO sd = new MyPOJO(fields[0], fields[1], fields[2], fields[3]);
            return sd;
        }
    });
The above code runs fine: when I apply the action rdd_records.saveAsTextFile("/to/hadoop/");
I can see that it creates a part-00000 file with the RDD output. But when I try the following:
JavaSchemaRDD table = sqlContext.applySchema(rdd_records, MyPOJO.class);
table.printSchema(); // prints just "root" and empty lines
table.saveAsTextFile("/to/hadoop/path"); // writes a part file with [] for each line
I don't know where the problem is. MyPOJO.class has all the fields, so why is the JavaSchemaRDD empty, and why does the part file contain nothing? Please guide me; I am new to Spark. Thanks in advance.
Recommended answer
Following the Spark documentation, I added getters and setters for all the fields and made the MyPOJO class implement the Serializable interface. After that it started working, and the JavaSchemaRDD contained data.
public class MyPOJO implements Serializable {
    private int field1;
    private String field2;

    public int getField1() {
        return field1;
    }
    public void setField1(int field1) {
        this.field1 = field1;
    }
    public String getField2() {
        return field2;
    }
    public void setField2(String field2) {
        this.field2 = field2;
    }
}
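To see why the getters matter, here is a minimal sketch that runs without Spark. It assumes that applySchema discovers columns through standard JavaBean introspection (java.beans.Introspector), which is my understanding of the Spark 1.x behaviour and not something confirmed in this thread. The class names BeanSchemaDemo, FieldsOnly and WithAccessors and the helper propertyNames are hypothetical and exist only for illustration.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BeanSchemaDemo {

    // Hypothetical stand-in for the question's POJO: fields only, no accessors.
    public static class FieldsOnly {
        int field1;
        String field2;
    }

    // Hypothetical stand-in for the answer's POJO: getters, setters, Serializable.
    public static class WithAccessors implements Serializable {
        private int field1;
        private String field2;

        public int getField1() { return field1; }
        public void setField1(int field1) { this.field1 = field1; }
        public String getField2() { return field2; }
        public void setField2(String field2) { this.field2 = field2; }
    }

    // Lists the JavaBean properties of a class, skipping the "class"
    // property that every object inherits from Object.getClass().
    public static List<String> propertyNames(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (!"class".equals(pd.getName())) {
                    names.add(pd.getName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No getters: introspection sees no properties, hence no columns.
        System.out.println("FieldsOnly:    " + propertyNames(FieldsOnly.class));
        // With getters: field1 and field2 show up as properties.
        System.out.println("WithAccessors: " + propertyNames(WithAccessors.class));
    }
}
```

If the assumption holds, the empty property list for the fields-only class corresponds to the bare "root" schema and the [] rows seen in the question, while the class with accessors exposes field1 and field2 as columns.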