Apache Spark JavaSchemaRDD is empty even though its input RDD has data

Problem description

Hi, I have a large number of tab-delimited files with over 40 columns. I want to select only a few of those columns and apply aggregation to them. I think Apache Spark is the best candidate for this, as my files are stored in Hadoop. I have the following program:

public class MyPOJO {
    int field1;
    String field2;
    // ... remaining fields
}

JavaSparkContext sc;
JavaRDD<String> data = sc.textFile("path/input.csv");
JavaSQLContext sqlContext = new JavaSQLContext(sc);

// Parse each input line into a MyPOJO record
JavaRDD<MyPOJO> rdd_records = data.map(
  new Function<String, MyPOJO>() {
      public MyPOJO call(String line) throws Exception {
         String[] fields = line.split(",");
         MyPOJO sd = new MyPOJO(fields[0], fields[1], fields[2], fields[3]);
         return sd;
      }
});

The above code runs fine. When I apply the action rdd_records.saveAsTextFile("/to/hadoop/"); I can see that it creates a part-00000 file containing the RDD output. But when I try the following:

JavaSchemaRDD table = sqlContext.applySchema(rdd_records, MyPOJO.class);
table.printSchema(); // prints just "root" and empty lines
table.saveAsTextFile("/to/hadoop/path"); // writes a part file with [] on each line

I don't know where the problem is. MyPOJO.class has all the fields, so why is the JavaSchemaRDD empty, and why is nothing printed in the part file? Please guide me; I am new to Spark. Thanks in advance.

Recommended answer

Following the Spark documentation, I added getters and setters for all the fields and made the MyPOJO class implement the Serializable interface. After that it started working, and the JavaSchemaRDD contained data.

public class MyPOJO implements Serializable {
    private int field1;
    private String field2;

    public int getField1() {
        return field1;
    }
    public void setField1(int field1) {
        this.field1 = field1;
    }
    public String getField2() {
        return field2;
    }
    public void setField2(String field2) {
        this.field2 = field2;
    }
}
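
To see why the getters matter, here is a minimal sketch that does not need Spark at all. It assumes that the bean-based applySchema derives its columns from JavaBean properties, as the Spark SQL guide describes, and uses the standard java.beans.Introspector to list the properties of two classes. The names BeanPropertyDemo, FieldsOnly, WithAccessors and propertyNames are hypothetical and exist only for this illustration; this is not Spark's own code.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BeanPropertyDemo {

    // Shaped like the class in the question: bare fields, no accessors.
    public static class FieldsOnly {
        int field1;
        String field2;
    }

    // Shaped like the class in the answer: private fields with getters/setters.
    public static class WithAccessors implements Serializable {
        private int field1;
        private String field2;

        public int getField1() { return field1; }
        public void setField1(int field1) { this.field1 = field1; }
        public String getField2() { return field2; }
        public void setField2(String field2) { this.field2 = field2; }
    }

    // Returns the JavaBean property names that introspection finds on a class.
    public static List<String> propertyNames(Class<?> beanClass) {
        try {
            // Stop at Object so the implicit "class" property is not reported.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("FieldsOnly:    " + propertyNames(FieldsOnly.class));
        System.out.println("WithAccessors: " + propertyNames(WithAccessors.class));
    }
}
```

The fields-only class exposes no bean properties, which is consistent with printSchema() showing only "root" and each saved row being []. The class with accessors exposes field1 and field2, which become the columns.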
