DataFrame transformation in Spark, Java


Question

After I load a JSON file with:

df = sqlContext.read().json(path);

I get my DataFrame in Java Spark. For example, I have the following DataFrame:

id item1 item2 item3 ....
id1    0     3     4
id2    1     0     2
id3    3     3     0
...
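For context, a line-delimited JSON file that sqlContext.read().json(path) would read into a DataFrame like the one above might look as follows; the field names come from the table shown, but the exact layout of the source file is an assumption:

{"id": "id1", "item1": 0, "item2": 3, "item3": 4}
{"id": "id2", "item1": 1, "item2": 0, "item3": 2}
{"id": "id3", "item1": 3, "item2": 3, "item3": 0}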

I want to transform it, in the simplest way possible, into the following (probably into objects of the Rating class, converting id and item to Integer via .hashCode()):

id   item   ranking
id1  item1    0
id1  item2    3
id1  item3    4
....
id2  item1    1
id2  item2    0
id2  item3    2
...

PS: a first attempt at the flatMap function:

void transformTracks() {
    // needs org.apache.spark.api.java.function.FlatMapFunction
    // and org.apache.spark.mllib.recommendation.Rating
    JavaRDD<Rating> ratings = df.javaRDD().flatMap(new FlatMapFunction<Row, Rating>() {
        public Iterable<Rating> call(Row r) {
            List<Rating> result = new ArrayList<>();
            for (int i = 1; i < r.length(); i++) {
                // placeholder user/product ids, value taken from the i-th column
                result.add(new Rating(1, 1, r.getLong(i)));
            }
            return result;
        }
    });
}

Answer

You have to forgive me if the syntax is slightly off - I program in Scala nowadays and it's been a while since I used Java - but something along the lines of:

DataFrame df = sqlContext.read().json(path);
final String[] columnNames = df.columns();

// flatMap each wide row into one (id, item, ranking) row per item column
JavaRDD<Row> longRows = df.javaRDD().flatMap(row -> {
  List<Row> list = new ArrayList<>(columnNames.length - 1);
  String id = row.getString(0);

  for (int i = 1; i < columnNames.length; i++) {
    list.add(RowFactory.create(id, columnNames[i], row.getLong(i)));
  }
  return list;
});

StructType schema = DataTypes.createStructType(new StructField[]{
  DataTypes.createStructField("id", DataTypes.StringType, false),
  DataTypes.createStructField("item", DataTypes.StringType, false),
  DataTypes.createStructField("ranking", DataTypes.LongType, false)
});
DataFrame newDF = sqlContext.createDataFrame(longRows, schema);
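If the end goal is MLlib's ALS, which the Rating/hashCode hint in the question suggests, a minimal follow-up sketch could map the long-format rows to org.apache.spark.mllib.recommendation.Rating objects. The newDF variable and column order are taken from the snippet above; the hashing scheme is the one the question proposes, and the rest is an assumption rather than part of the original answer:

// Hypothetical last step: hash the string ids into ints, as the question suggests
JavaRDD<Rating> ratings = newDF.javaRDD().map(row ->
    new Rating(row.getString(0).hashCode(),   // id      -> user
               row.getString(1).hashCode(),   // item    -> product
               (double) row.getLong(2)));     // ranking -> rating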

