DataFrame transformation in Spark, Java
Problem description
After I load a json file with:
df = sqlContext.read().json(path);
I get my DataFrame in Java Spark. For example, I have the following DF:
id    item1  item2  item3  ...
id1   0      3      4
id2   1      0      2
id3   3      3      0
...
I want to transform it, in the simplest way possible, into the following layout (probably as objects of class Rating, with id and item converted to Integer via .hashCode()):
id    item   ranking
id1   item1  0
id1   item2  3
id1   item3  4
...
id2   item1  1
id2   item2  0
id2   item3  2
...
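The reshaping asked for here is a wide-to-long "unpivot": every input row produces one output record per item column. A minimal plain-Java sketch of that logic on the example data, without Spark, assuming the first column is `id` and every other column is an item; the class `Unpivot`, the nested type `Ranking` and the method `unpivot` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Unpivot {
    // Hypothetical record of the long layout: (id, item, ranking)
    static final class Ranking {
        final String id;
        final String item;
        final int ranking;

        Ranking(String id, String item, int ranking) {
            this.id = id;
            this.item = item;
            this.ranking = ranking;
        }

        @Override
        public String toString() {
            return id + " " + item + " " + ranking;
        }
    }

    // columns[0] is assumed to be "id"; each remaining column is an item.
    // One wide row yields one Ranking per item column (a one-to-many mapping).
    static List<Ranking> unpivot(String[] columns, List<Object[]> rows) {
        List<Ranking> out = new ArrayList<>();
        for (Object[] row : rows) {
            String id = (String) row[0];
            for (int i = 1; i < columns.length; i++) {
                out.add(new Ranking(id, columns[i], ((Number) row[i]).intValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] columns = {"id", "item1", "item2", "item3"};
        List<Object[]> rows = Arrays.asList(
                new Object[] {"id1", 0, 3, 4},
                new Object[] {"id2", 1, 0, 2},
                new Object[] {"id3", 3, 3, 0});
        for (Ranking r : unpivot(columns, rows)) {
            System.out.println(r); // e.g. "id1 item1 0", then "id1 item2 3", ...
        }
    }
}
```

Three input rows with three item columns give nine output records, in the same order as the target table.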
PS: My first attempt at writing the flatMap function:
void transformTracks() {
    JavaRDD<Rating> = df.flatMap(new Function<Row, Rating>(){
        public Rating call(Row r) {
            for (String i : r) {
                return Rating(1, 1, r.apply(Double.parseDouble(i)));
            }
        }
    })
}
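The attempt above cannot work as written: apart from the compile errors (the `JavaRDD<Rating>` declaration has no variable name and `Rating(...)` is missing `new`), it returns from inside the loop, so at most one value would come out per row, whereas a flatMap function has to hand back every output for a row at once. A minimal plain-Java sketch of such a per-row function, without Spark: `RowToRatings`, `toRatings` and the nested `Rating` class are hypothetical, with `Rating` only mirroring the (user, product, rating) shape of MLlib's `Rating`, and the string ids are turned into ints with `String.hashCode()` as the question suggests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowToRatings {
    // Hypothetical stand-in with the same (user, product, rating) shape as MLlib's Rating
    static final class Rating {
        final int user;
        final int product;
        final double rating;

        Rating(int user, int product, double rating) {
            this.user = user;
            this.product = product;
            this.rating = rating;
        }
    }

    // Builds every Rating for one row and returns them together,
    // instead of returning from inside the loop after the first item.
    static List<Rating> toRatings(String id, Map<String, Integer> items) {
        List<Rating> ratings = new ArrayList<>(items.size());
        for (Map.Entry<String, Integer> e : items.entrySet()) {
            // ids and item names become ints via hashCode(), as proposed in the question
            ratings.add(new Rating(id.hashCode(), e.getKey().hashCode(), e.getValue().doubleValue()));
        }
        return ratings;
    }

    public static void main(String[] args) {
        Map<String, Integer> items = new LinkedHashMap<>(); // keeps item1, item2, item3 order
        items.put("item1", 0);
        items.put("item2", 3);
        items.put("item3", 4);
        for (Rating r : toRatings("id1", items)) {
            System.out.println(r.user + " " + r.product + " " + r.rating);
        }
    }
}
```

Note that `hashCode()` is not collision-free, so two different ids (or item names) could end up with the same integer.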
Recommended answer
You have to forgive me if the syntax is slightly off - I program in Scala nowadays and it's been a while since I used Java - but something along the lines of:
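import java.util.*;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;
DataFrame df = sqlContext.read().json(path);
final String[] columnNames = df.columns(); // assumes "id" is the first column, as in the example
JavaRDD<Row> rows = df.javaRDD().flatMap(row -> { // one output Row per item column
    List<Row> list = new ArrayList<>(columnNames.length - 1);
    String id = row.getString(0);
    for (int i = 1; i < columnNames.length; i++) {
        // JSON integers are read as Long, so convert through Number instead of casting to int
        list.add(RowFactory.create(id, columnNames[i], ((Number) row.get(i)).intValue()));
    }
    return list; // Spark 1.x flatMap expects an Iterable (Spark 2.x: return list.iterator())
});
StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("id", DataTypes.StringType, false),
    DataTypes.createStructField("item", DataTypes.StringType, false),
    DataTypes.createStructField("ranking", DataTypes.IntegerType, false)});
DataFrame newDF = sqlContext.createDataFrame(rows, schema);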