How to introduce the schema in a Row in Spark?
Problem description
In the Row Java API there is a row.schema() method, but there is no row.set(StructType schema).
I also tried RowFactory.create(objects), but I don't know how to proceed from there.
Update:
The problem is how to generate a new DataFrame when I modify the row structure in the workers. Here is an example:
DataFrame sentenceData = jsql.createDataFrame(jrdd, schema);
List<Row> resultRows = sentenceData.toJavaRDD()
    .map(new MyFunction<Row, Row>(parameters) {
        /** my map function */
        public Row call(Row row) {
            // I want to change the Row definition by adding new columns
            Row newRow = functionAddnewNewColumns(row);
            StructType newSchema = functionGetNewSchema(row.schema());
            // Here I want to attach the new schema to the new row
            return newRow;
        }
    }).collect();
JavaRDD<Row> newRdd = jsc.parallelize(resultRows);
// Here is the problem: I don't know how to get the new schema to create the new, modified DataFrame
DataFrame newDataframe = jsql.createDataFrame(newRdd, newSchema);
Recommended answer
You can create a Row that carries a schema by using:
Row newRow = new GenericRowWithSchema(values, newSchema);
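To make the one-liner above concrete, here is a minimal sketch of how a row might be extended with one extra column and given a matching schema. The class name RowSchemaSketch, the helpers extendSchema and appendColumn, and the column name "extra" are hypothetical and are not part of the question or of Spark. It also assumes that GenericRowWithSchema, which lives in Spark's internal catalyst package, is available in your Spark version, and that your version provides StructType.add.

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RowSchemaSketch {

    // Hypothetical helper: returns the input schema plus one extra string column.
    // StructType is immutable, so add() returns a new StructType.
    static StructType extendSchema(StructType schema) {
        return schema.add("extra", DataTypes.StringType);
    }

    // Hypothetical helper: copies the values of `row`, appends `extraValue`,
    // and attaches `newSchema` to the resulting Row.
    static Row appendColumn(Row row, Object extraValue, StructType newSchema) {
        Object[] values = new Object[row.size() + 1];
        for (int i = 0; i < row.size(); i++) {
            values[i] = row.get(i);
        }
        values[row.size()] = extraValue;
        return new GenericRowWithSchema(values, newSchema);
    }

    public static void main(String[] args) {
        // Original one-column schema and a row without an attached schema.
        StructType schema = new StructType().add("sentence", DataTypes.StringType);
        Row original = RowFactory.create("hello spark");

        // Derive the new schema once, then build the extended row with it.
        StructType newSchema = extendSchema(schema);
        Row extended = appendColumn(original, "added", newSchema);

        System.out.println(extended.schema());
        System.out.println(extended);
    }
}
```

In the scenario from the question, one option under these assumptions is to derive newSchema once on the driver from sentenceData.schema(), before the map call. The same StructType can then be used inside the map function and passed to createDataFrame afterwards, so the schema does not have to be recovered from the workers.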