How to introduce the schema in a Row in Spark?


Question

The Row Java API has row.schema(), but there is no row.set(StructType schema).

I also tried RowFactory.create(objects), but I don't know how to proceed from there.

Update:

The problem is how to generate a new DataFrame when I modify the row structure in the workers. Here is an example:

DataFrame sentenceData = jsql.createDataFrame(jrdd, schema);
List<Row> resultRows2 = sentenceData.toJavaRDD()
        .map(new MyFunction<Row, Row>(parameters) {
            /** my map function */
            public Row call(Row row) {
                // I want to change the Row definition by adding new columns
                Row newRow = functionAddnewNewColumns(row);
                StructType newSchema = functionGetNewSchema(row.schema());

                // Here I want to attach the new schema to newRow

                return newRow;
            }
        }).collect();

JavaRDD<Row> newJrdd = jsc.parallelize(resultRows2);

// Here is the problem: newSchema only exists inside call(), so I don't know
// how to get the new schema to create the new, modified DataFrame

DataFrame newDataframe = jsql.createDataFrame(newJrdd, newSchema);

Recommended answer

You can create a Row that carries a schema by using:

Row newRow = new GenericRowWithSchema(values, newSchema);
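
To make the one-liner above more concrete, here is a minimal sketch of how the values array and the new schema might be built before calling the constructor. The class name SchemaRowSketch, the helpers extendSchema and extendRow, and the column names "sentence" and "label" are hypothetical and only for illustration. GenericRowWithSchema lives in the org.apache.spark.sql.catalyst.expressions package, which is an internal part of Spark, so its availability may vary between versions.

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

class SchemaRowSketch {

    // Hypothetical helper: returns the old schema plus one nullable string column.
    static StructType extendSchema(StructType oldSchema, String newColumn) {
        return oldSchema.add(newColumn, DataTypes.StringType);
    }

    // Hypothetical helper: copies the old values, appends one value,
    // and attaches the new schema to the resulting Row.
    static Row extendRow(Row row, Object extraValue, StructType newSchema) {
        Object[] values = new Object[row.length() + 1];
        for (int i = 0; i < row.length(); i++) {
            values[i] = row.get(i);
        }
        values[row.length()] = extraValue;
        return new GenericRowWithSchema(values, newSchema);
    }

    public static void main(String[] args) {
        // Assumed starting schema with a single string column.
        StructType schema = DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("sentence", DataTypes.StringType, true)
        });
        Row original = new GenericRowWithSchema(new Object[] {"hello spark"}, schema);

        StructType newSchema = extendSchema(schema, "label");
        Row extended = extendRow(original, "greeting", newSchema);

        // The extended row now reports the new schema through schema().
        System.out.println(extended.schema().simpleString());
        System.out.println(extended);
    }
}
```

For the flow in the question, one option is to compute the new schema once on the driver from sentenceData.schema(), instead of trying to recover it from the collected rows, and then pass that StructType to jsql.createDataFrame together with the mapped rows.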
