Pyspark: Map a SchemaRDD into a SchemaRDD

Views: 818 | Published: 2016/5/22 15:45:16 | Tags: apache-spark hive pyspark pyspark-sql

Question

I am loading a file of JSON objects as a pyspark SchemaRDD. I want to change the "shape" of the objects (basically, I'm flattening them) and then insert into a Hive table.

The problem I have is that the following returns a PipelinedRDD not a SchemaRDD:

log_json.map(flatten_function)

(Here, log_json is a SchemaRDD.)

Is there a way to preserve the type, cast back to the desired type, or efficiently insert from the new type?

Recommended answer

The solution is applySchema:

mapped = log_json.map(flatten_function)
hive_context.applySchema(mapped, flat_schema).insertInto(name)

Where flat_schema is a StructType representing the schema in the same way as you would obtain from log_json.schema() (but flattened, obviously).
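
Neither flatten_function nor flat_schema is shown in the post. As a rough sketch only, assuming each record can be treated as a nested dict, the flattening step might look like the plain-Python helpers below. The names flatten_record and to_row_tuple, the underscore separator, the sample record, and the field list are all hypothetical and not from the original answer. The one constraint that matters is that the order of values in each output tuple matches the order of fields in flat_schema.

```python
def flatten_record(record, parent_key="", sep="_"):
    """Flatten a nested dict into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys with the parent key.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_row_tuple(record, field_names):
    """Return flattened values ordered by field_names; missing fields become None."""
    flat = flatten_record(record)
    return tuple(flat.get(name) for name in field_names)


# Hypothetical log record and field order, for illustration only.
sample = {"user": {"id": 7, "name": "ann"}, "event": "login"}
field_names = ["user_id", "user_name", "event"]
row = to_row_tuple(sample, field_names)
```

A function along the lines of to_row_tuple could play the role of flatten_function in the map call, with field_names listed in the same order as the fields of flat_schema.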
