PySpark: Map a SchemaRDD into a SchemaRDD
Question
I am loading a file of JSON objects as a PySpark SchemaRDD. I want to change the "shape" of the objects (basically, I'm flattening them) and then insert into a Hive table.
The problem I have is that the following returns a PipelinedRDD, not a SchemaRDD:
log_json.map(flatten_function)
(where log_json is a SchemaRDD).
Is there either a way to preserve type, cast back to the desired type, or efficiently insert from the new type?
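For context, a flatten_function along these lines is what the question has in mind. This is a hypothetical sketch in plain Python, not the asker's actual code; the underscore-joined naming scheme is an assumption:

```python
# Hypothetical flatten_function: collapses nested dicts into a single
# level, joining key paths with underscores (the separator is an assumption).
def flatten_function(record, prefix=""):
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the accumulated prefix.
            flat.update(flatten_function(value, name + "_"))
        else:
            flat[name] = value
    return flat

row = {"user": {"id": 1, "geo": {"city": "Oslo"}}, "event": "click"}
print(flatten_function(row))
# {'user_id': 1, 'user_geo_city': 'Oslo', 'event': 'click'}
```

Mapping such a function over a SchemaRDD produces plain Python dicts, which is exactly why the schema information is lost and the result comes back as a PipelinedRDD.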
Answer
The solution is applySchema:
mapped = log_json.map(flatten_function)
hive_context.applySchema(mapped, flat_schema).insertInto(name)
Where flat_schema is a StructType representing the schema in the same way as you would obtain it from log_json.schema() (but flattened, obviously).
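The flat_schema can be derived mechanically from the nested schema: walk the fields, and wherever a field's type is itself a struct, recurse and prefix the child names. The sketch below shows that walk in plain Python, with (name, type) tuples and nested lists standing in for pyspark.sql's StructField and StructType; in real code you would rebuild a StructType(StructField(...)) from the resulting list. All names here are illustrative assumptions:

```python
# Pure-Python sketch of schema flattening. A nested struct type is modelled
# as a list of (name, dtype) tuples; a dtype that is itself a list represents
# a nested struct (stand-ins for pyspark.sql StructType/StructField).
def flatten_schema(fields, prefix=""):
    flat = []
    for name, dtype in fields:
        full = prefix + name
        if isinstance(dtype, list):
            # Nested struct: recurse, prefixing child field names,
            # mirroring the key naming used by the flatten step.
            flat.extend(flatten_schema(dtype, full + "_"))
        else:
            flat.append((full, dtype))
    return flat

nested = [("user", [("id", "long"), ("geo", [("city", "string")])]),
          ("event", "string")]
print(flatten_schema(nested))
# [('user_id', 'long'), ('user_geo_city', 'string'), ('event', 'string')]
```

The key point is that the field-name prefixing here must match whatever naming flatten_function uses on the data rows, so that applySchema can line the flattened records up with flat_schema.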