How to save the result of printSchema to a file in PySpark

Views: 569 | Posted: 2020/9/4 8:12:01 | Tags: python, apache-spark, pyspark


Question

I have used df.printSchema() in PySpark, and it gives me the schema as a tree structure. Now I need to save it in a variable or a text file.

I have tried the methods below to save it, but they didn't work.

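v = str(df.printSchema())  # printSchema() prints the tree and returns None, so v == "None"
print(v)
# and
df.printSchema().saveAsTextFile(<path>)  # fails: None has no attribute saveAsTextFile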
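Both attempts fail because printSchema() writes the tree to standard output and returns None. In the PySpark versions I have looked at, printSchema() emits the tree through Python's own print, so one workaround that sticks to public API is to capture stdout while calling it. This is a minimal sketch under that assumption; capture_stdout and fake_print_schema are hypothetical names, and the fake function only stands in for df.printSchema so the example runs without Spark.

```python
import contextlib
import io
from typing import Callable


def capture_stdout(fn: Callable[[], object]) -> str:
    """Run fn() and return everything it printed to sys.stdout as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        fn()
    return buffer.getvalue()


# Hypothetical stand-in for df.printSchema so this sketch runs without Spark.
def fake_print_schema() -> None:
    print("root")
    print(" |-- COVERSHEET: struct (nullable = true)")


v = capture_stdout(fake_print_schema)
# With a real DataFrame (assumption: printSchema uses Python's print):
# v = capture_stdout(df.printSchema)
```

Note that redirect_stdout only swaps Python's sys.stdout; output written directly by the JVM would not be captured this way.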

I need the saved schema in the format below:

 |-- COVERSHEET: struct (nullable = true)
 |    |-- ADDRESSES: struct (nullable = true)
 |    |    |-- ADDRESS: struct (nullable = true)
 |    |    |    |-- _VALUE: string (nullable = true)
 |    |    |    |-- _city: string (nullable = true)
 |    |    |    |-- _primary: long (nullable = true)
 |    |    |    |-- _state: string (nullable = true)
 |    |    |    |-- _street: string (nullable = true)
 |    |    |    |-- _type: string (nullable = true)
 |    |    |    |-- _zip: long (nullable = true)
 |    |-- CONTACTS: struct (nullable = true)
 |    |    |-- CONTACT: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- _VALUE: string (nullable = true)
 |    |    |    |    |-- _name: string (nullable = true)
 |    |    |    |    |-- _type: string (nullable = true)

Answer

You need treeString (which, for some reason, I couldn't find in the Python API).

# v will be a string; _jdf is the private handle to the underlying Java DataFrame
v = df._jdf.schema().treeString()
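If you would rather not depend on the private _jdf attribute, a similar tree can be rebuilt from the public df.schema.jsonValue() dict (or json.loads(df.schema.json())). The sketch below is my own helper, not a PySpark API: tree_string, _render, _type_name and SAMPLE are hypothetical names. It mimics the printSchema() layout for struct, array and map types and may render less common types (for example user-defined types) differently from Spark.

```python
from typing import Any, List


def _type_name(dtype: Any) -> str:
    # Complex types are dicts with a "type" key; primitive types are plain
    # strings such as "string", "long" or "decimal(10,2)".
    return dtype["type"] if isinstance(dtype, dict) else dtype


def _render(dtype: Any, prefix: str, out: List[str]) -> None:
    """Append one line per child of dtype, recursing into nested types."""
    if not isinstance(dtype, dict):
        return  # primitive types have no children
    kind = dtype["type"]
    if kind == "struct":
        for field in dtype["fields"]:
            out.append(f"{prefix}-- {field['name']}: {_type_name(field['type'])} "
                       f"(nullable = {str(field['nullable']).lower()})")
            _render(field["type"], prefix + "    |", out)
    elif kind == "array":
        elem = dtype["elementType"]
        out.append(f"{prefix}-- element: {_type_name(elem)} "
                   f"(containsNull = {str(dtype['containsNull']).lower()})")
        _render(elem, prefix + "    |", out)
    elif kind == "map":
        out.append(f"{prefix}-- key: {_type_name(dtype['keyType'])}")
        _render(dtype["keyType"], prefix + "    |", out)
        out.append(f"{prefix}-- value: {_type_name(dtype['valueType'])} "
                   f"(valueContainsNull = {str(dtype['valueContainsNull']).lower()})")
        _render(dtype["valueType"], prefix + "    |", out)


def tree_string(schema_json: dict) -> str:
    """Render a StructType.jsonValue() dict in printSchema()'s tree layout."""
    lines = ["root"]
    _render(schema_json, " |", lines)
    return "\n".join(lines) + "\n"


# Hypothetical sample in the jsonValue() shape, trimmed from the question's schema.
SAMPLE = {
    "type": "struct",
    "fields": [{
        "name": "COVERSHEET", "nullable": True, "metadata": {},
        "type": {"type": "struct", "fields": [{
            "name": "CONTACTS", "nullable": True, "metadata": {},
            "type": {"type": "struct", "fields": [{
                "name": "CONTACT", "nullable": True, "metadata": {},
                "type": {"type": "array", "containsNull": True,
                         "elementType": {"type": "struct", "fields": [
                             {"name": "_name", "type": "string",
                              "nullable": True, "metadata": {}},
                         ]}},
            }]},
        }]},
    }],
}

# With Spark: v = tree_string(df.schema.jsonValue())
```

Because the helper only reads a plain dict, it can also be used on a schema saved earlier with df.schema.json().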

You can convert it to an RDD and use saveAsTextFile:

sc.parallelize([v]).saveAsTextFile(...)  # writes a directory of part-* files, not a single file

Or use Python's own file API to write the string to a file.
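A minimal sketch of the plain-Python route, assuming the tree string is already in a variable (for example from df._jdf.schema().treeString()). save_schema_text is a hypothetical helper name. Unlike saveAsTextFile, this writes exactly one file, but on the driver's local filesystem rather than HDFS or other Hadoop-compatible storage.

```python
import tempfile
from pathlib import Path


def save_schema_text(schema_text: str, path: str) -> Path:
    """Write the schema tree string to a single UTF-8 text file on the driver."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    target.write_text(schema_text, encoding="utf-8")
    return target


# Usage sketch: in real code, schema_text would come from treeString().
demo_dir = tempfile.mkdtemp()
saved = save_schema_text("root\n |-- _zip: long (nullable = true)\n",
                         f"{demo_dir}/schemas/schema.txt")
```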
