Creating a PySpark Schema involving an ArrayType
Problem description
I'm trying to create a schema for my new DataFrame and have tried various combinations of brackets and keywords but have been unable to figure out how to make this work. My current attempt:
from pyspark.sql.types import *
schema = StructType([
    StructField("User", IntegerType()),
    ArrayType(StructType([
        StructField("user", StringType()),
        StructField("product", StringType()),
        StructField("rating", DoubleType())]))
])
It comes back with the error:
elementType should be DataType
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/sql/types.py", line 290, in __init__
assert isinstance(elementType, DataType), "elementType should be DataType"
AssertionError: elementType should be DataType
I have googled, but so far I have found no good examples of an array of objects.
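You will need an additional StructField for the ArrayType property: each entry in a StructType's field list has to be a StructField, and the ArrayType goes in as that field's data type. This one should work: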
from pyspark.sql.types import *
schema = StructType([
    StructField("User", IntegerType()),
    # The array is the data type of a named field, not a field on its own
    StructField("My_array", ArrayType(
        StructType([
            StructField("user", StringType()),
            StructField("product", StringType()),
            StructField("rating", DoubleType())
        ])
    ))
])
For more information check this link: http://nadbordrozd.github.io/blog/2016/05/22/one-weird-trick-that-will-fix-your-pyspark-schemas/