Creating a Pyspark Schema involving an ArrayType
Problem description
I'm trying to create a schema for my new DataFrame and have tried various combinations of brackets and keywords, but have been unable to figure it out. My current attempt:
from pyspark.sql.types import *

schema = StructType([
    StructField("User", IntegerType()),
    ArrayType(StructType([
        StructField("user", StringType()),
        StructField("product", StringType()),
        StructField("rating", DoubleType())]))
])
Comes back with the error:
elementType should be DataType
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/sql/types.py", line 290, in __init__
assert isinstance(elementType, DataType), "elementType should be DataType"
AssertionError: elementType should be DataType
I have googled, but so far found no good examples of an array of objects.
You will need an additional StructField for the ArrayType property. This one should work:
from pyspark.sql.types import *

schema = StructType([
    StructField("User", IntegerType()),
    StructField("My_array", ArrayType(
        StructType([
            StructField("user", StringType()),
            StructField("product", StringType()),
            StructField("rating", DoubleType())
        ])
    ))
])
For more information, check this link: http://nadbordrozd.github.io/blog/2016/05/22/one-weird-trick-that-will-fix-your-pyspark-schemas/