Creating a PySpark Schema involving an ArrayType


Problem description

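I'm trying to create a schema for a new DataFrame. I have tried various combinations of brackets and keywords but can't figure out how to make it work. My current attempt: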

from pyspark.sql.types import *

schema = StructType([
  StructField("User", IntegerType()),
  ArrayType(StructType([
    StructField("user", StringType()),
    StructField("product", StringType()),
    StructField("rating", DoubleType())]))
  ])

It comes back with the error:

elementType should be DataType
Traceback (most recent call last):
 File "/usr/hdp/current/spark2-client/python/pyspark/sql/types.py", line 290, in __init__
assert isinstance(elementType, DataType), "elementType should be DataType"
AssertionError: elementType should be DataType   

I have googled, but so far I have found no good examples of an array of objects.

Solution

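You need an additional StructField for the ArrayType property: every entry in a StructType's field list must be a StructField, so the ArrayType has to be the data type of a named field rather than a field on its own. This should work: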

from pyspark.sql.types import *

schema = StructType([
  StructField("User", IntegerType()),
  StructField("My_array", ArrayType(
      StructType([
          StructField("user", StringType()),
          StructField("product", StringType()),
          StructField("rating", DoubleType())
      ])
  ))
])

For more information check this link: http://nadbordrozd.github.io/blog/2016/05/22/one-weird-trick-that-will-fix-your-pyspark-schemas/
