Creating a PySpark Schema involving an ArrayType

Problem description

I'm trying to create a schema for my new DataFrame and have tried various combinations of brackets and keywords but have been unable to figure out how to make this work. My current attempt:

from pyspark.sql.types import *

schema = StructType([
  StructField("User", IntegerType()),
  ArrayType(StructType([
    StructField("user", StringType()),
    StructField("product", StringType()),
    StructField("rating", DoubleType())]))
  ])

It comes back with the error:

elementType should be DataType
Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/types.py", line 290, in __init__
    assert isinstance(elementType, DataType), "elementType should be DataType"
AssertionError: elementType should be DataType

I have googled, but so far I have not found any good examples of an array of objects.

Solution

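You need to wrap the ArrayType in its own StructField: every entry in a StructType's field list must be a StructField carrying a name and a data type, and the ArrayType is the data type of that field rather than a field itself. This should work: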

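from pyspark.sql.types import (
    ArrayType, DoubleType, IntegerType, StringType, StructField, StructType)

schema = StructType([
    StructField("User", IntegerType()),
    # The array column gets its own named StructField;
    # ArrayType(...) is that field's data type.
    StructField("My_array", ArrayType(
        StructType([
            StructField("user", StringType()),
            StructField("product", StringType()),
            StructField("rating", DoubleType())
        ])
    ))
])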

For more information check this link: http://nadbordrozd.github.io/blog/2016/05/22/one-weird-trick-that-will-fix-your-pyspark-schemas/
