Pyspark convert a standard list to data frame


Question

The case is really simple: I need to convert a Python list into a data frame with the following code

from pyspark.sql.types import StructType
from pyspark.sql.types import StructField
from pyspark.sql.types import StringType, IntegerType

schema = StructType([StructField("value", IntegerType(), True)])
my_list = [1, 2, 3, 4]
rdd = sc.parallelize(my_list)
df = sqlContext.createDataFrame(rdd, schema)

df.show()

It fails with the following error:

    raise TypeError("StructType can not accept object %r in type %s" % (obj, type(obj)))
TypeError: StructType can not accept object 1 in type <class 'int'>

Answer

This solution is also an approach that uses less code, avoids serialization to an RDD, and is likely easier to understand:

from pyspark.sql.types import IntegerType

# notice the variable name (more below)
mylist = [1, 2, 3, 4]

# notice the parens after the type name
spark.createDataFrame(mylist, IntegerType()).show()

NOTE: About naming your variable list: the name list is a Python builtin, so it is strongly recommended to avoid using builtin names for your variables, because doing so ends up shadowing things like the list() constructor. When prototyping something quick and dirty, many people use a name like mylist instead.
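As for why the original code fails: when createDataFrame is given a StructType schema, it expects row-like objects (tuples, Rows, or dicts), not bare ints. A minimal sketch of the schema-based approach from the question, assuming the same SparkSession named spark, just with each value wrapped as a one-field tuple:

from pyspark.sql.types import StructType, StructField, IntegerType

schema = StructType([StructField("value", IntegerType(), True)])
mylist = [1, 2, 3, 4]

# Wrap each int as a single-element tuple so it matches the one-field schema;
# passing bare ints is what raised the TypeError above.
df = spark.createDataFrame([(x,) for x in mylist], schema)
df.show()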
