List to DataFrame in pyspark

Views: 23 | Published: 2021/11/14 21:44:55 | Tags: pyspark, pyspark-sql


Problem description

Can someone tell me how to convert a list containing strings to a DataFrame in pyspark? I am using Python 3.6 with Spark 2.2.1. I have just started learning the Spark environment, and my data looks like this:

my_data =[['apple','ball','ballon'],['cat','camel','james'],['none','focus','cake']]

Now I want to create a DataFrame as follows:

---------------------------------
|ID | words                     |
---------------------------------
| 1 | ['apple','ball','ballon'] |
| 2 | ['cat','camel','james']   |
---------------------------------

I also want to add an ID column, which is not present in the data.

Recommended answer

You can convert the list to a list of Row objects, then use spark.createDataFrame, which will infer the schema from your data:

from pyspark.sql import Row
R = Row('ID', 'words')

# use enumerate to add the ID column
spark.createDataFrame([R(i, x) for i, x in enumerate(my_data)]).show() 
+---+--------------------+
| ID|               words|
+---+--------------------+
|  0|[apple, ball, bal...|
|  1| [cat, camel, james]|
|  2| [none, focus, cake]|
+---+--------------------+
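
Note that enumerate starts counting at 0 by default, while the table in the question starts at 1, and that show() truncates long cells. As a variation on the answer above, the following is a minimal sketch that numbers the rows from 1 and passes an explicit schema instead of relying on inference. It assumes an existing SparkSession named spark; the helper names add_ids and list_to_dataframe are hypothetical and were chosen only for this illustration.

```python
def add_ids(data, start=1):
    """Pair each inner list with a sequential ID, beginning at `start`."""
    return [(i, words) for i, words in enumerate(data, start=start)]


def list_to_dataframe(spark, data, start=1):
    """Build a DataFrame with an integer ID column and an array<string> words column.

    `spark` is assumed to be an already-created SparkSession.
    """
    # Imported here so the pure-Python helper above works without pyspark installed.
    from pyspark.sql.types import (
        ArrayType, IntegerType, StringType, StructField, StructType,
    )

    # Explicit schema: avoids inference and documents the column types.
    schema = StructType([
        StructField("ID", IntegerType(), nullable=False),
        StructField("words", ArrayType(StringType()), nullable=True),
    ])
    return spark.createDataFrame(add_ids(data, start), schema=schema)


my_data = [['apple', 'ball', 'ballon'], ['cat', 'camel', 'james'], ['none', 'focus', 'cake']]
rows = add_ids(my_data)  # plain Python tuples: (ID, words)

# With an active SparkSession named `spark` (an assumption), one could then run:
#     list_to_dataframe(spark, my_data).show(truncate=False)
```

Passing truncate=False to show() prints each cell in full rather than cutting it off after 20 characters. An explicit schema is optional here, since inference works for this data, but it can make the intended column types clearer.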
