How to convert rows into a list of dictionaries in pyspark?

Views: 99 Published: 2021/11/14 21:35:00 Tags: apache-spark pyspark apache-spark-sql

Problem description

I have a DataFrame (df) in pyspark, created by reading from a Hive table:

df = spark.sql('select * from <table_name>')


+++++++++++++++++++++++++++++++++++++++++++
|  Name    |    URL visited               |
+++++++++++++++++++++++++++++++++++++++++++
|  person1 | [google,msn,yahoo]           |
|  person2 | [fb.com,airbnb,wired.com]    |
|  person3 | [fb.com,google.com]          |
+++++++++++++++++++++++++++++++++++++++++++

When I tried the following, I got an error:

df_dict = dict(zip(df['name'],df['url']))
"TypeError: zip argument #1 must support iteration."

type(df.name) is 'pyspark.sql.column.Column'

How do I create a dictionary like the following, which can be iterated over later?

{'person1': ['google', 'msn', 'yahoo']}
{'person2': ['fb.com', 'airbnb', 'wired.com']}
{'person3': ['fb.com', 'google.com']}

Thanks for your ideas and help.

Recommended answer

I think you can try row.asDict(). This code runs directly on the executors, so you don't have to collect the data on the driver. The result is an RDD of dictionaries rather than a local Python list.

Something like:

df.rdd.map(lambda row: row.asDict())
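
To end up with an actual Python list on the driver, the rows have to be collected at some point. The following is a minimal sketch of that step, not a definitive implementation. It assumes the result is small enough to fit in driver memory and that the columns are called name and url, as in the question's code. The helper names rows_to_dicts, rows_to_mapping and demo are hypothetical, and demo() additionally assumes a working local Spark installation.

```python
from typing import Any, Dict, Iterable, List


def rows_to_dicts(rows: Iterable[Any]) -> List[Dict[str, Any]]:
    """Turn already-collected Row objects into plain dicts, one per row."""
    return [row.asDict() for row in rows]


def rows_to_mapping(rows: Iterable[Any], key_col: str, value_col: str) -> Dict[Any, Any]:
    """Build a single {key: value} dict from two columns of the collected rows."""
    return {row[key_col]: row[value_col] for row in rows}


def demo() -> None:
    # Imported here so the helpers above can be used without importing Spark.
    from pyspark.sql import SparkSession

    # Assumption: a throwaway local session stands in for the Hive-backed one.
    spark = SparkSession.builder.master("local[1]").appName("rows-to-dicts").getOrCreate()

    # Sample data mirroring the table in the question (column names assumed).
    df = spark.createDataFrame(
        [
            ("person1", ["google", "msn", "yahoo"]),
            ("person2", ["fb.com", "airbnb", "wired.com"]),
            ("person3", ["fb.com", "google.com"]),
        ],
        ["name", "url"],
    )

    # collect() brings every row to the driver; only safe for small results.
    rows = df.collect()

    print(rows_to_dicts(rows))                    # one dict per row
    print(rows_to_mapping(rows, "name", "url"))   # name -> list of URLs

    spark.stop()
```

Calling demo() should print one dictionary per row, followed by a single mapping from each name to its list of URLs. For large tables, staying with the df.rdd.map(...) form and doing the per-row work on the executors avoids pulling everything onto the driver.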
