How to convert rows into a dictionary in PySpark?
Question
I have a DataFrame (df) in PySpark, created by reading from a Hive table:
df = spark.sql('select * from <table_name>')
+++++++++++++++++++++++++++++++++++++++++++
| Name | URL visited |
+++++++++++++++++++++++++++++++++++++++++++
| person1 | [google,msn,yahoo] |
| person2 | [fb.com,airbnb,wired.com] |
| person3 | [fb.com,google.com] |
+++++++++++++++++++++++++++++++++++++++++++
When I tried the following, I got an error:
df_dict = dict(zip(df['name'],df['url']))
"TypeError: zip argument #1 must support iteration."
type(df.name) is 'pyspark.sql.column.Column'
How do I create a dictionary like the following, which I can iterate over later?
{'person1': ['google', 'msn', 'yahoo']}
{'person2': ['fb.com', 'airbnb', 'wired.com']}
{'person3': ['fb.com', 'google.com']}
I appreciate your thoughts and help.
Recommended answer
I think you can try row.asDict(). This code runs directly on the executors, so you don't have to collect the data on the driver.
Something like:
df.rdd.map(lambda row: row.asDict())
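The map above only produces an RDD of per-row dictionaries. To get a single Python dictionary keyed by name on the driver, the results still have to be collected at some point. The following is a minimal sketch of one way to do that, not necessarily what the answerer had in mind. It assumes the columns are called name and url (as in the question's zip attempt), that the data is small enough to fit in driver memory, and that a local SparkSession is acceptable for the demo. The helper rows_to_lookup and the function demo are hypothetical names introduced here for illustration.

```python
def rows_to_lookup(row_dicts, key="name", value="url"):
    """Build {row[key]: row[value]} from an iterable of per-row dicts,
    such as the results of Row.asDict(). Column names are assumptions."""
    return {d[key]: d[value] for d in row_dicts}


def demo():
    # Requires a working PySpark installation; imported lazily so the
    # helper above can be used and tested without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("rows-to-dict")
        .getOrCreate()
    )

    # Stand-in for the Hive table in the question (sample data from the post).
    df = spark.createDataFrame(
        [
            ("person1", ["google", "msn", "yahoo"]),
            ("person2", ["fb.com", "airbnb", "wired.com"]),
            ("person3", ["fb.com", "google.com"]),
        ],
        ["name", "url"],
    )

    # Row.asDict() is applied on the executors; collect() then brings the
    # resulting dicts back to the driver as a plain Python list.
    row_dicts = df.rdd.map(lambda row: row.asDict()).collect()

    lookup = rows_to_lookup(row_dicts)
    for person, urls in lookup.items():
        print(person, urls)

    spark.stop()
```

Calling demo() runs the whole flow against the sample data. If the table is large, it is probably better to keep the work in the RDD (for example by iterating inside map or foreach) rather than collecting everything to the driver.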