pyspark - create DataFrame Grouping columns in map type structure

Published: 2021/11/14 22:24:43  Tags: python sql dictionary pyspark spark-dataframe


Problem description

My DataFrame has the following structure:

-------------------------
| Brand | type |  amount|
-------------------------
|  B   |   a  |   10   |
|  B   |   b  |   20   |
|  C   |   c  |   30   |
-------------------------



I want to reduce the number of rows by grouping type and amount into a single column of type Map, so that Brand is unique and MAP_type_AMOUNT holds a key/value pair for each type/amount combination.
I think Spark SQL might have some functions to help with this, or do I have to drop down to the RDD underlying the DataFrame and write my "own" conversion to a map type?
Expected:
-------------------------------
| Brand | MAP_type_AMOUNT     |
-------------------------------
|  B    | {a: 10, b: 20}      |
|  C    | {c: 30}             |
-------------------------------


Recommended answer
Slight improvement to Prem's answer (sorry I can't comment yet)
Use func.create_map instead of func.struct. See documentation
import pyspark.sql.functions as func
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

df = sc.parallelize([('B', 'a', 10), ('B', 'b', 20),
                     ('C', 'c', 30)]).toDF(['Brand', 'Type', 'Amount'])

# Wrap each row's Type/Amount pair in a one-entry map, then collect the maps per Brand
df_converted = df.groupBy("Brand").\
    agg(func.collect_list(func.create_map(func.col("Type"),
        func.col("Amount"))).alias("MAP_type_AMOUNT"))

print(df_converted.collect())

Output:
[Row(Brand='B', MAP_type_AMOUNT=[{'a': 10}, {'b': 20}]),
 Row(Brand='C', MAP_type_AMOUNT=[{'c': 30}])]

