如何在火花图函数中输出多个(键,值) [英] how to output multiple (key,value) in spark map function
本文介绍了如何在火花图函数中输出多个(键,值)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
输入数据格式如下:
+--------------------+-------------+--------------------+
| StudentID| Right | Wrong |
+--------------------+-------------+--------------------+
| studentNo01 | a,b,c | x,y,z |
+--------------------+-------------+--------------------+
| studentNo02 | c,d | v,w |
+--------------------+-------------+--------------------+
输出格式如下():
+--------------------+---------+
| key | value|
+--------------------+---------+
| studentNo01,a | 1 |
+--------------------+---------+
| studentNo01,b | 1 |
+--------------------+---------+
| studentNo01,c | 1 |
+--------------------+---------+
| studentNo01,x | 0 |
+--------------------+---------+
| studentNo01,y | 0 |
+--------------------+---------+
| studentNo01,z | 0 |
+--------------------+---------+
| studentNo02,c | 1 |
+--------------------+---------+
| studentNo02,d | 1 |
+--------------------+---------+
| studentNo02,v | 0 |
+--------------------+---------+
| studentNo02,w | 0 |
+--------------------+---------+
正确的意思是1,错误的意思是0.
The Right means 1 , The Wrong means 0.
我想用Spark map函数或者udf来处理这些数据,但是不知道怎么处理.你能帮我吗?谢谢.
I want to process these data using Spark map function or udf, But I don't know how to deal with it . Can you help me, please? Thank you.
推荐答案
使用split和explode两次并进行union
Use split and explode twice and do the union
val df = List(
("studentNo01","a,b,c","x,y,z"),
("studentNo02","c,d","v,w")
).toDF("StudenID","Right","Wrong")
+-----------+-----+-----+
| StudenID|Right|Wrong|
+-----------+-----+-----+
|studentNo01|a,b,c|x,y,z|
|studentNo02| c,d| v,w|
+-----------+-----+-----+
val pair = (
df.select('StudenID,explode(split('Right,",")))
.select(concat_ws(",",'StudenID,'col).as("key"))
.withColumn("value",lit(1))
).unionAll(
df.select('StudenID,explode(split('Wrong,",")))
.select(concat_ws(",",'StudenID,'col).as("key"))
.withColumn("value",lit(0))
)
+-------------+-----+
| key|value|
+-------------+-----+
|studentNo01,a| 1|
|studentNo01,b| 1|
|studentNo01,c| 1|
|studentNo02,c| 1|
|studentNo02,d| 1|
|studentNo01,x| 0|
|studentNo01,y| 0|
|studentNo01,z| 0|
|studentNo02,v| 0|
|studentNo02,w| 0|
+-------------+-----+
可以如下转换为RDD
val rdd = pair.map(r => (r.getString(0),r.getInt(1)))
这篇关于如何在火花图函数中输出多个(键,值)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文