Spark Scala: convert arbitrary N columns into Map
Question
I have the following data structure representing movie ids (first column) and ratings for different users for that movie in the rest of the columns - something like this:
+-------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
|movieId| 1| 2| 3| 4| 5| 6| 7| 8| 9| 10| 11| 12| 13| 14| 15|
+-------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| 1580|null|null| 3.5| 5.0|null|null|null|null|null|null|null|null|null|null|null|
| 3175|null|null|null|null|null|null|null|null|null|null|null|null|null| 5.0|null|
| 3794|null|null|null|null|null|null|null|null|null|null|null| 3.0|null|null|null|
| 2659|null|null|null| 3.0|null|null|null|null|null|null|null|null|null|null|null|
I want to convert this DataFrame into a Dataset of
final case class MovieRatings(movie_id: Long, rating: Map[Long, Double])
so that it would look like this:
[1580, [1 -> null, 2 -> null, 3 -> 3.5, 4 -> 5.0, 5 -> null, 6 -> null, 7 -> null,...]]
and so on.
How can I do that?
The thing here is that the number of users is arbitrary, and I want to zip them into a single column, leaving the first column untouched.
Answer
First, you have to transform your DataFrame into one whose schema matches your case class; then you can use .as[MovieRatings] to convert the DataFrame into a Dataset[MovieRatings]:
import org.apache.spark.sql.functions._
import spark.implicits._
// define a new MapType column using `functions.map`, passing a flattened-list of
// column name (as a Long column) and column value
val mapColumn: Column = map(df.columns.tail.flatMap(name => Seq(lit(name.toLong), $"$name")): _*)
// select the movie id and the map column, aliased to match the case-class field names:
df.select($"movieId" as "movie_id", mapColumn as "rating")
.as[MovieRatings]
.show(false)
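The core of the answer is the flatMap that interleaves each column name (parsed as a Long key) with that column's value, so that functions.map receives key, value, key, value, … as varargs. Stripped of Spark, the pairing logic can be sketched in plain Scala; the column names and sample row values below are made up for illustration, and Option stands in for the nulls shown in the table:

```scala
// Plain-Scala sketch of the key/value pairing performed by the flatMap above.
// The user-id column names ("1", "2", ...) become Long keys; the cell values
// for one row become the map values (None models a null rating).
val userColumns: Seq[String] = Seq("1", "2", "3", "4")
val rowValues: Seq[Option[Double]] = Seq(None, None, Some(3.5), Some(5.0))

// Pair each column name with the row's value for that column,
// mirroring `flatMap(name => Seq(lit(name.toLong), $"$name"))`
val rating: Map[Long, Option[Double]] =
  userColumns.map(_.toLong).zip(rowValues).toMap
```

Note that in the Spark version the null ratings are kept as null map values; if you want a Dataset whose map values are non-nullable `Double`, you would need to filter the null entries out first.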