spark - scala: not a member of org.apache.spark.sql.Row
Question
I am trying to convert a DataFrame to an RDD and then run the operation below to return tuples:
df.rdd.map { t=>
(t._2 + "_" + t._3 , t)
}.take(5)
Then I got the error below. Anyone have any ideas? Thanks!
<console>:37: error: value _2 is not a member of org.apache.spark.sql.Row
(t._2 + "_" + t._3 , t)
^
Recommended answer
When you convert a DataFrame to an RDD, you get an RDD[Row], so when you use map, your function receives a Row as its parameter. A Row is not a tuple, so it has no _2 or _3 members. Instead, you must use the Row methods to access its fields (note that indices start at 0):
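import org.apache.spark.sql.Row

df.rdd.map {
  // getString(1) and getString(2) read the second and third columns (0-based)
  row: Row => (row.getString(1) + "_" + row.getString(2), row)
}.take(5)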
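If positional indices feel fragile, a Row can also be read by column name with getAs, or destructured with a pattern match. The following is only a sketch under assumptions: the local SparkSession, the sample data, and the column names col1, col2 and col3 are made up for illustration and are not taken from the question.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object RowAccessSketch {
  // Returns the keys built by name-based access and by pattern matching.
  def run(): (Seq[String], Seq[String]) = {
    // Local session for illustration only (assumption, not from the question)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-access-sketch")
      .getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical three-column DataFrame standing in for the asker's df
      val df = Seq((1, "a", "b"), (2, "c", "d")).toDF("col1", "col2", "col3")

      // Option 1: look fields up by column name instead of position
      val byName = df.rdd.map { row =>
        (row.getAs[String]("col2") + "_" + row.getAs[String]("col3"), row)
      }.collect()

      // Option 2: destructure the Row; a row with a null or non-String
      // value in these positions would fail with a MatchError
      val byPattern = df.rdd.map {
        case row @ Row(_, c2: String, c3: String) => (c2 + "_" + c3, row)
      }.collect()

      (byName.map(_._1).toSeq, byPattern.map(_._1).toSeq)
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = {
    val (byName, byPattern) = run()
    byName.foreach(println)
    byPattern.foreach(println)
  }
}
```

Name-based access keeps working if columns are reordered, while the pattern match is compact but only safe when the column types and non-null values are guaranteed.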
Edit: I don't know why you are doing this operation, but to concatenate String columns of a DataFrame you may consider the following option:
import org.apache.spark.sql.functions._
val newDF = df.withColumn("concat", concat(df("col2"), lit("_"), df("col3")))
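As a self-contained illustration of the DataFrame approach, the sketch below runs the same concat call on made-up data and adds concat_ws next to it for comparison. The local SparkSession, the sample rows, and the column names are assumptions. One difference worth knowing: concat yields null if any input is null, whereas concat_ws skips null inputs.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat, concat_ws, lit}

object ConcatColumnsSketch {
  // Returns (concat result, concat_ws result) for each row.
  def run(): Seq[(String, String)] = {
    // Local session for illustration only (assumption, not from the answer)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("concat-columns-sketch")
      .getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical data; col2 and col3 mirror the names used in the answer
      val df = Seq((1, "a", "b"), (2, "c", "d")).toDF("col1", "col2", "col3")

      val newDF = df
        // Same expression as in the answer: explicit separator via lit("_")
        .withColumn("concat", concat(df("col2"), lit("_"), df("col3")))
        // Alternative: the separator is the first argument
        .withColumn("concat_ws", concat_ws("_", df("col2"), df("col3")))

      newDF.select("concat", "concat_ws").as[(String, String)].collect().toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Staying in the DataFrame API avoids the conversion to RDD[Row] entirely, so no Row accessors are needed.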