spark - scala: not a member of org.apache.spark.sql.Row

Views: 518 Published: 2020/9/4 19:42:13 Tags: scala apache-spark apache-spark-sql rdd spark-dataframe

Question

I am trying to convert a DataFrame to an RDD and then map each record to a tuple:

df.rdd.map { t=>
 (t._2 + "_" + t._3 , t)
}.take(5)

Then I get the error below. Does anyone have any ideas? Thanks!

<console>:37: error: value _2 is not a member of org.apache.spark.sql.Row
               (t._2 + "_" + t._3 , t)
                  ^

Recommended answer

When you convert a DataFrame to an RDD, you get an RDD[Row], so the function you pass to map receives a Row as its parameter. A Row is not a tuple and has no _1, _2, ... accessors. You must use the Row methods to access its fields instead (note that indices start from 0):

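import org.apache.spark.sql.Row

// getString(i) reads field i (zero-based) and assumes it holds a String
df.rdd.map {
  row: Row => (row.getString(1) + "_" + row.getString(2), row)
}.take(5)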
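As a rough illustration of the Row methods mentioned above, here is a minimal sketch of three ways to read fields from a Row: by position, by column name, and by pattern matching. The object name RowAccessSketch, the helper names, the sample data, and the column names col1, col2, col3 are all assumptions made up for this example; they are not from the original question. For brevity the helpers return only the concatenated key rather than the (key, row) tuple.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object RowAccessSketch {
  // Hypothetical sample data: three String columns named col1, col2, col3.
  def sampleDF(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("a", "b", "c"), ("d", "e", "f")).toDF("col1", "col2", "col3")
  }

  // 1. Positional getters; indices are zero-based, so 1 and 2 are the 2nd and 3rd columns.
  def keysByIndex(df: DataFrame): Seq[String] =
    df.rdd.map(row => row.getString(1) + "_" + row.getString(2)).collect().toSeq

  // 2. Lookup by column name; rows obtained from a DataFrame carry its schema.
  def keysByName(df: DataFrame): Seq[String] =
    df.rdd.map(row => row.getAs[String]("col2") + "_" + row.getAs[String]("col3")).collect().toSeq

  // 3. Pattern matching with the Row extractor; a row that does not fit
  //    the pattern causes a scala.MatchError at runtime.
  def keysByPattern(df: DataFrame): Seq[String] =
    df.rdd.map { case Row(_, c2: String, c3: String) => c2 + "_" + c3 }.collect().toSeq

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell a session already exists.
    val spark = SparkSession.builder().master("local[1]").appName("row-access-sketch").getOrCreate()
    val df = sampleDF(spark)
    keysByIndex(df).foreach(println)
    keysByName(df).foreach(println)
    keysByPattern(df).foreach(println)
    spark.stop()
  }
}
```

Access by name tends to be more robust when the column order may change, while positional getters avoid the name lookup. Which one fits best depends on the actual schema.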

Edit: I don't know why you are doing this operation, but to concatenate String columns of a DataFrame you may consider the following option, which stays in the DataFrame API and avoids the conversion to an RDD:

import org.apache.spark.sql.functions._
val newDF = df.withColumn("concat", concat(df("col2"), lit("_"), df("col3")))
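One detail worth knowing about the concat approach is how it treats nulls. The sketch below compares concat with concat_ws, another function in org.apache.spark.sql.functions that takes the separator as its first argument. The object name ConcatSketch, the helper names, the sample data, and the column names are assumptions for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

object ConcatSketch {
  // Hypothetical sample data; the second row has a null in col2.
  def sampleDF(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("a", "b", "c"), ("d", null, "f")).toDF("col1", "col2", "col3")
  }

  // concat: the result is null if any input column is null.
  def withConcat(df: DataFrame): DataFrame =
    df.withColumn("concat", concat(col("col2"), lit("_"), col("col3")))

  // concat_ws: null inputs are skipped, and the separator is only
  // placed between the values that remain.
  def withConcatWs(df: DataFrame): DataFrame =
    df.withColumn("concat", concat_ws("_", col("col2"), col("col3")))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell a session already exists.
    val spark = SparkSession.builder().master("local[1]").appName("concat-sketch").getOrCreate()
    val df = sampleDF(spark)
    withConcat(df).show()
    withConcatWs(df).show()
    spark.stop()
  }
}
```

If the columns can contain nulls, the choice between the two depends on whether a missing value should blank out the whole result (concat) or simply be left out (concat_ws).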
