Spark Scala 2.10 tuple limit


Question

I have a DataFrame with 66 columns to process (almost every column value needs to be changed in some way), so I'm running the following statement

    val result = data.map(row=> (
        modify(row.getString(row.fieldIndex("XX"))),
        (...)
        )
    )

and so on up to the 66th column. Since Scala in this version limits tuples to at most 22 elements, I cannot do it this way. Is there any workaround? After all the row operations I convert the result to a DataFrame with specific column names

    val df = result.toDF("c1",...,"c66")
    df.registerTempTable("someFancyResult")

The "modify" function is just an example to illustrate my point.

Recommended answer

If all you do is modify values from an existing DataFrame, it is better to use a UDF instead of mapping over an RDD:

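    import org.apache.spark.sql.functions.udf
    import sqlContext.implicits._  // enables the $"c1" column syntax

    // Wrap the plain Scala function so it can be applied to a Column
    val modifyUdf = udf(modify)
    data.withColumn("c1", modifyUdf($"c1"))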
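With 66 columns, writing one withColumn call per column gets repetitive. A minimal sketch of applying the same UDF to a whole set of columns in a single select is shown here. The object name UdfSketch, the helper modifyColumns, and the body of modify are hypothetical stand-ins, and the sketch assumes every selected column holds strings.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object UdfSketch {
  // Hypothetical stand-in for the question's `modify` function.
  val modify: String => String = s => if (s == null) null else s.trim

  val modifyUdf = udf(modify)

  // Returns a DataFrame with the same column names and order, where
  // every column listed in `names` has been passed through the UDF.
  def modifyColumns(df: DataFrame, names: Set[String]): DataFrame = {
    val projected = df.columns.map { name =>
      if (names.contains(name)) modifyUdf(col(name)).as(name) else col(name)
    }
    df.select(projected: _*)
  }
}
```

Assuming `data` is the DataFrame from the question, `UdfSketch.modifyColumns(data, data.columns.toSet)` would rewrite every column. No tuple is ever built, so the 22-element limit does not come into play. If different columns need different functions, one option is to keep a map from column name to UDF and look the UDF up inside the same loop.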

If for some reason the above doesn't fit your needs, the simplest thing you can do is to recreate the DataFrame from an RDD[Row], for example like this:

    import scala.collection.mutable.ArrayBuffer

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructField, StructType, StringType}

    // Going through .rdd yields an RDD[Row] on Spark 1.3+ and on 2.x
    val result: RDD[Row] = data.rdd.map(row => {
      val buffer = ArrayBuffer.empty[Any]

      // Add value to buffer
      buffer.append(modify(row.getAs[String]("c1")))

      // ... repeat for other values

      // Build row
      Row.fromSeq(buffer)
    })

    // Create schema (field order must match the order of values in each Row)
    val schema = StructType(Seq(
      StructField("c1", StringType, false),
      // ...
      StructField("c66", StringType, false)
    ))

    sqlContext.createDataFrame(result, schema)
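Writing out 66 StructField entries and 66 buffer.append calls by hand is error-prone, so the schema and the row can also be generated from a list of names. The following is only a sketch under assumptions: the object name SchemaSketch, the helper modifyRow, and the body of modify are hypothetical, every column is assumed to be a non-null string, and the output names follow the c1 to c66 pattern from the question.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SchemaSketch {
  // Output column names c1 .. c66, as in the question's toDF call.
  val names: Seq[String] = (1 to 66).map(i => s"c$i")

  // One non-nullable string field per column.
  val schema: StructType =
    StructType(names.map(n => StructField(n, StringType, nullable = false)))

  // Hypothetical stand-in for the question's `modify` function.
  def modify(s: String): String = s.trim

  // Rebuilds a row position by position, applying `modify` to each value.
  // Assumes every value in the row is a String.
  def modifyRow(row: Row): Row =
    Row.fromSeq(row.toSeq.map(v => modify(v.asInstanceOf[String])))
}
```

Assuming `data` and `sqlContext` are the ones used above, `sqlContext.createDataFrame(data.rdd.map(SchemaSketch.modifyRow), SchemaSketch.schema)` would then produce the 66-column result, provided `data` has exactly 66 columns in the intended order.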
