Convert to dataframe error
Question
I want to build a DataFrame with 110 columns, so I created a case class with 110 attributes and tried to convert the RDD to a DataFrame:
case class Myclass(var cin_nb:String,...........,var last:String)
import sqlContext.implicits._
file2.map(_.split("\t")).map(a=>Myclass(a(0),a(1),a(2),a(3),.....a(110))).toDF()
I get this error:
not enough arguments for method apply: (cin_nb: String,...........,last:String)
I'm using Scala and Spark 1.6. Thank you.
Recommended answer
You can't do this with a case class: in Scala 2.10, which Spark 1.6 is built against by default, a case class is limited to 22 fields, the same limit that applies to Scala tuples. (A StructType schema built programmatically has no such limit.) To get a DataFrame with more columns, either expand it column by column with the .withColumn function, or load the file directly into a DataFrame, for example from Parquet or with the Databricks CSV parser (spark-csv).
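As a rough illustration of the "load directly" route, the sketch below reads a tab-separated file with the Databricks spark-csv package. It assumes a Spark 1.6 spark-shell session (so `sqlContext` already exists) started with spark-csv on the classpath. The file path is a hypothetical placeholder, not something taken from the question.

```scala
// Assumption: spark-shell (Spark 1.6) with the spark-csv package available,
// so `sqlContext` is predefined.
val wide = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("delimiter", "\t")    // tab-separated input, as in the question
  .option("header", "false")    // the file is assumed to have no header row
  .load("/path/to/file2.tsv")   // hypothetical path

wide.printSchema()
```

Because no case class is involved, the number of columns is determined by the file itself rather than by a Scala type.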
How to use .withColumn
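import org.apache.spark.sql.functions.udf
import scala.util.Random
// In spark-shell, sc and sqlContext are predefined; the implicits enable toDF and $"..."
import sqlContext.implicits._
val numCols = 100
val numRows = 5
val delimiter = "\t"
// Build one line of random test data: numCols five-character fields joined by the delimiter
def generateRowData = (0 until numCols).map(i => Random.alphanumeric.take(5).mkString).mkString(delimiter)
// Start from a single-column DataFrame that holds the raw lines
val df = sc.parallelize((0 until numRows).map(i => generateRowData).toList).toDF("data")
// UDF that splits a raw line and returns its i-th field
def extractCol(i: Int, sep: String) = udf[String, String](_.split(sep)(i))
// Add one column per field (c0, c1, ...), then drop the raw line
val result = (0 until numCols).foldLeft(df){case (acc,i) => acc.withColumn(s"c$i", extractCol(i,delimiter)($"data"))}.drop($"data")
result.printSchema
result.show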
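Another way to get a wide DataFrame from an RDD without a case class is to supply the schema programmatically. The following is only a sketch under stated assumptions: it runs in a Spark 1.6 spark-shell (where `sc` and `sqlContext` exist), every column is treated as a string, and the names `wideCols`, `lines` and `c0`..`c109` are placeholders standing in for the question's data.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumption: spark-shell (Spark 1.6), so `sc` and `sqlContext` are predefined.
val wideCols = 110
val tab = "\t"

// Stand-in for the question's `file2`: two tab-separated lines of 110 fields each.
val lines = sc.parallelize(Seq.fill(2)((0 until wideCols).map(i => s"v$i").mkString(tab)))

// One nullable string field per column; the names c0..c109 are placeholders.
val schema = StructType((0 until wideCols).map(i => StructField(s"c$i", StringType, nullable = true)))

// A limit of -1 keeps trailing empty fields, so each row keeps all of its values.
val rows = lines.map(_.split(tab, -1)).map(fields => Row.fromSeq(fields.toSeq))

// Pair the untyped rows with the explicit schema.
val wide = sqlContext.createDataFrame(rows, schema)
wide.printSchema()
```

This avoids both the case class and the per-column UDF calls, at the cost of losing compile-time field names.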