How to create DataFrame from Scala's List of Iterables?
Problem description
I have the following Scala value:
val values: List[Iterable[Any]] = Traces().evaluate(features).toList
and I want to convert it to a DataFrame.
When I try the following:
sqlContext.createDataFrame(values)
I get this error:
error: overloaded method value createDataFrame with alternatives:
[A <: Product](data: Seq[A])(implicit evidence$2: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
[A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$1: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
cannot be applied to (List[Iterable[Any]])
sqlContext.createDataFrame(values)
Why?
Recommended answer
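As zero323 mentioned, we first need to convert the List[Iterable[Any]] to a List[Row], then put the rows in an RDD and prepare a schema for the Spark DataFrame. The error occurs because both createDataFrame overloads listed in the message require an element type A <: Product (for example a tuple or a case class), and Iterable[Any] is not a Product.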
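The type bound behind the error can be reproduced without Spark. The following is a minimal sketch: `ProductBoundDemo` and `arities` are hypothetical names introduced only for illustration, and the method merely mimics the `[A <: Product]` bound shown in the error message; it is not Spark's own code.

```scala
// Hypothetical stand-in that mimics the [A <: Product] bound of createDataFrame.
object ProductBoundDemo {
  // Accepts only sequences whose element type is a Product (tuple, case class, ...).
  def arities[A <: Product](data: Seq[A]): Seq[Int] =
    data.map(_.productArity)

  def main(args: Array[String]): Unit = {
    // Tuples are Products, so this compiles.
    println(arities(List((1, "a"), (2, "b"))))

    // Iterable[Any] is not a Product, so the next line would be rejected
    // by the compiler if it were uncommented:
    // arities(List[Iterable[Any]](List(1, "a")))
  }
}
```

This suggests two ways out: give the data a Product element type, or bypass the bound entirely by building rows and supplying a schema, which is the route taken in this answer.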
To convert the List[Iterable[Any]] to a List[Row], we can write:
val rows = values.map { x => Row(x.toSeq: _*) }
Then, given a schema (a StructType, here named schema) that describes the columns, we can make an RDD of the rows:
val rdd = sparkContext.makeRDD[Row](rows)
and finally create a Spark DataFrame:
val df = sqlContext.createDataFrame(rdd, schema)
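Putting the steps together, a self-contained sketch might look like the following. The sample `values`, the column names `id` and `label`, the object name `IterablesToDataFrame`, and the local-mode configuration are all assumptions made for illustration; the real data comes from `Traces().evaluate(features)`, whose shape is not shown in the question. The sketch uses `SQLContext` to match the question; on Spark 2.x and later, `SparkSession` is the usual entry point.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object IterablesToDataFrame {
  // Each Iterable becomes one Row; the varargs splat needs a Seq, hence toSeq.
  def toRows(values: List[Iterable[Any]]): List[Row] =
    values.map(x => Row(x.toSeq: _*))

  def main(args: Array[String]): Unit = {
    // Assumed local setup, for demonstration only.
    val conf = new SparkConf().setAppName("iterables-to-df").setMaster("local[*]")
    val sparkContext = new SparkContext(conf)
    val sqlContext = new SQLContext(sparkContext)

    // Hypothetical stand-in for Traces().evaluate(features).toList
    val values: List[Iterable[Any]] = List(List(1, "a"), List(2, "b"))

    // Assumed schema: it must match the order and runtime types of the values.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("label", StringType, nullable = true)
    ))

    val rdd = sparkContext.makeRDD(toRows(values))
    val df = sqlContext.createDataFrame(rdd, schema)
    df.show()

    sparkContext.stop()
  }
}
```

Because the rows hold values typed as Any, Spark cannot infer column types here, so the schema has to be written by hand and kept consistent with the data.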