How to convert a case-class-based RDD into a DataFrame?
Question
The Spark documentation shows how to create a DataFrame from an RDD, using Scala case classes to infer a schema. I am trying to reproduce this concept using sqlContext.createDataFrame(RDD, CaseClass), but my DataFrame ends up empty. Here's my Scala code:
// sc is the SparkContext, while sqlContext is the SQLContext.
// Define the case class and raw data
case class Dog(name: String)
val data = Array(
Dog("Rex"),
Dog("Fido")
)
// Create an RDD from the raw data
val dogRDD = sc.parallelize(data)
// Print the RDD for debugging (this works, shows 2 dogs)
dogRDD.collect().foreach(println)
// Create a DataFrame from the RDD
val dogDF = sqlContext.createDataFrame(dogRDD, classOf[Dog])
// Print the DataFrame for debugging (this fails, shows 0 dogs)
dogDF.show()
The output I'm seeing is:
Dog(Rex)
Dog(Fido)
++
||
++
||
||
++
What am I missing?
Thanks!
Recommended answer
You just need:
val dogDF = sqlContext.createDataFrame(dogRDD)
The second parameter is part of the Java API and expects your class to follow the JavaBeans convention (getters/setters). Your case class doesn't follow this convention, so no properties are detected, which leads to an empty DataFrame with no columns.
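To see why no properties are detected, it can help to inspect both kinds of class with plain JavaBeans introspection, without Spark. The sketch below assumes the bean-class overload finds columns roughly the way java.beans.Introspector finds properties. BeanDog, BeanCheck, and propertyNames are hypothetical names introduced for illustration; they are not part of Spark.

```scala
import java.beans.Introspector
import scala.beans.BeanProperty

// Same case class as in the question: it exposes name(), but no getName().
case class Dog(name: String)

// Hypothetical bean-style counterpart: @BeanProperty generates
// getName() and setName() alongside the Scala accessors.
class BeanDog(@BeanProperty var name: String)

object BeanCheck {
  // Lists the JavaBeans property names of a class, ignoring the "class"
  // property that every object inherits through getClass().
  def propertyNames(c: Class[_]): List[String] =
    Introspector.getBeanInfo(c)
      .getPropertyDescriptors
      .map(_.getName)
      .filterNot(_ == "class")
      .toList

  def main(args: Array[String]): Unit = {
    println(propertyNames(classOf[Dog]))     // no getters, so no bean properties
    println(propertyNames(classOf[BeanDog])) // getName()/setName() expose "name"
  }
}
```

Under that assumption, the case class reports no bean properties, which matches the column-less DataFrame in the question, while the bean-style class reports a single name property. For Scala case classes, the one-argument createDataFrame(dogRDD) shown above is the simpler route, because it derives the schema from the case class fields instead.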