Creating a Spark DataFrame from a single string


Question

I'm trying to take a hardcoded String and turn it into a one-row Spark DataFrame (with a single column of type StringType), such that:

String fizz = "buzz"

would result in a DataFrame whose .show() output looks like:

+-----+
| fizz|
+-----+
| buzz|
+-----+

My best attempt so far has been:

val rawData = List("fizz")
val df = sqlContext.sparkContext.parallelize(Seq(rawData)).toDF()

df.show()

But I get the following exception at runtime:

java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:413)
    at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:155)

Any ideas as to where I'm going wrong? Also, how do I set "buzz" as the row value for the fizz column?

Trying:

sqlContext.sparkContext.parallelize(rawData).toDF()

I get a DataFrame that looks like:

+----+
|  _1|
+----+
|buzz|
+----+

Answer

Try:

sqlContext.sparkContext.parallelize(rawData).toDF()
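
The only difference from the attempt in the question is the dropped Seq(...) wrapper. The following is a minimal sketch in plain Scala (no Spark required) of what that wrapper does to the element type. The object name NestingSketch and the value names nested and flat are made up for illustration, and the comment about how Spark maps the nested element is an interpretation of the ArrayType/StructType message in the stack trace, not something confirmed by the original post.

```scala
object NestingSketch extends App {
  val rawData = List("buzz")

  // Wrapping the list again yields ONE element whose type is List[String].
  // Judging by the stack trace, Spark infers an array type for such an
  // element, where toDF() expects a struct (a row of columns).
  val nested: Seq[List[String]] = Seq(rawData)

  // Using the list directly yields one element of type String,
  // which is what a single-column, single-row DataFrame needs.
  val flat: Seq[String] = rawData

  println(nested) // List(List(buzz))
  println(flat)   // List(buzz)
}
```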

In Spark 2.0 you can:

import spark.implicits._

rawData.toDF

Optionally, pass column names to toDF:

sqlContext.sparkContext.parallelize(rawData).toDF("fizz")
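
Putting the pieces together for Spark 2.x, a self-contained sketch might look like the following. It assumes the spark-sql dependency is on the classpath. The local master setting, the application name, and the object name SingleStringDataFrame are placeholders chosen for illustration rather than anything from the original post.

```scala
import org.apache.spark.sql.SparkSession

object SingleStringDataFrame {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("single-string-df")
      .getOrCreate()

    // Brings toDF into scope for local collections and RDDs.
    import spark.implicits._

    // One element gives one row; the argument to toDF names the column.
    val df = Seq("buzz").toDF("fizz")

    df.printSchema() // a single string column named fizz
    df.show()

    spark.stop()
  }
}
```

Passing the column name to toDF avoids the default name (_1 in the output shown in the question), so no separate rename step is needed.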
