Converting rdd to dataframe: AttributeError: 'RDD' object has no attribute 'toDF'
Problem Description
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("myApp").setMaster("local")
sc = SparkContext(conf=conf)
a = sc.parallelize([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]]).toDF(['ind', "state"])
a.show()
Result:
Traceback (most recent call last):
File "/Users/ktemlyakov/messing_around/SparkStuff/mock_maersk_data.py", line 7, in <module>
a = sc.parallelize([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]]).toDF(['ind', "state"])
AttributeError: 'RDD' object has no attribute 'toDF'
What am I missing?
Recommended Answer
sqlContext is missing; it needs to be created. The following code works:
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("myFirstApp").setMaster("local")
sc = SparkContext(conf=conf)
# Instantiating the SQLContext is what makes toDF available on RDDs
sqlContext = SQLContext(sc)

a = sc.parallelize([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]]).toDF(['ind', "state"])
a.show()
In Spark 2.0, the above can be achieved with:
from pyspark import SparkConf
from pyspark.sql import SparkSession

# In Spark 2.0+, SparkSession replaces SQLContext as the entry point
spark = SparkSession.builder.master("local").config(conf=SparkConf()).getOrCreate()
a = spark.createDataFrame([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]], ['ind', "state"])
a.show()