pyspark: NameError: name 'spark' is not defined


Question

I am copying the pyspark.ml example from the official documentation: http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.Transformer

data = [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),(Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)]
df = spark.createDataFrame(data, ["features"])
kmeans = KMeans(k=2, seed=1)
model = kmeans.fit(df)

However, the example above does not run and gives me the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-28-aaffcd1239c9> in <module>()
      1 from pyspark import *
      2 data = [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),(Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)]
----> 3 df = spark.createDataFrame(data, ["features"])
      4 kmeans = KMeans(k=2, seed=1)
      5 model = kmeans.fit(df)

NameError: name 'spark' is not defined

What additional configuration or variable needs to be set to get the example running?

Recommended answer

Since you are calling createDataFrame(), you need to do this:

df = sqlContext.createDataFrame(data, ["features"])

instead of this:

df = spark.createDataFrame(data, ["features"])

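In the documentation example, spark stands in for the sqlContext. The documentation assumes that spark is a SparkSession, which the pyspark shell creates automatically in Spark 2.0 and later; in Spark 1.x, or in a notebook where no session has been created, the name is undefined.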

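Some people bind that object to the name sc instead, so if sqlContext is not defined either, you could try the line below. Note that sc conventionally names a SparkContext, which has no createDataFrame() method, so this only works when sc actually refers to a SQLContext or SparkSession: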

df = sc.createDataFrame(data, ["features"])
