pyspark program throwing name 'spark' is not defined


Problem Description

The program below throws the error name 'spark' is not defined:

Traceback (most recent call last):  
File "pgm_latest.py", line 232, in <module>
    sconf =SparkConf().set(spark.dynamicAllocation.enabled,true)       
        .set(spark.dynamicAllocation.maxExecutors,300)        
        .set(spark.shuffle.service.enabled,true)       
        .set(spark.shuffle.spill.compress,true)
NameError: name 'spark' is not defined

The job is submitted with:

spark-submit --driver-memory 12g --master yarn-cluster --executor-memory 6g --executor-cores 3 pgm_latest.py

Code

#!/usr/bin/python
import sys
import os
from datetime import *
from time import *
from pyspark.sql import *
from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext()
sqlCtx= HiveContext(sc)

sqlCtx.sql('SET spark.sql.autoBroadcastJoinThreshold=104857600')
sqlCtx.sql('SET Tungsten=true')
sqlCtx.sql('SET spark.sql.shuffle.partitions=500')
sqlCtx.sql('SET spark.sql.inMemoryColumnarStorage.compressed=true')
sqlCtx.sql('SET spark.sql.inMemoryColumnarStorage.batchSize=12000')
sqlCtx.sql('SET spark.sql.parquet.cacheMetadata=true')
sqlCtx.sql('SET spark.sql.parquet.filterPushdown=true')
sqlCtx.sql('SET spark.sql.hive.convertMetastoreParquet=true')
sqlCtx.sql('SET spark.sql.parquet.binaryAsString=true')
sqlCtx.sql('SET spark.sql.parquet.compression.codec=snappy')
sqlCtx.sql('SET spark.sql.hive.convertMetastoreParquet=true')

## Main functionality
def main(sc):

    if __name__ == '__main__':

        # Configure OPTIONS
        sconf = SparkConf() \
            .set("spark.dynamicAllocation.enabled","true")\
            .set("spark.dynamicAllocation.maxExecutors",300)\
            .set("spark.shuffle.service.enabled","true")\
            .set("spark.shuffle.spill.compress","true")

sc = SparkContext(conf=sconf)

# Execute Main functionality

main(sc)
sc.stop()

Recommended Answer

I think you are using a Spark version older than 2.x.

Instead of this:

spark.createDataFrame(..)

use the following:

df = sqlContext.createDataFrame(...)
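For context: in Spark 1.x the entry points are SparkContext plus SQLContext (or HiveContext), and no name spark exists unless you create it yourself; from Spark 2.x onward, SparkSession wraps all of these, and the interactive pyspark shell pre-binds it to the name spark. A minimal sketch of both styles (the app name and sample rows below are illustrative, not from the question):

# Spark 1.x style: build a SQLContext on top of a SparkContext
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("example")  # illustrative app name
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()

# Spark 2.x style: SparkSession is the single entry point
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("example") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()

Note that spark is only predefined in the interactive pyspark shell; in a script run through spark-submit, as here, you must create the entry point yourself, whichever version you are on.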

