multiple SparkContexts error in tutorial


Problem description

I am attempting to run the very basic Spark + Python pyspark tutorial -- see http://spark.apache.org/docs/0.9.0/quick-start.html

When I attempt to initialize a new SparkContext,

from pyspark import SparkContext
sc = SparkContext("local[4]", "test")

I get the following error:

ValueError: Cannot run multiple SparkContexts at once

I'm wondering if my previous attempts at running the example code loaded something into memory that didn't clear out. Is there a way to list the current SparkContexts already in memory and/or clear them out so the sample code will run?

Recommended answer

It turns out that running ./bin/pyspark interactively automatically creates a SparkContext for you. Here is what I see when I start pyspark:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 0.9.1
      /_/

Using Python version 2.6.6 (r266:84292, Feb 22 2013 00:00:18)
Spark context available as sc.

...so you can either run "del sc" at the beginning, or else go ahead and use the "sc" that is automatically defined.
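
In practice, stopping the existing context before creating a new one is the cleaner route. Below is a minimal sketch, assuming you are inside the pyspark shell where "sc" is already bound to the automatically created context; calling sc.stop() shuts that context down so constructing a fresh SparkContext with your own settings no longer raises the ValueError.

from pyspark import SparkContext

# `sc` is the context the pyspark shell created on startup.
sc.stop()                              # release the existing context
sc = SparkContext("local[4]", "test")  # now this succeeds instead of raising ValueError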

The other problem with the example is that it appears to point at a regular NFS filesystem location, whereas Spark is actually looking at Hadoop's HDFS filesystem. I had to upload the README.md file from the $SPARK_HOME directory using "hadoop fs -put README.md README.md" before running the code.
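
Alternatively, you can sidestep the filesystem ambiguity by spelling out the scheme in the URI passed to sc.textFile. The paths below are hypothetical placeholders for illustration only; point them at wherever your copy of README.md actually lives.

# Hypothetical example paths -- adjust to your own installation.
local_readme = "file:///opt/spark/README.md"     # force the local filesystem
hdfs_readme = "hdfs:///user/yourname/README.md"  # force HDFS explicitly

logData = sc.textFile(local_readme).cache()
print logData.count()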

Here is the modified example program that I ran interactively:

from pyspark import SparkContext
logFile = "README.md"
# `sc` below is the SparkContext that the pyspark shell defined automatically.
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print "Lines with a: %i, lines with b: %i" % (numAs, numBs)

And here is the modified version of the stand-alone Python file:

"""SimpleApp.py"""
from pyspark import SparkContext
logFile = "README.md"  # Should be some file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print "Lines with a: %i, lines with b: %i" % (numAs, numBs)

I can now execute this with $SPARK_HOME/bin/pyspark SimpleApp.py.
