Setting SparkContext for pyspark


Problem description

I am a newbie with Spark and PySpark. I would appreciate it if somebody could explain what exactly the SparkContext parameter does, and how I could set up a spark_context for a Python application.

Answer

See here: the spark_context represents your interface to a running Spark cluster manager. In other words, you will have already defined one or more running environments for Spark (see the installation/initialization docs), detailing the nodes to run on, etc. You start a spark_context object with a configuration that tells it which environment to use and, for example, the application name. All further interaction, such as loading data, happens as methods of the context object.
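
As a minimal sketch of that configuration step (using only the standard PySpark API; the app name and the "data.txt" path are illustrative):

from pyspark import SparkConf, SparkContext

# Build a configuration: the master URL picks the environment
# ("local[4]" = four threads on this machine), plus an app name.
conf = SparkConf().setMaster("local[4]").setAppName("Simple App")
sc = SparkContext(conf=conf)

# All further interaction happens through sc, e.g. loading data:
# rdd = sc.textFile("data.txt")  # hypothetical input file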

For simple examples and testing, you can run the Spark cluster "locally" and skip much of the detail above, e.g.,

./bin/pyspark --master local[4]

will start an interpreter with a context already set to use four threads on your own CPU.
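
In a shell started this way, the context is already available as the predefined variable sc, so you can try it directly (a trivial check using only the standard RDD API):

sc.parallelize(range(10)).sum()  # distributes 0..9 across the local threads; returns 45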

In a standalone app, to be run with spark-submit:

from pyspark import SparkContext

# First argument is the master URL, second is the application name
sc = SparkContext("local", "Simple App")
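
Saved to a file (here hypothetically named simple_app.py), the app would then be submitted with:

./bin/spark-submit simple_app.py

Note that only one SparkContext can be active per JVM, so call sc.stop() when you are finished if you need to create another one.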

