Spark remote development environment
Problem description
I want to set up a remote Spark development environment.
Machine A is my development machine (Java, Eclipse, Windows 10).
I also have another machine with Cloudera already installed (Spark on YARN).
I tried:
String appName = "test" + new Date(System.currentTimeMillis());
String master = "spark://*:6066";
String host = "*";
String jar = "C:\\Users\\default.DESKTOP-0BP338U\\Desktop\\workspace\\workspace_study\\spark-start-on-yarn\\target\\spark-start-on-yarn-0.0.1-SNAPSHOT.jar";
SparkConf conf = new SparkConf().setAppName(appName).setMaster(master)
.set("spark.driver.host", host)
.setJars(new String[]{jar});
JavaSparkContext sc = new JavaSparkContext(conf);
But the connection is refused.
How can I develop and test Spark programs on machine A?
I added the environment variables. Here is my code:
SparkConf conf = new SparkConf()
.setAppName(new Date(System.currentTimeMillis()).toString())
.setMaster("yarn");
JavaSparkContext sc = new JavaSparkContext(conf);
List<Integer> data = Arrays.asList(1,2,3,4,1,2,3,4,5,1,4,1,1,1,4,2,2,4,1,1,3,4,2,3);
JavaRDD<Integer> distData = sc.parallelize(data);
JavaPairRDD<Integer, Integer> pairs = distData.mapToPair(s -> new Tuple2<Integer, Integer>(s, 1));
JavaPairRDD<Integer, Integer> counts = pairs.reduceByKey((a, b) -> a + b);
System.out.println("================= " + counts);
sc.close();
sc.stop();
and the error is "SparkException: Could not parse Master URL: 'yarn'"
What am I missing? Please help me...
Recommended answer
You need to:
-
Download the configuration files for your Hadoop cluster. Set the HADOOP_CONF_DIR environment variable on your machine. Or, if that doesn't work, place the XML files in your src/main/resources folder to include them on the classpath.
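As a quick sanity check, you can verify from plain Java whether the configs are actually visible. This is a minimal sketch (no Spark dependency needed); the file name yarn-site.xml is the standard one from a Hadoop/Cloudera client configuration bundle:

```java
// Sanity check: Spark's YARN client locates core-site.xml / yarn-site.xml
// either via the HADOOP_CONF_DIR environment variable or on the classpath
// (e.g. copied into src/main/resources).
public class ConfCheck {
    // Returns true if the named config file is visible on the classpath.
    static boolean onClasspath(String name) {
        return Thread.currentThread()
                .getContextClassLoader()
                .getResource(name) != null;
    }

    public static void main(String[] args) {
        // Will print null if the environment variable is not set.
        System.out.println("HADOOP_CONF_DIR = " + System.getenv("HADOOP_CONF_DIR"));
        System.out.println("yarn-site.xml on classpath: " + onClasspath("yarn-site.xml"));
    }
}
```

If both report missing, the Spark YARN client has no way to find your cluster, which matches the errors above.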
Use setMaster("yarn-client"). (Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client-side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager.)
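The error message itself comes from Spark's master-URL parsing: on Spark 1.x (which is what CDH 5 ships), the bare string "yarn" is not a recognized master; only "yarn-client" and "yarn-cluster" are (plain "yarn" only became valid in Spark 2.0). A simplified mimic of that check, just to illustrate which forms Spark 1.x roughly accepts (the real logic lives in SparkContext's master-URL matcher):

```java
// Simplified mimic (assumption: reduced from Spark 1.x's master-URL matching)
// showing why setMaster("yarn") fails while setMaster("yarn-client") works there.
public class MasterUrlCheck {
    static boolean isRecognized(String master) {
        return master.equals("local")
            || master.startsWith("local[")
            || master.startsWith("spark://")
            || master.startsWith("mesos://")
            || master.equals("yarn-client")
            || master.equals("yarn-cluster");
    }

    public static void main(String[] args) {
        // false: this is the "Could not parse Master URL" case on Spark 1.x
        System.out.println("yarn        -> " + isRecognized("yarn"));
        // true: accepted on Spark 1.x
        System.out.println("yarn-client -> " + isRecognized("yarn-client"));
    }
}
```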
-
Make an HDFS /user folder with your local username. This is needed for HDFS permissions.
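On the Cloudera machine that would look roughly like the following (a sketch: "youruser" is a placeholder for the local username your Windows machine submits jobs as, and the commands assume you can act as the hdfs superuser):

```shell
# Run on the cluster as the hdfs superuser -- "youruser" is a placeholder.
sudo -u hdfs hdfs dfs -mkdir -p /user/youruser
sudo -u hdfs hdfs dfs -chown youruser /user/youruser
```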
Develop, and preferably use Maven/Gradle to manage your Java libraries. You also need to use the Cloudera Maven repository for your respective Hadoop versions.
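For example, a pom.xml fragment along these lines (the CDH version number below is a placeholder assumption; match it to what your cluster actually runs):

```xml
<!-- Cloudera's Maven repository, so the Spark/Hadoop artifacts match the
     CDH cluster. The 1.6.0-cdh5.9.0 version below is a placeholder. -->
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0-cdh5.9.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-yarn_2.10</artifactId>
    <version>1.6.0-cdh5.9.0</version>
  </dependency>
</dependencies>
```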
You don't need setJars() either. Your app should connect and run on its own.
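One more small thing about the code above: System.out.println(counts) only prints the JavaPairRDD's toString(), not its contents; you would call counts.collect() (or counts.countByKey()) first. The reduction itself is equivalent to this plain-Java sketch (no Spark needed), which shows what the collected result should contain for that input:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountMimic {
    // Plain-Java equivalent of mapToPair(s -> (s, 1)).reduceByKey((a, b) -> a + b)
    static Map<Integer, Integer> count(List<Integer> data) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (Integer s : data) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(
                1,2,3,4,1,2,3,4,5,1,4,1,1,1,4,2,2,4,1,1,3,4,2,3);
        System.out.println(count(data)); // {1=8, 2=5, 3=4, 4=6, 5=1}
    }
}
```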
This concludes this article on the Spark remote development environment. We hope the recommended answer helps, and thank you for supporting IT屋!