Spark remote development environment


Problem description

I want to set up a remote Spark development environment.

Machine A is my development machine: Java, Eclipse, Windows 10.

I also have another machine with Cloudera already installed (Spark on YARN).

Here is what I tried:

    String appName = "test" + new Date(System.currentTimeMillis());
    String master = "spark://*:6066";
    String host = "*";
    String jar = "C:\\Users\\default.DESKTOP-0BP338U\\Desktop\\workspace\\workspace_study\\spark-start-on-yarn\\target\\spark-start-on-yarn-0.0.1-SNAPSHOT.jar";

    SparkConf conf = new SparkConf().setAppName(appName).setMaster(master)
            .set("spark.driver.host",  host)
            .setJars(new String[]{jar});
    JavaSparkContext sc = new JavaSparkContext(conf);

But the connection was refused.

How can I develop and test Spark programs on machine A?

I added the environment variables.

Here is my code:

    SparkConf conf = new SparkConf()
            .setAppName(new Date(System.currentTimeMillis()).toString())
            .setMaster("yarn");
    JavaSparkContext sc = new JavaSparkContext(conf);


    List<Integer> data = Arrays.asList(1,2,3,4,1,2,3,4,5,1,4,1,1,1,4,2,2,4,1,1,3,4,2,3);
    JavaRDD<Integer> distData = sc.parallelize(data);

    JavaPairRDD<Integer, Integer> pairs = distData.mapToPair(s -> new Tuple2<Integer, Integer>(s, 1));
    JavaPairRDD<Integer, Integer> counts = pairs.reduceByKey((a, b) -> a + b);

    System.out.println("================= " + counts);

    sc.close();
    sc.stop();

The error is "SparkException: Could not parse Master URL: 'yarn'".

What am I missing? Please help me...

Recommended answer

You need to:

  1. Download the HADOOP_CONF_DIR configuration files from your Hadoop cluster.

  2. Set the HADOOP_CONF_DIR environment variable on your machine. Or, if that doesn't work, place the XML files in your src/main/resources folder to include them on the classpath.

  3. Use setMaster("yarn-client"); a short sketch follows the documentation links below.

Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client-side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager.

Spark on YARN

Running Spark from an external machine
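
A minimal sketch of the driver setup with these changes, assuming HADOOP_CONF_DIR is set on the Windows machine (or the cluster's XML files sit in src/main/resources):

    // "yarn-client" replaces both the spark://host:6066 URL and the bare "yarn" string
    SparkConf conf = new SparkConf()
            .setAppName("test")
            .setMaster("yarn-client");
    // No spark.driver.host and no setJars(): the YARN client locates the
    // ResourceManager via the Hadoop XML files on HADOOP_CONF_DIR / the classpath
    JavaSparkContext sc = new JavaSparkContext(conf);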

  4. Make an HDFS /user folder with your local username. This is needed for HDFS permissions. A Java sketch follows below.
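
This is normally done once on the cluster itself, for example with hdfs dfs -mkdir as the hdfs superuser. As a hedged alternative in Java using the Hadoop FileSystem API, where jake is a placeholder for your local username:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MakeUserDir {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath,
            // e.g. the files copied into src/main/resources
            Configuration hadoopConf = new Configuration();
            try (FileSystem fs = FileSystem.get(hadoopConf)) {
                // "jake" is a placeholder; use the username your driver runs as
                Path home = new Path("/user/jake");
                fs.mkdirs(home);
                // Changing ownership requires HDFS superuser rights
                fs.setOwner(home, "jake", "supergroup");
            }
        }
    }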

  5. Develop, and preferably use Maven/Gradle to manage your Java libraries. You also need to use the Cloudera Maven repository for your respective Hadoop version; a pom.xml fragment follows below.
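
A minimal pom.xml fragment for that repository might look like the following; the spark-core artifact and CDH version are illustrative, so match them to what your cluster actually ships:

    <repositories>
      <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      </repository>
    </repositories>

    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <!-- illustrative CDH 5 version; use the one matching your cluster -->
        <version>1.6.0-cdh5.7.0</version>
      </dependency>
    </dependencies>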

You don't need setJars() either. Your app should connect and run on its own.
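
Putting it together, the second snippet from the question might look like this as a self-contained sketch: yarn-client master, no setJars(), and collect() added so the driver prints the actual counts rather than the RDD's toString():

    import java.util.Arrays;
    import java.util.Date;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;

    public class RemoteYarnTest {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("test " + new Date())
                    .setMaster("yarn-client"); // a bare "yarn" master needs
                                               // Spark's YARN support on the classpath
            JavaSparkContext sc = new JavaSparkContext(conf);

            List<Integer> data = Arrays.asList(1,2,3,4,1,2,3,4,5,1,4,1,1,1,4,2,2,4,1,1,3,4,2,3);
            JavaRDD<Integer> distData = sc.parallelize(data);

            JavaPairRDD<Integer, Integer> pairs =
                    distData.mapToPair(s -> new Tuple2<>(s, 1));
            JavaPairRDD<Integer, Integer> counts = pairs.reduceByKey((a, b) -> a + b);

            // collect() brings the results back to the driver; printing the
            // RDD object itself only shows its toString()
            System.out.println("================= " + counts.collect());

            sc.stop();
        }
    }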
