How to run concurrent jobs (actions) in Apache Spark using a single Spark context

Problem Description


The Apache Spark documentation says that "within each Spark application, multiple "jobs" (Spark actions) may be running concurrently if they were submitted by different threads". Can someone explain how to achieve this concurrency for the following sample code?

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf().setAppName("Simple_App");
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaRDD<String> file1 = sc.textFile("/path/to/test_doc1");
    JavaRDD<String> file2 = sc.textFile("/path/to/test_doc2");

    // Written this way, the two count() actions run one after the other
    // on the driver's main thread.
    System.out.println(file1.count());
    System.out.println(file2.count());

These two jobs are independent and must run concurrently.
Thank You.

Recommended Answer

Try this:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // "local[2]" gives two local worker threads, so the two count() jobs
    // can actually execute at the same time.
    final JavaSparkContext sc = new JavaSparkContext("local[2]", "Simple_App");
    ExecutorService executorService = Executors.newFixedThreadPool(2);

    // Submit job 1 from its own thread
    Future<Long> future1 = executorService.submit(new Callable<Long>() {
        @Override
        public Long call() throws Exception {
            JavaRDD<String> file1 = sc.textFile("/path/to/test_doc1");
            return file1.count();
        }
    });

    // Submit job 2 from a second thread
    Future<Long> future2 = executorService.submit(new Callable<Long>() {
        @Override
        public Long call() throws Exception {
            JavaRDD<String> file2 = sc.textFile("/path/to/test_doc2");
            return file2.count();
        }
    });

    // Block until each job finishes and print the results
    System.out.println("File1:" + future1.get());
    System.out.println("File2:" + future2.get());

    executorService.shutdown();
    sc.stop();
