Submitting Spark jobs over Livy using curl


Question

I'm submitting Spark jobs to a Livy (0.6.0) session through curl.

The jobs are one big jar file that extends the Job interface, exactly like this answer: https://stackoverflow.com/a/49220879/8557851

I run this code using the following curl command:

curl -X POST -d '{"kind": "spark","files":["/config.json"],"jars":["/myjar.jar"],"driverMemory":"512M","executorMemory":"512M"}' -H "Content-Type: application/json" localhost:8998/sessions/

The code itself is exactly like the answer linked above:

package com.mycompany.test

import org.apache.livy.{Job, JobContext}
import org.apache.spark._
import org.apache.livy.scalaapi._

object Test extends Job[Boolean] {
  override def call(jc: JobContext): Boolean = {
    // Print the Spark configuration of the context Livy hands to the job
    val sc = jc.sc
    sc.getConf.getAll.foreach(println)
    true
  }
}

As for the error, it is a Java NullPointerException, as shown below:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
    at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
    at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:356)
    at org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The expected output is for the job in the jar to start running.

Answer

I've used the Livy REST APIs, and there are two approaches to submitting a Spark job. Please refer to the REST API docs; they give a fair understanding of Livy REST requests:

1. Batch (/batches):
You submit a request and get back a job id. Based on that job id you poll for the status of the Spark job (see the status-check sketch after the batch example below). Here you have the option to execute an uber jar as well as a code file, though I've never used the latter.

2. Session (/sessions and /sessions/{sessionId}/statements):
You submit the Spark job as code, with no need to create an uber jar. Here you first create a session, and then execute statements (the actual code) in that session; see the sketches below.

For both approaches, the documentation has a nice explanation of the corresponding REST requests and request body/parameters.

A sample/example is here.

The correction to your code would be:

Batch

curl \
  -X POST \
  -d '{
    "kind": "spark",
    "files": [
      "<use-absolute-path>"
    ],
    "file": "absolute-path-to-your-application-jar",
    "className": "fully-qualified-spark-class-name",
    "driverMemory": "512M",
    "executorMemory": "512M",
    "conf": {<any-other-configs-as-key-val>}
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/batches/
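
Once the batch is accepted, the response contains an id you can poll, as mentioned above. A minimal sketch, assuming the batch id returned was 0 (the id is illustrative; the GET endpoints are from the Livy REST docs):

# Lightweight state check for batch 0 (returns just the id and state)
curl localhost:8998/batches/0/state

# Full info for batch 0, including the application id and recent log lines
curl localhost:8998/batches/0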

Session and statements

# Create a session
curl \
  -X POST \
  -d '{
    "kind": "spark",
    "files": [
      "<use-absolute-path>"
    ],
    "driverMemory": "512M",
    "executorMemory": "512M",
    "conf": {<any-other-configs-as-key-val>}
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/

# Run code/statement in the session created above
curl \
  -X POST \
  -d '{
    "kind": "spark",
    "code": "spark-code"
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/{sessionId}/statements
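
As an illustration of the "code" field and of reading a result back, here is a sketch; the session id 0, statement id 0, and the Scala snippet are illustrative, while the endpoints are from the Livy REST docs:

# Run a small Scala snippet in session 0
curl \
  -X POST \
  -d '{"kind": "spark", "code": "sc.parallelize(1 to 10).sum()"}' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/0/statements

# Poll statement 0 for its state and output
curl localhost:8998/sessions/0/statements/0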
