Submitting Spark jobs over Livy using curl
Question
I'm submitting Spark jobs to a Livy (0.6.0) session through curl.
The job is a large jar file that extends the Job interface, exactly like this answer: https://stackoverflow.com/a/49220879/8557851
When running this code with the following curl command:
curl -X POST -d '{"kind": "spark","files":["/config.json"],"jars":["/myjar.jar"],"driverMemory":"512M","executorMemory":"512M"}' -H "Content-Type: application/json" localhost:8998/sessions/
The code is exactly like the answer shown above:
package com.mycompany.test

import org.apache.livy.{Job, JobContext}
import org.apache.spark._
import org.apache.livy.scalaapi._

object Test extends Job[Boolean] {
  override def call(jc: JobContext): Boolean = {
    val sc = jc.sc
    sc.getConf.getAll.foreach(println)
    true
  }
}
The error is a Java NullPointerException, as shown below:
Exception in thread "main" java.lang.NullPointerException
at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:356)
at org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
whereas the expected behavior is for the job in the jar to start running.
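Before digging into the NullPointerException itself, it helps to pull the driver log for the session that died. The following is a minimal sketch using Livy's `GET /sessions/{sessionId}/log` endpoint; the host, port, and session id are assumptions, so adjust them to your deployment:

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumed Livy host/port


def extract_log_lines(resp_json):
    """Pull the log lines out of a GET /sessions/{id}/log response body."""
    return resp_json.get("log", [])


def fetch_session_log(session_id, size=100):
    # GET /sessions/{sessionId}/log returns recent driver log lines
    url = f"{LIVY_URL}/sessions/{session_id}/log?size={size}"
    with urllib.request.urlopen(url) as resp:
        return extract_log_lines(json.load(resp))


if __name__ == "__main__":
    for line in fetch_session_log(0):
        print(line)
```

The log usually contains the spark-submit output leading up to the NPE, which is more informative than the final stack trace alone.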
Answer
I've used the Livy REST API, and there are two approaches to submitting a Spark job. Please refer to the REST API docs; they will give you a fair understanding of Livy REST requests:

1. Batch (/batches):
You submit a request and get a job id. Based on the job id, you poll for the status of the Spark job. Here you have the option to execute an uber jar as well as a code file, but I've never used the latter.
2. Session (/sessions and /sessions/{sessionId}/statements):
You submit the Spark job as code; there is no need to create an uber jar. Here, you first create a session, and within this session you execute statements (the actual code).
For both approaches, the documentation has a good explanation of the corresponding REST requests and request body/parameters.
Sharable examples are here.
The correction to your code would be:
Batch
curl \
-X POST \
-d '{
"kind": "spark",
"files": [
"<use-absolute-path>"
],
"file": "absolute-path-to-your-application-jar",
"className": "fully-qualified-spark-class-name",
"driverMemory": "512M",
"executorMemory": "512M",
"conf": {<any-other-configs-as-key-val>}
}' \
-H "Content-Type: application/json" \
localhost:8998/batches/
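Once the batch is submitted, the response contains an id you can poll, as described above. A minimal polling sketch in Python follows; the host, port, and the exact set of terminal states are my assumptions, so verify them against your Livy version's docs:

```python
import json
import time
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumed Livy host/port
TERMINAL_STATES = {"success", "dead", "killed", "error"}  # assumed terminal batch states


def is_terminal(state):
    """True once the batch can no longer change state."""
    return state in TERMINAL_STATES


def poll_batch(batch_id, interval=5):
    # GET /batches/{batchId}/state returns e.g. {"id": 1, "state": "running"}
    url = f"{LIVY_URL}/batches/{batch_id}/state"
    while True:
        with urllib.request.urlopen(url) as resp:
            state = json.load(resp)["state"]
        if is_terminal(state):
            return state
        time.sleep(interval)


if __name__ == "__main__":
    print(poll_batch(0))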
Session and statements
// Create a session
curl \
-X POST \
-d '{
"kind": "spark",
"files": [
"<use-absolute-path>"
],
"driverMemory": "512M",
"executorMemory": "512M",
"conf": {<any-other-configs-as-key-val>}
}' \
-H "Content-Type: application/json" \
localhost:8998/sessions/
// Run code/statement in session created above
curl \
-X POST \
-d '{
"kind": "spark",
"code": "spark-code"
}' \
-H "Content-Type: application/json" \
localhost:8998/sessions/{sessionId}/statements
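Statements run asynchronously, so after posting one you fetch it back until its state is `available` and then read the result. A sketch of that last step, assuming the usual Livy statement JSON shape (`state`, `output.data["text/plain"]`) and my placeholder host/port:

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumed Livy host/port


def statement_result(stmt_json):
    """Return the text/plain result once a statement is available, else None."""
    if stmt_json.get("state") != "available":
        return None
    output = stmt_json.get("output") or {}
    return output.get("data", {}).get("text/plain")


def fetch_statement(session_id, statement_id):
    # GET /sessions/{sessionId}/statements/{statementId}
    url = f"{LIVY_URL}/sessions/{session_id}/statements/{statement_id}"
    with urllib.request.urlopen(url) as resp:
        return statement_result(json.load(resp))
```

Until the statement finishes, `statement_result` returns None, so callers can simply retry on None.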