Run Spark as a Java web application


Question


I have used Spark ML and was able to get reasonable prediction accuracy for my business problem.


The data is not huge; I was able to transform the input (basically a CSV file) using Stanford NLP and run Naive Bayes for prediction on my local machine.


I want to run this prediction service as a simple Java main program or as part of a simple MVC web application.


Currently I run my prediction using the spark-submit command. Instead, can I create the Spark context and data frames from my servlet/controller class?


I could not find any documentation on such scenarios.


Kindly advise on the feasibility of the above.

Answer


Spark has a REST API for submitting jobs by invoking the Spark master hostname.

Submit an application:

curl -X POST http://spark-cluster-ip:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "myAppArgument1" ],
  "appResource" : "file:/myfilepath/spark-job-1.0.jar",
  "clientSparkVersion" : "1.5.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.mycompany.MyJob",
  "sparkProperties" : {
    "spark.jars" : "file:/myfilepath/spark-job-1.0.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyJob",
    "spark.eventLog.enabled": "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://spark-cluster-ip:6066"
  }
}'
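The same request can be issued from Java without shelling out to curl. A minimal sketch, using only the JDK's `java.net.http.HttpClient`; the host `spark-cluster-ip`, the JAR path, and `com.mycompany.MyJob` are the same placeholders as in the curl example, not real endpoints:

```java
// Hypothetical sketch: POSTing a CreateSubmissionRequest to the Spark REST
// submission endpoint from Java. All host names and paths are placeholders.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SparkRestSubmit {

    // Build the JSON body expected by /v1/submissions/create, mirroring
    // the fields of the curl example above.
    static String buildPayload(String jar, String mainClass, String master) {
        return "{\n"
             + "  \"action\" : \"CreateSubmissionRequest\",\n"
             + "  \"appArgs\" : [ \"myAppArgument1\" ],\n"
             + "  \"appResource\" : \"" + jar + "\",\n"
             + "  \"clientSparkVersion\" : \"1.5.0\",\n"
             + "  \"environmentVariables\" : { \"SPARK_ENV_LOADED\" : \"1\" },\n"
             + "  \"mainClass\" : \"" + mainClass + "\",\n"
             + "  \"sparkProperties\" : {\n"
             + "    \"spark.jars\" : \"" + jar + "\",\n"
             + "    \"spark.driver.supervise\" : \"false\",\n"
             + "    \"spark.app.name\" : \"MyJob\",\n"
             + "    \"spark.submit.deployMode\" : \"cluster\",\n"
             + "    \"spark.master\" : \"" + master + "\"\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        String payload = buildPayload(
                "file:/myfilepath/spark-job-1.0.jar",
                "com.mycompany.MyJob",
                "spark://spark-cluster-ip:6066");
        System.out.println(payload);

        // Actual submission (skipped here because the host is a placeholder):
        // HttpRequest req = HttpRequest.newBuilder()
        //         .uri(URI.create("http://spark-cluster-ip:6066/v1/submissions/create"))
        //         .header("Content-Type", "application/json;charset=UTF-8")
        //         .POST(HttpRequest.BodyPublishers.ofString(payload))
        //         .build();
        // HttpResponse<String> resp = HttpClient.newHttpClient()
        //         .send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

This is how a servlet or controller could trigger a job on demand: build the payload, POST it, and parse the `submissionId` out of the response for later status polling.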

Submission response:

{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20151008145126-0000",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true
}

Get the status of a submitted application:

curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000

Status response:

{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true,
  "workerHostPort" : "192.168.3.153:46894",
  "workerId" : "worker-20151007093409-192.168.3.153-46894"
}


Now, the Spark application you submit should perform all the operations and save its output to some data source; the web application then accesses that data via the Thrift Server, since you don't have much data to transfer (you can consider Sqoop if you want to move data between your MVC app's database and the Hadoop cluster).
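A sketch of the Thrift Server access from the web app side: the Spark Thrift Server speaks the HiveServer2 JDBC protocol, so a plain JDBC connection works. The host, the default port 10000, the `predictions` table, and the `hive-jdbc` driver dependency are all assumptions for illustration:

```java
// Hypothetical sketch: querying the job's saved output through the Spark
// Thrift Server over JDBC. Host and table names are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerClient {

    // HiveServer2-style JDBC URL; the Spark Thrift Server listens on 10000
    // by default.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        String url = jdbcUrl("spark-cluster-ip", 10000, "default");
        System.out.println(url);

        // Requires org.apache.hive:hive-jdbc on the classpath; skipped here
        // because the host is a placeholder:
        // try (Connection conn = DriverManager.getConnection(url, "user", "");
        //      Statement st = conn.createStatement();
        //      ResultSet rs = st.executeQuery("SELECT * FROM predictions LIMIT 10")) {
        //     while (rs.next()) {
        //         System.out.println(rs.getString(1));
        //     }
        // }
    }
}
```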

Credit: link1

(As per the question in the comments) Build the Spark application JAR with the necessary dependencies and run the job in local mode. Write the JAR to read the CSV and make use of MLlib, then store the prediction output in some data source so the web app can access it.
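For the local-mode option, the context can indeed be created directly from application code rather than via spark-submit. A minimal sketch, assuming `spark-sql` (and the saved model) is on the web app's classpath; the class name, CSV path, and transformation steps are illustrative:

```java
// Hypothetical sketch: embedding Spark in local mode inside a Java main
// program or web app. Paths and names are placeholders.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PredictionService {

    // Create one SparkSession per JVM and reuse it across requests; in a
    // servlet container this would live in application scope, not per-request.
    static SparkSession spark() {
        return SparkSession.builder()
                .appName("prediction-service")
                .master("local[*]")   // local mode: no spark-submit needed
                .getOrCreate();
    }

    public static void main(String[] args) {
        SparkSession spark = spark();
        Dataset<Row> input = spark.read()
                .option("header", "true")
                .csv("file:/myfilepath/input.csv"); // placeholder path
        // ... transform with Stanford NLP features, score with the saved
        // Naive Bayes model, write predictions to a data source ...
        input.show();
        spark.stop();
    }
}
```

The trade-off: this keeps everything in one JVM and avoids the REST submission round-trip, but ties the web app's memory and lifecycle to Spark's, which is only reasonable for small data like the CSV described in the question.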

