SBT: How to package an instance of a class as a JAR?

Views: 295 Published: 2018/11/19 13:57:26 Tags: java scala jar sbt sbt-assembly


Problem description


I have code which essentially looks like this:

class FoodTrainer(images: S3Path) {   // images: >100 GB of data living in S3
  def train(): FoodClassifier = ???   // Very expensive: takes ~5 hours! (body elided)
}

class FoodClassifier {                // Lightweight API class
  def isHotDog(input: Image): Boolean = ???   // body elided
}


At JAR-assembly time (sbt assembly), I want to invoke val classifier = new FoodTrainer(s3Dir).train() and publish a JAR in which the classifier instance is immediately available to downstream library users.


What is the easiest way to do this? What are some established paradigms for it? I know it's a fairly common idiom in ML projects to publish trained models, e.g. http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar


How do I do this using sbt assembly without having to check a large model class or data file into version control?

Recommended answer

OK, I managed to do it as follows:


Split the food-trainer module into two SBT sub-modules: food-trainer and food-model. The former is invoked only at compile time, to create the model and serialize it into the generated resources of the latter. The latter serves as a simple factory object that instantiates the model from its serialized form. Every downstream project depends only on the food-model sub-module.


The food-trainer module contains the bulk of the code and has a main method that serializes the FoodModel:

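import java.io.{File, FileOutputStream, ObjectOutputStream}

object FoodTrainer {
  def main(args: Array[String]): Unit = {
    val input = args(0)      // location of the training images
    val outputDir = args(1)  // directory that receives model.bin
    val model: FoodModel = new FoodTrainer(input).train()
    // ObjectOutputStream wraps an OutputStream, not a File
    val out = new ObjectOutputStream(new FileOutputStream(new File(outputDir, "model.bin")))
    try out.writeObject(model) finally out.close()
  }
}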
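For the writeObject call above to succeed, the model class has to be serializable. Here is a minimal round-trip sketch, assuming a hypothetical ToyModel as a stand-in for FoodModel. The class, its threshold field, the RoundTrip object, and the temp-file location are illustrative and not part of the original answer:

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for FoodModel; case classes are Serializable by default.
@SerialVersionUID(1L)
final case class ToyModel(threshold: Double) {
  def isHotDog(score: Double): Boolean = score >= threshold
}

object RoundTrip {
  // Write the model to `file`, closing the stream even if writing fails.
  def save(model: ToyModel, file: File): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(file))
    try out.writeObject(model) finally out.close()
  }

  // Read the model back from `file`.
  def load(file: File): ToyModel = {
    val in = new ObjectInputStream(new FileInputStream(file))
    try in.readObject().asInstanceOf[ToyModel] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val file = File.createTempFile("model", ".bin")
    file.deleteOnExit()
    save(ToyModel(0.5), file)
    val restored = load(file)
    println(restored.isHotDog(0.7))
  }
}
```

Pinning a @SerialVersionUID is a design choice worth considering here: the JAR is built once and read by later code, so an explicit ID avoids accidental incompatibility when the class is recompiled without structural changes.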


Add a compile-time task to your build.sbt that runs the food-trainer module and generates the model as a resource:

lazy val foodTrainer = (project in file("food-trainer"))

lazy val foodModel = (project in file("food-model"))
  .dependsOn(foodTrainer)
  .settings(
    // Run the trainer's main method during the build and register its output as a resource.
    // pathToImages must be defined elsewhere in the build.
    resourceGenerators in Compile += Def.task {
      val log = streams.value.log
      val dest = (resourceManaged in Compile).value
      IO.createDirectory(dest)
      runModuleMain(
        cmd = s"com.foo.bar.FoodTrainer $pathToImages ${dest.getAbsolutePath}",
        cp = (fullClasspath in Runtime in foodTrainer).value.files,
        log = log
      )
      Seq(dest / "model.bin")
    }
  )

def runModuleMain(cmd: String, cp: Seq[File], log: Logger): Unit = {
  log.info(s"Running $cmd")
  val opt = ForkOptions(bootJars = cp, outputStrategy = Some(LoggedOutput(log)))
  val res = Fork.scala(config = opt, arguments = cmd.split(' '))
  require(res == 0, s"$cmd exited with code $res")
}


Now in your food-model module, you have something like this:

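import java.io.ObjectInputStream
object FoodModel {
  // Deserialized lazily, once, from the resource packaged in the JAR.
  lazy val model: FoodModel = {
    val in = new ObjectInputStream(getClass.getResourceAsStream("/model.bin"))
    try in.readObject().asInstanceOf[FoodModel] finally in.close()
  }
}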
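Note that getResourceAsStream returns null when the resource is absent (for example, if the resource generator did not run), which would surface as a NullPointerException inside ObjectInputStream. Below is a hedged sketch of a loader that reports this more clearly. ResourceLoader and loadAs are hypothetical names, not part of the original answer:

```scala
import java.io.ObjectInputStream

object ResourceLoader {
  // Deserialize a classpath resource, or return Left with a readable error.
  // The cast to A is unchecked (type erasure), so the caller must know the stored type.
  def loadAs[A](path: String): Either[String, A] =
    Option(getClass.getResourceAsStream(path)) match {
      case None => Left(s"Resource $path not found on the classpath")
      case Some(stream) =>
        val in = new ObjectInputStream(stream)
        try Right(in.readObject().asInstanceOf[A]) finally in.close()
    }
}
```

Under these assumptions, FoodModel.model could be defined in terms of ResourceLoader.loadAs[FoodModel]("/model.bin") and fail with the Left message instead of a bare null dereference.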


Every downstream project now depends only on food-model and simply uses FoodModel.model. We get the following benefits:


The model is loaded statically and quickly at runtime from the JAR's packaged resources.
No need to train the model at runtime (very expensive).
No need to check the model into version control (again, the binary model is very big); it is only packaged into your JAR.
No need to split FoodTrainer and FoodModel into their own JARs (we would then have the headache of deploying them internally); instead, we simply keep them in the same project as different sub-modules, which get packed into a single JAR.
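Within the same multi-project build, a downstream module could pick up the model roughly as follows. The hotDogApp project and its directory are hypothetical names used only for illustration:

```scala
// build.sbt (fragment): a hypothetical downstream module in the same build
lazy val hotDogApp = (project in file("hot-dog-app"))
  .dependsOn(foodModel) // puts FoodModel and the generated model.bin on the classpath
```

Code in that module can then reference FoodModel.model directly, without knowing how or when the model was trained.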
