SBT: How to package an instance of a class as a JAR?


Question

I have code which essentially looks like this:

class FoodTrainer(images: S3Path) {    // images: >100 GB of data living in S3
  def train(): FoodClassifier = ???    // Very expensive: takes ~5 hours! (body elided)
}

class FoodClassifier {                 // Lightweight API class
  def isHotDog(input: Image): Boolean = ???  // body elided
}

At JAR-assembly (sbt assembly) time, I want to invoke val classifier = new FoodTrainer(s3Dir).train() and publish a JAR that has the classifier instance instantly available to downstream library users.

What is the easiest way to do this? What are some established paradigms for it? I know it's a fairly common idiom in ML projects to publish trained models, e.g. http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar

How do I do this using sbt assembly without having to check a large model class or data file into version control?

Recommended answer

OK, I managed to do it like this:


  1. Split the food-trainer module into two SBT sub-modules: food-trainer and food-model. The former is invoked only at compile time to train the model and serialize it into the generated resources of the latter. The latter serves as a simple factory object that instantiates the model from its serialized form. Every downstream project depends only on the food-model sub-module.

The food-trainer module holds the bulk of the code and has a main method that serializes the FoodModel:

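import java.io.{File, FileOutputStream, ObjectOutputStream}

object FoodTrainer {
  def main(args: Array[String]): Unit = {
    val input = args(0)      // location of the training images
    val outputDir = args(1)  // directory that will receive model.bin
    val model: FoodModel = new FoodTrainer(input).train()
    // ObjectOutputStream wraps an OutputStream, not a File; FoodModel must be Serializable
    val out = new ObjectOutputStream(new FileOutputStream(new File(outputDir, "model.bin")))
    try out.writeObject(model)
    finally out.close()      // flush and release the file handle
  }
}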
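As a quick sanity check of the serialization step, the following is a minimal, self-contained sketch of a Java-serialization round trip. ToyFoodModel and SerializationRoundTrip are hypothetical stand-ins invented for illustration (the real FoodModel is not shown in the answer), and the sketch uses in-memory byte arrays instead of a file.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the trained model; case classes are Serializable by default.
final case class ToyFoodModel(hotDogThreshold: Double) {
  def isHotDog(score: Double): Boolean = score >= hotDogThreshold
}

object SerializationRoundTrip {
  // Writes the model to an in-memory buffer, mirroring what the trainer's main does with a file.
  def serialize(model: ToyFoodModel): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(model) finally out.close()
    bytes.toByteArray
  }

  // Reads the model back, mirroring what the food-model factory does with a resource stream.
  def deserialize(data: Array[Byte]): ToyFoodModel = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[ToyFoodModel] finally in.close()
  }
}
```

If the real model class holds non-serializable fields, writeObject fails with a NotSerializableException, so a round-trip check like this is worth running on a small model before committing to a multi-hour training run.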


  • Add a compile-time task to your build.sbt that runs the food-trainer module to generate the model:

    // pathToImages is assumed to be defined elsewhere in the build
    lazy val foodTrainer = (project in file("food-trainer"))

    lazy val foodModel = (project in file("food-model"))
      .dependsOn(foodTrainer)
      .settings(
        resourceGenerators in Compile += Def.task {
          val log = streams.value.log
          val dest = (resourceManaged in Compile).value
          IO.createDirectory(dest)
          // Run the trainer's main in a forked JVM; it writes model.bin into dest
          runModuleMain(
            cmd = s"com.foo.bar.FoodTrainer $pathToImages ${dest.getAbsolutePath}",
            cp = (fullClasspath in Runtime in foodTrainer).value.files,
            log = log
          )
          Seq(dest / "model.bin")
        }.taskValue // resourceGenerators expects a task value
      )

    // ForkOptions is constructed here with sbt 0.13-style named arguments
    def runModuleMain(cmd: String, cp: Seq[File], log: Logger): Unit = {
      log.info(s"Running $cmd")
      val opt = ForkOptions(bootJars = cp, outputStrategy = Some(LoggedOutput(log)))
      val res = Fork.scala(config = opt, arguments = cmd.split(' '))
      require(res == 0, s"$cmd exited with code $res")
    }
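One thing to watch with the generator above: its body runs whenever resources are generated, so the multi-hour training could be triggered on every build. The following is a hedged sketch of one possible guard, not part of the original answer: it skips training when model.bin already exists. It assumes runModuleMain, pathToImages and foodTrainer are defined as in the build above, and it is meant to replace the generator inside foodModel's .settings(...).

```scala
// build.sbt fragment: goes inside foodModel's .settings(...)
resourceGenerators in Compile += Def.task {
  // Resolve all task/setting values up front, outside the conditional
  val log = streams.value.log
  val dest = (resourceManaged in Compile).value
  val cp = (fullClasspath in Runtime in foodTrainer).value.files
  val modelFile = dest / "model.bin"

  if (modelFile.exists) {
    // Assumption: an existing file is still valid for the current data and code
    log.info(s"Reusing existing model at $modelFile")
  } else {
    IO.createDirectory(dest)
    runModuleMain(
      cmd = s"com.foo.bar.FoodTrainer $pathToImages ${dest.getAbsolutePath}",
      cp = cp,
      log = log
    )
  }
  Seq(modelFile)
}.taskValue
```

This existence check is deliberately crude: it does not notice changes to the training data or trainer code, and because resourceManaged lives under the target directory, running clean removes the file and forces a retrain.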
    


  • Now in your food-model module, you have something like this:

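    import java.io.ObjectInputStream

    object FoodModel {
      // Deserialized once, on first access, from the resource packaged in the JAR
      lazy val model: FoodModel =
        new ObjectInputStream(getClass.getResourceAsStream("/model.bin")).readObject().asInstanceOf[FoodModel]
    }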
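Note that getResourceAsStream returns null when the resource is missing, which would surface as a confusing NullPointerException in the loader above. A slightly more defensive variant might look like the sketch below; ModelLoader, readModel and fromResource are hypothetical names introduced here for illustration and do not appear in the original answer.

```scala
import java.io.{InputStream, ObjectInputStream}

object ModelLoader {
  // Reads one serialized object from the stream and always closes it.
  // The cast is unchecked: a wrong type only fails when the value is used.
  def readModel[A](in: InputStream): A = {
    val ois = new ObjectInputStream(in)
    try ois.readObject().asInstanceOf[A] finally ois.close()
  }

  // Looks up a classpath resource and fails with a clear message if it is absent.
  def fromResource[A](path: String): A = {
    val in = getClass.getResourceAsStream(path)
    require(in != null, s"Resource $path not found on the classpath")
    readModel[A](in)
  }
}
```

With such a helper, the factory could be written as lazy val model: FoodModel = ModelLoader.fromResource[FoodModel]("/model.bin"), and a JAR built without the model would fail with an explicit message instead of a null dereference.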
    


  • Every downstream project now depends only on food-model and simply uses FoodModel.model. We get the following benefits:


    1. The model is loaded quickly at runtime from the JAR's packaged resources.
    2. No need to train the model at runtime (very expensive).
    3. No need to check the model into version control (again, the binary model is very big); it is only packaged into your JAR.
    4. No need to split FoodTrainer and FoodModel into their own JARs (deploying separate JARs internally is a headache for us); instead we keep them in the same project as different sub-modules, which get packed into a single JAR.
