SBT: How to package an instance of a class as a JAR?
Question
I have code which essentially looks like this:
class FoodTrainer(images: S3Path) { // data is a >100GB file living in S3
  def train(): FoodClassifier = ??? // Very expensive: takes ~5 hours!
}

class FoodClassifier { // Light-weight API class
  def isHotDog(input: Image): Boolean = ???
}
At JAR-assembly time (sbt assembly), I want to invoke val classifier = new FoodTrainer(s3Dir).train() and publish a JAR in which that classifier instance is immediately available to downstream library users.
What is the easiest way to do this? What are some established paradigms for it? I know it's a fairly common idiom in ML projects to publish trained models, e.g. http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar
How do I do this using sbt assembly without having to check a large model class or data file into version control?
Answer
OK, I managed to do it like this:
Split the food trainer code into two separate SBT sub-modules: food-trainer and food-model. The former is only invoked at compile time, to train the model and serialize it into the generated resources of the latter. The latter serves as a simple factory object that instantiates the model from its serialized form. Every downstream project depends only on the food-model submodule.
The food-trainer module holds the bulk of the code and has a main method that trains and serializes the FoodModel:
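import java.io.{File, FileOutputStream, ObjectOutputStream}

object FoodTrainer {
  def main(args: Array[String]): Unit = {
    val input = args(0)     // location of the training images
    val outputDir = args(1) // directory that receives model.bin
    // FoodModel must be Serializable for writeObject to succeed
    val model: FoodModel = new FoodTrainer(input).train()
    val out = new ObjectOutputStream(new FileOutputStream(new File(outputDir, "model.bin")))
    try out.writeObject(model)
    finally out.close()
  }
}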
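The trainer's output can only be loaded later if the model class is java.io.Serializable and the loading side sees the same class definition. A minimal, self-contained sketch of that round trip, assuming a hypothetical ToyModel case class in place of the real FoodModel (Scala case classes are Serializable by default) and in-memory byte arrays in place of model.bin; the ModelSerialization object is also hypothetical:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for FoodModel; case classes extend Serializable automatically.
final case class ToyModel(threshold: Double) {
  def isHotDog(score: Double): Boolean = score >= threshold
}

object ModelSerialization {
  // Trainer side: turn the model into bytes (written to model.bin in the real build).
  def toBytes(model: ToyModel): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(model)
    finally out.close() // close() flushes the object stream into the buffer
    buffer.toByteArray
  }

  // Loader side: read the bytes back into a typed instance.
  def fromBytes(bytes: Array[Byte]): ToyModel = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[ToyModel]
    finally in.close()
  }
}
```

If the real model holds references to non-serializable objects (open connections, S3 clients, and so on), writeObject fails with a NotSerializableException, so those fields would need to be kept out of the trained model.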
In your build.sbt, add a compile-time resource generator to food-model that runs the trainer and writes the serialized model into food-model's managed resources:
lazy val foodTrainer = (project in file("food-trainer"))

lazy val foodModel = (project in file("food-model"))
  .dependsOn(foodTrainer)
  .settings(
    // Runs FoodTrainer.main in a forked JVM and registers the resulting model.bin
    // as a managed resource, so it is packaged into the food-model JAR.
    resourceGenerators in Compile += Def.task {
      val log = streams.value.log
      val dest = (resourceManaged in Compile).value
      IO.createDirectory(dest)
      // pathToImages is not defined in this snippet; it must point at the training data
      runModuleMain(
        cmd = s"com.foo.bar.FoodTrainer $pathToImages ${dest.getAbsolutePath}",
        cp = (fullClasspath in Runtime in foodTrainer).value.files,
        log = log
      )
      Seq(dest / "model.bin")
    }.taskValue
  )

// Forks a JVM that runs the given main class and fails the build on a non-zero exit code.
def runModuleMain(cmd: String, cp: Seq[File], log: Logger): Unit = {
  log.info(s"Running $cmd")
  val opt = ForkOptions(bootJars = cp, outputStrategy = Some(LoggedOutput(log)))
  val res = Fork.scala(config = opt, arguments = cmd.split(' '))
  require(res == 0, s"$cmd exited with code $res")
}
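As written, the resource generator runs the trainer on every compile, which means a multi-hour training run each time. One possible refinement, which is an assumption and not part of the original answer, is to invoke the trainer only when model.bin is not already present. The guard itself is plain Scala; the ModelCache object and generateIfMissing method are hypothetical names:

```scala
import java.io.File

object ModelCache {
  // Runs `train` only when `target` does not exist yet, then returns it as the
  // generated-resource list. Deleting the file (e.g. via `sbt clean`) forces retraining.
  def generateIfMissing(target: File)(train: File => Unit): Seq[File] = {
    if (!target.exists()) {
      Option(target.getParentFile).foreach(_.mkdirs())
      train(target)
      require(target.exists(), s"trainer did not produce $target")
    }
    Seq(target)
  }
}
```

Inside the resourceGenerators task, the guard would wrap the runModuleMain call, with dest / "model.bin" as the target. For build.sbt to see the helper, it would have to live under the project/ directory or be inlined into the build definition. The trade-off is that a stale model survives until the file is removed, so retraining becomes an explicit step rather than a side effect of compiling.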
Now in your food-model
module, you have something like this:
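import java.io.ObjectInputStream
object FoodModel {
  // Deserialized once, on first access, from the model.bin packaged in the JAR
  lazy val model: FoodModel = {
    val in = new ObjectInputStream(getClass.getResourceAsStream("/model.bin"))
    try in.readObject().asInstanceOf[FoodModel] finally in.close()
  }
}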
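getResourceAsStream returns null when the resource is absent (for example, if the generator never ran), and passing null to the ObjectInputStream constructor only surfaces as a NullPointerException. A small sketch of a loader that fails with a clearer message instead, assuming a hypothetical ResourceModelLoader helper that takes the ClassLoader explicitly:

```scala
import java.io.ObjectInputStream

object ResourceModelLoader {
  // Loads a Java-serialized object from the classpath, failing with a descriptive
  // error instead of a NullPointerException when the resource was never packaged.
  def load[T](loader: ClassLoader, name: String): T = {
    val stream = loader.getResourceAsStream(name)
    if (stream == null)
      throw new IllegalStateException(s"Resource '$name' not found on the classpath; was the model generated?")
    val in = new ObjectInputStream(stream)
    try in.readObject().asInstanceOf[T] // unchecked cast; a wrong T fails at the call site
    finally in.close()
  }
}
```

In food-model this could back the lazy val as ResourceModelLoader.load[FoodModel](getClass.getClassLoader, "model.bin"). Note that ClassLoader.getResourceAsStream takes names without a leading slash, whereas Class.getResourceAsStream("/model.bin") needs the slash to resolve from the classpath root.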
Every downstream project now depends only on food-model and simply uses FoodModel.model. This gives us the following benefits:
- The model is loaded quickly at runtime from the JAR's packaged resources.
- There is no need to train the model at runtime (very expensive).
- There is no need to check the model into version control (again, the binary model is very big); it is only packaged into the JAR.
- There is no need to split FoodTrainer and FoodModel into their own JARs (which would add the headache of deploying each of them internally); instead we keep them in the same project, as different sub-modules that get packed into a single JAR.