Spark ML Pipeline API save not working
Problem Description
In version 1.6 the Pipeline API got a new set of features to save and load pipeline stages. I tried to save a stage to disk after training a classifier, so that I could load it again later, reuse it, and save the effort of computing the model again.
For some reason, when I save the model, the output directory only contains the metadata directory. When I try to load it again, I get the following exception:
Exception in thread "main" java.lang.UnsupportedOperationException: empty collection
    at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1330)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1327)
    at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:284)
    at org.apache.spark.ml.tuning.CrossValidator$SharedReadWrite$.load(CrossValidator.scala:287)
    at org.apache.spark.ml.tuning.CrossValidatorModel$CrossValidatorModelReader.load(CrossValidator.scala:393)
    at org.apache.spark.ml.tuning.CrossValidatorModel$CrossValidatorModelReader.load(CrossValidator.scala:384)
    at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:176)
    at org.apache.spark.ml.tuning.CrossValidatorModel$.load(CrossValidator.scala:368)
    at org.apache.spark.ml.tuning.CrossValidatorModel.load(CrossValidator.scala)
    at org.test.categoryminer.spark.SparkTextClassifierModelCache.get(SparkTextClassifierModelCache.java:34)
To save the model I use: crossValidatorModel.save("/tmp/my.model")
And to load it I use: CrossValidatorModel.load("/tmp/my.model")
I call save on the CrossValidatorModel object that I get when I call fit(dataframe) on the CrossValidator object.
Any pointer why it only saves the metadata directory?
Recommended Answer
This will certainly not answer your question directly, but personally I haven't tested the new save/load feature in 1.6.0.
Instead, I use a dedicated function to save the models via plain Java serialization:
import java.io.{FileNotFoundException, FileOutputStream, IOException, ObjectOutputStream}
import org.apache.spark.ml.tuning.CrossValidatorModel

def saveCrossValidatorModel(model: CrossValidatorModel, path: String): Unit = {
  try {
    // Plain Java serialization: write the entire model object to disk.
    val fileOut = new FileOutputStream(path)
    val out = new ObjectOutputStream(fileOut)
    out.writeObject(model)
    out.close()
    fileOut.close()
  } catch {
    case foe: FileNotFoundException => foe.printStackTrace()
    case ioe: IOException => ioe.printStackTrace()
  }
}
And you can then read your model back in a similar way:
import java.io.{FileInputStream, ObjectInputStream}
import org.apache.spark.ml.tuning.CrossValidatorModel

def loadCrossValidatorModel(path: String): CrossValidatorModel = {
  // Catching and only printing exceptions (as in the save helper) would
  // leave this function without a value to return, so the streams are
  // closed in a finally block instead and any error propagates.
  val fileIn = new FileInputStream(path)
  val in = new ObjectInputStream(fileIn)
  try {
    in.readObject().asInstanceOf[CrossValidatorModel]
  } finally {
    in.close()
    fileIn.close()
  }
}
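Since these helpers rely only on standard Java object serialization, the round trip can be sketched without Spark at all. The Note case class below is purely illustrative (not from the original answer); any Serializable object goes through the same mechanism:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Illustrative stand-in for any Serializable object (e.g. a trained model).
case class Note(title: String, stars: Int)

// Serialize an object to bytes, the same way the save helper writes to a file.
def serialize(obj: AnyRef): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  try out.writeObject(obj) finally out.close()
  bytes.toByteArray
}

// Deserialize and cast back, mirroring the load helper.
def deserialize[T](data: Array[Byte]): T = {
  val in = new ObjectInputStream(new ByteArrayInputStream(data))
  try in.readObject().asInstanceOf[T] finally in.close()
}

val original = Note("my model", 5)
val restored = deserialize[Note](serialize(original))
assert(restored == original)
```

One caveat of this approach: plain Java serialization ties the saved bytes to the class versions that wrote them, so a model serialized this way may fail to load under a different Spark or Scala version.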