Is there a way to import a PMML file into Python?

Question

I have trained a model using sklearn and exported it into a pmml format using sklearn2pmml. Is there a way to convert that pmml file back into something that can be imported and run in python?

The reason I am looking to do this is because I have noticed slight differences in the way the pmml model behaves compared to the sklearn model. Specifically, the pmml file sets hard upper and lower bounds for variables (uses the max and min of the variable in the training set) whereas sklearn does not. I encounter problems when the pmml model encounters data that is outside of these bounds. This is just one difference between the pmml model and the sklearn model and I want to be able to re-import the pmml file into python to run it and see if there are any others.

Recommended answer

You don't need to test the correctness of sklearn2pmml generated models. It's based on the JPMML-SkLearn library, which has full coverage with integration tests - Scikit-Learn predictions and PMML predictions are provably identical.

Your real issue is that you want to apply models outside of their intended "applicability domain". It's a bad idea, because the model's behaviour is not specified in that case: garbage input, garbage predictions.

However, if you insist that you must be able to feed garbage to your models in a production environment, then simply disable PMML value bounds checking. There are several ways to accomplish this:

  1. Remove Value and Interval child elements from /PMML/DataDictionary/DataField elements.
  2. Modify Value and Interval child elements so that those previously unseen values are recognized as valid values. For example, you can define the margins of the Interval element to include all values [-Inf, +Inf]. See the explanation of Value and Interval elements in the PMML specification for correct syntax.
  3. Change the invalidValueTreatment attribute value of all /PMML/<Model>/MiningSchema/MiningField elements from "returnInvalid" to "asIs". If this attribute is missing, it defaults to "returnInvalid", so you'd need to insert invalidValueTreatment="asIs" there.
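
Since a PMML file is plain XML, option #1 can also be tried from Python with nothing but the standard library. The following is a minimal sketch, not part of the original answer: the function name strip_value_bounds and the trimmed sample document are made up for illustration, and a real sklearn2pmml export will contain far more elements.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return an element's tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_value_bounds(pmml_text):
    """Remove Value and Interval children from every DataField (option #1)."""
    root = ET.fromstring(pmml_text)
    # Keep the document's own namespace as the default one so the
    # serialized output does not gain 'ns0:' prefixes.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    # Collect the fields first, then mutate, to avoid editing while iterating.
    fields = [e for e in root.iter() if local_name(e.tag) == "DataField"]
    for field in fields:
        for child in list(field):
            if local_name(child.tag) in ("Value", "Interval"):
                field.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed PMML document used only for illustration.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary>
    <DataField name="x1" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="0.5" rightMargin="9.5"/>
    </DataField>
  </DataDictionary>
</PMML>"""

cleaned = strip_value_bounds(SAMPLE)
```

Matching on the local tag name rather than a fixed namespace URI is a deliberate choice here, because the namespace differs between PMML schema versions.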

I would recommend option #3. You can automate the process using the JPMML-Model library:

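import org.dmg.pmml.InvalidValueTreatmentMethod;
import org.dmg.pmml.MiningField;
import org.dmg.pmml.PMML;
import org.dmg.pmml.Visitor;
import org.dmg.pmml.VisitorAction;
import org.jpmml.model.visitors.AbstractVisitor;

// loadFromFile and saveToFile stand for your own PMML reading/writing helpers
PMML pmml = loadFromFile(..);
Visitor mfUpdater = new AbstractVisitor(){
  @Override
  public VisitorAction visit(MiningField miningField){
    // Pass out-of-range values to the model unchanged instead of rejecting them
    miningField.setInvalidValueTreatment(InvalidValueTreatmentMethod.AS_IS);
    return VisitorAction.CONTINUE;
  }
};
mfUpdater.applyTo(pmml);
saveToFile(pmml, ...);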
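
For readers who would rather stay in Python, roughly the same edit can be sketched with the standard library's XML parser. This is an illustrative alternative, not the JPMML-Model approach shown above: it assumes the PMML file can be treated as plain XML, and the function name set_invalid_value_treatment and the trimmed sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def set_invalid_value_treatment(pmml_text, treatment="asIs"):
    """Set invalidValueTreatment on every MiningField element.

    Returns the patched XML text and the number of fields touched.
    """
    root = ET.fromstring(pmml_text)
    # Preserve the document's namespace as the default one on output.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    updated = 0
    for element in root.iter():
        # Compare local names so any PMML schema version is matched.
        if element.tag.rsplit("}", 1)[-1] == "MiningField":
            element.set("invalidValueTreatment", treatment)
            updated += 1
    return ET.tostring(root, encoding="unicode"), updated


# Hypothetical, heavily trimmed PMML document used only for illustration.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="y" usageType="target"/>
      <MiningField name="x1" invalidValueTreatment="returnInvalid"/>
    </MiningSchema>
  </RegressionModel>
</PMML>"""

patched, count = set_invalid_value_treatment(SAMPLE)
```

In practice you would read the exported file into a string, pass it through the function, and write the result to a new file, keeping the original export for comparison.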
