snakeyaml and spark results in an inability to construct objects


Question

The following code executes fine in a Scala shell given SnakeYAML version 1.17:

import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor
import scala.collection.mutable.ListBuffer
import scala.beans.BeanProperty

class EmailAccount {
  @scala.beans.BeanProperty var accountName: String = null

  override def toString: String = {
    return s"acct ($accountName)"
  }
}

val text = """accountName: Ymail Account"""

val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
val e = yaml.load(text).asInstanceOf[EmailAccount]
println(e)

However, when running in Spark (2.0.0 in this case), the resulting error is:

org.yaml.snakeyaml.constructor.ConstructorException: Can't construct a java object for tag:yaml.org,2002:EmailAccount; exception=java.lang.NoSuchMethodException: EmailAccount.<init>()
 in 'string', line 1, column 1:
    accountName: Ymail Account
    ^

  at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:350)
  at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
  at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
  at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
  at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
  at org.yaml.snakeyaml.Yaml.load(Yaml.java:369)
  ... 48 elided
Caused by: org.yaml.snakeyaml.error.YAMLException: java.lang.NoSuchMethodException: EmailAccount.<init>()
  at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:220)
  at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:190)
  at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:346)
  ... 53 more
Caused by: java.lang.NoSuchMethodException: EmailAccount.<init>()
  at java.lang.Class.getConstructor0(Class.java:2810)
  at java.lang.Class.getDeclaredConstructor(Class.java:2053)
  at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:216)
  ... 55 more

I launched the Scala shell with:

scala -classpath "/home/placey/snakeyaml-1.17.jar"

I launched the Spark shell with:

/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-shell --master local --jars /home/placey/snakeyaml-1.17.jar

Answer


Solution

Create a self-contained application and run it using spark-submit instead of using spark-shell.


I've created a minimal project for you as a gist here. All you need to do is put both files (build.sbt and Main.scala, sketched after the expected output below) in some directory, then run:

sbt package


in order to create a JAR. The JAR will be in target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar or a similar location. You can get SBT from here if you haven't used it yet. Finally, you can run the project:

/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --class "Main" --master local --jars /home/placey/snakeyaml-1.17.jar target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar

The output should be:

[many lines of Spark's log]
acct (Ymail Account)
[more lines of Spark's log]
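
Since the gist itself isn't reproduced here, below is a minimal sketch of what the two files might contain. The project name, versions, and Spark wiring are assumptions inferred from the commands and output above, not the gist's actual contents.

build.sbt:

name := "sparkSnakeYamlTest"

version := "1.0"

scalaVersion := "2.11.8"

// Spark is provided by spark-submit at runtime; snakeyaml is supplied via --jars.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.yaml" % "snakeyaml" % "1.17" % "provided"
)

Main.scala:

import org.apache.spark.{SparkConf, SparkContext}
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor

// A top-level class keeps its zero-argument constructor,
// unlike classes defined inside the REPL.
class EmailAccount {
  @scala.beans.BeanProperty var accountName: String = null
  override def toString: String = s"acct ($accountName)"
}

object Main {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sparkSnakeYamlTest"))
    val text = "accountName: Ymail Account"
    val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
    val e = yaml.load(text).asInstanceOf[EmailAccount]
    println(e)
    sc.stop()
  }
}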


Explanation

Spark's shell (REPL) transforms all classes you define in it by adding a $iw parameter to your constructors. I've explained it here. SnakeYAML expects a zero-parameter constructor for JavaBean-like classes, but there isn't one, so it fails.

You can try it yourself:

scala> class Foo() {}
defined class Foo

scala> classOf[Foo].getConstructors()
res0: Array[java.lang.reflect.Constructor[_]] = Array(public Foo($iw))

scala> classOf[Foo].getConstructors()(0).getParameterCount
res1: Int = 1


As you can see, Spark transforms the constructor by adding a parameter of type $iw.


If you really need to get it working in the shell, you could define your own class extending org.yaml.snakeyaml.constructor.BaseConstructor and make sure that $iw gets passed to constructors, but this is a lot of work (I actually wrote my own Constructor in Scala for security reasons some time ago, so I have some experience with this).


You could also define a custom Constructor hard-coded to instantiate a specific class (EmailAccount in your case) similar to the DiceConstructor shown in SnakeYAML's documentation. This is much easier, but requires writing code for each class you want to support.

Example:

case class EmailAccount(accountName: String)

class EmailAccountConstructor extends org.yaml.snakeyaml.constructor.Constructor {

  val emailAccountTag = new org.yaml.snakeyaml.nodes.Tag("!emailAccount")
  this.rootTag = emailAccountTag
  this.yamlConstructors.put(emailAccountTag, new ConstructEmailAccount)

  private class ConstructEmailAccount extends org.yaml.snakeyaml.constructor.AbstractConstruct {
    def construct(node: org.yaml.snakeyaml.nodes.Node): Object = {
      // TODO: This is fine for quick prototyping in a REPL, but in a real
      //       application you should probably add type checks.
      val mnode = node.asInstanceOf[org.yaml.snakeyaml.nodes.MappingNode]
      val mapping = constructMapping(mnode)
      val name = mapping.get("accountName").asInstanceOf[String]
      new EmailAccount(name)
    }
  }

}


You can save this as a file and load it in the REPL using :load filename.scala.
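
For instance, assuming the definitions above were saved as EmailAccountConstructor.scala (the file name is arbitrary), a session might look roughly like this:

scala> :load EmailAccountConstructor.scala
Loading EmailAccountConstructor.scala...
defined class EmailAccount
defined class EmailAccountConstructor

scala> val yaml = new org.yaml.snakeyaml.Yaml(new EmailAccountConstructor())
yaml: org.yaml.snakeyaml.Yaml = Yaml:...

scala> yaml.load("accountName: Ymail Account").asInstanceOf[EmailAccount]
res0: EmailAccount = EmailAccount(Ymail Account)

This works in the REPL because SnakeYAML never has to instantiate EmailAccount reflectively; the custom constructor calls new EmailAccount(name) in compiled Scala code, so the hidden $iw parameter is handled by the compiler.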


A bonus advantage of this solution is that it creates immutable case class instances directly. Unfortunately, the Scala REPL seems to have issues with imports here, so I've used fully qualified names.


You can also just parse YAML documents as simple Java maps:

scala> val yaml2 = new Yaml()
yaml2: org.yaml.snakeyaml.Yaml = Yaml:1141996301

scala> val e2 = yaml2.load(text)
e2: Object = {accountName=Ymail Account}

scala> val map = e2.asInstanceOf[java.util.Map[String, Any]]
map: java.util.Map[String,Any] = {accountName=Ymail Account}

scala> map.get("accountName")
res4: Any = Ymail Account


This way SnakeYAML won't need to use reflection.


However, since you're using Scala, I recommend trying MoultingYAML, which is a Scala wrapper for SnakeYAML. It parses YAML documents to simple Java types and then maps them to Scala types (even your own types like EmailAccount).
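
As a rough sketch of what that might look like, based on MoultingYAML's spray-json-style API (the protocol object and format names here are illustrative):

import net.jcazevedo.moultingyaml._

case class EmailAccount(accountName: String)

// Derive a YAML format for the one-field case class (hence yamlFormat1).
object EmailAccountYamlProtocol extends DefaultYamlProtocol {
  implicit val emailAccountFormat = yamlFormat1(EmailAccount)
}
import EmailAccountYamlProtocol._

val account = "accountName: Ymail Account".parseYaml.convertTo[EmailAccount]
// account: EmailAccount = EmailAccount(Ymail Account)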
