Infinite loop when replacing concrete value by parameter name

Question

I have the following two objects (in Scala, using Spark):

1. The main object

object Omain {
  def main(args: Array[String]) {
    odbscan
  }
}

2. The object odbscan

import org.apache.spark.{SparkConf, SparkContext}

object odbscan {
  val conf = new SparkConf().setAppName("Clustering").setMaster("local")
  conf.set("spark.driver.maxResultSize", "3g")
  val sc = new SparkContext(conf)

  // Threshold referenced inside the filter closure below
  val param_user_minimal_rating_count = 2

  /*** Connection ***/
  val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
  val sql = "SELECT id, data FROM user_profile"
  // connectMysql is the asker's own helper; its definition is not shown
  val options = connectMysql.getOptionsMap(sql)
  val uSQL = sqlcontext.load("jdbc", options)

  // Turn each row "[id,item:rating;item:rating;...]" into (id, Map(item -> rating))
  val users = uSQL.rdd.map { x =>
    val v = x.toString().substring(1, x.toString().size - 1).split(",")
    var ap: Map[Int, Double] = Map()
    if (v.size > 1)
       ap = v(1).split(";").map { y => (y.split(":")(0).toInt, y.split(":")(1).toDouble) }.toMap
    (v(0).toInt, ap)
  }.filter(_._2.size >= param_user_minimal_rating_count)
  println(users.collect().mkString("\n"))
}

When I execute this code I obtain an infinite loop, until I change:

filter(_._2.size >= param_user_minimal_rating_count)

to

filter(_._2.size >= 1)

or any other numeric literal. In that case the code works and my result is displayed.

Recommended answer

What I think is happening here is that Spark serializes functions in order to send them over the wire. Because your function (the one you pass to filter) calls the accessor param_user_minimal_rating_count of object odbscan, the entire odbscan object has to be serialized and sent along with it. Deserializing and then using that object causes the code in its body to run again, which leads to an infinite loop of serializing -> sending -> deserializing -> executing -> serializing -> ...

Probably the easiest fix is to change that val to final val param_user_minimal_rating_count = 2, so that the compiler inlines the value. Note that this only works for literal constants. For more information, see "constant value definitions" and "constant expressions" in the Scala language specification.
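As a minimal sketch of that first suggestion (the objects Params and FinalValDemo and the method keepFrequentRaters are hypothetical names, not from the question): in Scala 2 a constant value definition needs the final modifier, a constant expression on the right-hand side, and no explicit type annotation. Under those conditions the compiler is expected to substitute the literal 2 at each use site, so the lambda should no longer need to call an accessor on the enclosing object.

```scala
object Params {
  // Constant value definition: `final`, literal right-hand side, no type annotation.
  final val param_user_minimal_rating_count = 2

  // For contrast: NOT a constant value definition (no `final`),
  // so every use goes through the accessor on Params.
  val plain_minimal_rating_count = 2
}

object FinalValDemo {
  // Keeps only the rating counts that reach the threshold.
  // A plain Seq stands in for the RDD so the sketch runs without Spark.
  def keepFrequentRaters(ratingCounts: Seq[Int]): Seq[Int] =
    ratingCounts.filter(_ >= Params.param_user_minimal_rating_count)

  def main(args: Array[String]): Unit =
    println(keepFrequentRaters(Seq(0, 1, 2, 3)).mkString(","))
}
```

A value computed at runtime (read from configuration, for example) cannot be declared this way, which is why the answer limits this fix to literal constants.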

Another, better solution is to refactor your code so that lambda expressions use no instance variables. Referencing a val that is defined in an object or class gets the whole object serialized, so try to refer only to vals that are local to a method. Most importantly, don't execute your business logic from within a constructor or the body of an object or class.
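The refactoring described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the asker's actual job: ProfileParsing, ClusteringJob, selectUsers, and the "id,item:rating;item:rating" input format are hypothetical, and a plain Seq stands in for the RDD so the example runs without Spark. In a real job the same two lambdas would be passed to the RDD's map and filter. The points it shows are that the object bodies contain no logic, the work happens inside methods, and the filter lambda reads only a method parameter.

```scala
// Stateless helper: its body runs no job logic, so initializing it again is harmless.
object ProfileParsing {
  // Assumed (hypothetical) input format: "id,item:rating;item:rating"
  def parse(row: String): (Int, Map[Int, Double]) = {
    val fields = row.split(",")
    val ratings: Map[Int, Double] =
      if (fields.length > 1)
        fields(1).split(";").map { entry =>
          val parts = entry.split(":")
          parts(0).toInt -> parts(1).toDouble
        }.toMap
      else Map.empty[Int, Double]
    (fields(0).toInt, ratings)
  }
}

object ClusteringJob {
  // The threshold arrives as a parameter, so the filter lambda captures
  // a local value rather than a field of an enclosing object.
  def selectUsers(rows: Seq[String], minimalRatingCount: Int): Seq[(Int, Map[Int, Double])] =
    rows
      .map(row => ProfileParsing.parse(row))
      .filter(_._2.size >= minimalRatingCount)

  // All work starts from main, not from the body of the object.
  def main(args: Array[String]): Unit = {
    val rows = Seq("1,10:4.0;11:3.5", "2,10:1.0", "3")
    println(selectUsers(rows, minimalRatingCount = 2).mkString("\n"))
  }
}
```

With this layout, creating the SparkContext and loading the data would also belong inside main (or a method it calls), so that nothing is re-run merely because an object gets initialized.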
