Spark error: type mismatch; found: (Int, String) required: TraversableOnce[?]


Question

I am new to Spark programming and Scala, and I don't understand the difference between map and flatMap. When using flatMap, why does a method that returns "Option" work fine?

import org.apache.spark.SparkContext

// Returns Some((id, name)) when the line contains a quoted name, None otherwise.
def parseNames(line: String): Option[(Int, String)] = {
  val fields = line.split('\"')
  if (fields.length > 1) {
    Some((fields(0).trim().toInt, fields(1)))
  } else {
    None
  }
}

def main(args: Array[String]): Unit = {
  val sc = new SparkContext("local[*]", "DemoHero")
  val txt = sc.textFile("../marvel-names1.txt")
  val rdd = txt.flatMap(parseNames)
}

But without "Option", I get an error:

def parseNames(line: String): (Int, String) = {
  val fields = line.split('\"')
  (fields(0).trim().toInt, fields(1))
}

def main(args: Array[String]): Unit = {
  val sc = new SparkContext("local[*]", "DemoHero")
  val txt = sc.textFile("../marvel-names1.txt")
  // Error: type mismatch; found: (Int, String), required: TraversableOnce[?]
  val rdd = txt.flatMap(parseNames)
}

As I understand it, flatMap turns the RDD into a collection of String/Int elements, so I thought both versions should work without any error. Please let me know where I am going wrong.

Answer

TL;DR: There is an implicit conversion from Option to Iterable, which is why your first flatMap does not fail.

From the inheritance hierarchy of Option, it is not at all clear why RDD's flatMap, which expects a function whose return type is a TraversableOnce, would accept a function that returns an Option: Option does not extend TraversableOnce.

However, if you print the desugared code generated for your flatMap call, the following synthetic function definition appears:

@SerialVersionUID(value = 0) final <synthetic> class anonfun$1 extends scala.runtime.AbstractFunction1 with Serializable {
  final def apply(line: String): Iterable = scala.this.Option.option2Iterable(org.example.ClassName.parseNames$1(line));
  final <bridge> <artifact> def apply(v1: Object): Object = anonfun$1.this.apply(v1.$asInstanceOf[String]());
  def <init>(): <$anon: Function1> = {
    anonfun$1.super.<init>();
    ()
  }
}

The details are not that important: it is something that takes a line: String and returns an Iterable. The interesting part is Option.option2Iterable.

This is an implicit conversion defined in the Option companion object. It quietly converts an Option into an Iterable, and Iterable is a special case of TraversableOnce.

This is how the compiler can sneak option2Iterable into a synthetic Function definition that mediates between your method and the invocation of flatMap. The argument now has type String => Iterable[(Int, String)], so the flatMap call compiles fine.
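To make the hidden step visible, here is a minimal sketch that calls Option.option2Iterable by hand instead of relying on the compiler. It runs on a plain List standing in for the RDD so that no Spark dependency is needed; the object name, the helper parseAsIterable, and the sample lines are assumptions made for illustration only.

```scala
object OptionAsIterable {
  // Same parsing logic as the question's parseNames.
  def parseNames(line: String): Option[(Int, String)] = {
    val fields = line.split('"')
    if (fields.length > 1) Some((fields(0).trim.toInt, fields(1))) else None
  }

  // Hypothetical helper: spells out the conversion that the compiler
  // would otherwise insert implicitly.
  def parseAsIterable(line: String): Iterable[(Int, String)] =
    Option.option2Iterable(parseNames(line))

  def main(args: Array[String]): Unit = {
    // Made-up sample input; a List stands in for the RDD of lines.
    val lines = List("1 \"SPIDER-MAN\"", "malformed line", "2 \"HULK\"")
    // Lines without a quoted name yield an empty Iterable and vanish.
    println(lines.flatMap(parseAsIterable)) // List((1,SPIDER-MAN), (2,HULK))
  }
}
```

With the conversion written out, the function passed to flatMap visibly returns an Iterable, which is the shape flatMap asks for.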

Note that this would not work without the synthetic Function instance that wraps your method. If you declared parseNames like this:

def parseNames: String => Option[(Int,String)] = { line => 

this would be a straightforward compiler error.
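If parseNames really has to be a function value, one possible way around the error is to give the compiler an expression to convert, either by wrapping the call in a lambda or by calling toList on the result. The sketch below is an assumption-laden illustration: it uses a plain List in place of the RDD, a val instead of the def shown above, and a function body invented for the example.

```scala
object FunctionValueWorkaround {
  // A function value rather than a method; the body is made up for this sketch.
  val parseNames: String => Option[(Int, String)] = { line =>
    val fields = line.split('"')
    if (fields.length > 1) Some((fields(0).trim.toInt, fields(1))) else None
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample input; a List stands in for the RDD of lines.
    val lines = List("1 \"SPIDER-MAN\"", "no quotes here")

    // Option 1: wrap the call in a lambda, so the Option-typed body
    // can be converted where the lambda is type-checked.
    val viaLambda = lines.flatMap(line => parseNames(line))

    // Option 2: convert explicitly and rely on no implicit at all.
    val viaToList = lines.flatMap(line => parseNames(line).toList)

    println(viaLambda) // List((1,SPIDER-MAN))
    println(viaToList) // List((1,SPIDER-MAN))
  }
}
```

The explicit toList variant is arguably the more readable of the two, since it does not depend on any conversion happening behind the scenes.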

Your second code snippet shouldn't compile and, luckily, it doesn't: a pair is not a TraversableOnce, so flatMap does not accept parseNames(line: String): (Int, String) as an argument. What you want here is map, because you want to map each string to exactly one (Int, String) pair.
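A minimal sketch of the map version follows. It again uses a plain List in place of the RDD so it runs without Spark; with an RDD the call would have the same shape, txt.map(parseNames). The object name and the sample lines are assumptions for illustration.

```scala
object MapForPairs {
  // The question's second parseNames: exactly one pair per input line.
  def parseNames(line: String): (Int, String) = {
    val fields = line.split('"')
    (fields(0).trim.toInt, fields(1))
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample input; every line is assumed to be well formed.
    val lines = List("1 \"SPIDER-MAN\"", "2 \"HULK\"")

    // One output element per input element, so map is the right tool.
    val pairs = lines.map(parseNames)
    println(pairs) // List((1,SPIDER-MAN), (2,HULK))

    // lines.flatMap(parseNames) // does not compile: a tuple is not a collection
  }
}
```

Note that this variant throws on a line without a quoted name, which is the case the Option-returning version handles by returning None.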

flatMap is for a different use case: it converts each element of the original collection into another collection and then flattens all the resulting collections into a single collection. For example,

sc.parallelize(List(1, 2, 3)).flatMap{ x => List(x, x*x, x*x*x) }

would first produce a TraversableOnce for each x:

List(1,1,1)
List(2,4,8)
List(3,9,27)

and then glue them all together, so that you would obtain an RDD with entries

1,1,1,2,4,8,3,9,27
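The two stages can be reproduced separately on an ordinary List, which behaves the same way here as the RDD in the example. This is only a sketch: the object name and the helper powers are hypothetical.

```scala
object FlatMapDemo {
  // Hypothetical helper: the per-element collection from the example above.
  def powers(x: Int): List[Int] = List(x, x * x, x * x * x)

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3)

    // Stage 1: map alone keeps one inner collection per element.
    val nested = xs.map(powers)
    println(nested) // List(List(1, 1, 1), List(2, 4, 8), List(3, 9, 27))

    // Stage 2: flatMap maps and flattens in one step.
    val flat = xs.flatMap(powers)
    println(flat) // List(1, 1, 1, 2, 4, 8, 3, 9, 27)

    // flatMap is equivalent to map followed by flatten.
    assert(nested.flatten == flat)
  }
}
```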

It works with Option in the same way because, "morally", an Option is something like a list with zero or one elements, even though its inheritance hierarchy does not say so explicitly.
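The "zero or one elements" view can be checked directly with toList, and it explains why flatMap over Option results acts like a combined filter and map. The sketch below uses a plain List and a hypothetical helper keepOdd; neither comes from the original question.

```scala
object OptionAsSmallList {
  // Hypothetical helper: keeps odd numbers, drops even ones.
  def keepOdd(x: Int): Option[Int] = if (x % 2 == 1) Some(x) else None

  def main(args: Array[String]): Unit = {
    // Some behaves like a one-element list, None like an empty one.
    println(Some(42).toList)          // List(42)
    println(Option.empty[Int].toList) // List()

    // Flattening therefore drops the None results and unwraps the Some results.
    println(List(1, 2, 3, 4, 5).flatMap(x => keepOdd(x))) // List(1, 3, 5)
  }
}
```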

A note on the phrase "should not compile": whenever I write that some code (yours or anyone else's) "should not compile", I don't mean that I wish compile errors on you. I mean that if there is a problem in the code, the compiler should produce a clear error message as early as possible.
