Spark -- Error: type mismatch; found: (Int, String) required: TraversableOnce[?]
Problem description
I am new to Spark programming and Scala, and I am not able to understand the difference between map and flatMap. When using flatMap, why does it work fine when the method returns an "Option":
def parseNames(line: String) : Option[(Int, String)] = {
  var fields = line.split('\"')
  if (fields.length > 1) {
    return Some(fields(0).trim().toInt, fields(1))
  }
  else {
    return None
  }
}
def main(args: Array[String]) {
  val sc = new SparkContext("local[*]", "DemoHero")
  val txt = sc.textFile("../marvel-names1.txt")
  val rdd = txt.flatMap(parseNames)
but without "Option", an error occurs:
def parseNames(line: String) : (Int, String) = {
  var fields = line.split('\"')
  (fields(0).trim().toInt, fields(1))
}
def main(args: Array[String]) {
  val sc = new SparkContext("local[*]", "DemoHero")
  val txt = sc.textFile("../marvel-names1.txt")
  val rdd = txt.flatMap(parseNames)
As per my understanding, flatMap turns the RDD into a collection of String/Int elements. I thought that in this case both versions should work without any error. Please let me know where I am making the mistake.
Recommended answer
TL;DR: There is an implicit conversion from Option to Iterable; this is why your first flatMap does not fail.
From the inheritance hierarchy of Option it is not at all clear why RDD's flatMap, which expects an argument with TraversableOnce in its return type, would accept a function that returns an Option, because Option does not extend TraversableOnce.
However, if you print the desugared code generated for your flatMap call, the following synthetic function definition appears:
@SerialVersionUID(value = 0) final <synthetic> class anonfun$1 extends scala.runtime.AbstractFunction1 with Serializable {
final def apply(line: String): Iterable = scala.this.Option.option2Iterable(org.example.ClassName.parseNames$1(line));
final <bridge> <artifact> def apply(v1: Object): Object = anonfun$1.this.apply(v1.$asInstanceOf[String]());
def <init>(): <$anon: Function1> = {
anonfun$1.super.<init>();
()
}
}
The details are not that important: it's something that takes a line: String and returns an Iterable. What's interesting is the Option.option2Iterable part.
This is an implicit conversion defined directly on Option; it quietly converts options into Iterable, and Iterable is a special case of TraversableOnce.
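This conversion can be observed outside Spark as well. A minimal sketch in plain Scala (the value names are illustrative, not from the question):

```scala
// Option.option2Iterable lives in Option's companion object, so it is in
// implicit scope whenever an Option is used where an Iterable is expected.
val fromSome: Iterable[Int] = Some(3) // one-element Iterable
val fromNone: Iterable[Int] = None    // empty Iterable
```

The type ascription on the left-hand side is what triggers the conversion: the compiler sees a `Some[Int]` where an `Iterable[Int]` is required and inserts `Option.option2Iterable` silently.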
This is how the compiler can sneak option2Iterable into a synthetic Function definition that mediates between your method and the invocation of flatMap. Now you have an argument of type String => Iterable[(Int, String)], so the flatMap compiles fine.
Note that it wouldn't work without a synthetic Function instance that wraps your method. If you declared parseNames like this:
def parseNames: String => Option[(Int,String)] = { line =>
this would be a straightforward compiler error.
Your second code snippet shouldn't compile, and luckily, it indeed doesn't: pairs are not Traversable, so flatMap does not accept a parseNames(line: String): (Int, String) as an argument. What you want to use here is map, because you want to map each string to exactly one pair of (Int, String).
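A sketch of the second version rewritten with map, checked on a plain collection (the sample lines are made up, assuming the same quote-delimited format as in the question; on the RDD it would be txt.map(parseNames)):

```scala
def parseNames(line: String): (Int, String) = {
  val fields = line.split('"')
  (fields(0).trim.toInt, fields(1))
}

// map emits exactly one pair per input line, so no flattening is involved
// and the element type of the result is (Int, String):
val pairs = List("1 \"Spider-Man\"", "2 \"Hulk\"").map(parseNames)
// pairs == List((1, "Spider-Man"), (2, "Hulk"))
```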
The flatMap is for a different use case: it's for converting each element of your original collection into another collection, and then flattening all the resulting collections into a single collection. So, for example,
sc.parallelize(List(1, 2, 3)).flatMap{ x => List(x, x*x, x*x*x) }
would first produce a TraversableOnce for each x:
List(1,1,1)
List(2,4,8)
List(3,9,27)
and then glue them all together, so that you would obtain an RDD with the entries
1,1,1,2,4,8,3,9,27
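The same flattening can be verified on an ordinary Scala collection, without a SparkContext:

```scala
// flatMap concatenates the per-element lists into one flat list.
val flattened = List(1, 2, 3).flatMap(x => List(x, x * x, x * x * x))
// flattened == List(1, 1, 1, 2, 4, 8, 3, 9, 27)
```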
It works with Option in the same way, because "morally" an Option is something like a list with zero or one elements, even though its inheritance hierarchy doesn't say that explicitly.
A note about the formulation "should not compile": whenever I write that some code "should not compile", I don't mean that I generally wish you had compile errors in your code. What I do mean is that if there is a problem in the code, the compiler should produce a clear error message as soon as possible.