How do `map` and `reduce` methods work in Spark RDDs?

Views: 180 Published: 2016/5/22 16:04:41 Tags: scala apache-spark closures


Question

The following code is from the Apache Spark quick start guide. Can somebody explain what the "line" variable is and where it comes from?
textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
Also, how do values get passed into a and b?

Link to the quick start guide: http://spark.apache.org/docs/latest/quick-start.html
Solution

First, according to your link, textFile is created as
val textFile = sc.textFile("README.md")
so textFile is an RDD[String], meaning it is a resilient distributed dataset whose elements are of type String. Its API is very similar to that of regular Scala collections.

So now what does this map do?

Imagine you have a list of Strings and want to convert that into a list of Ints, representing the length of each String.
val stringlist: List[String] = List("ab", "cde", "f")
val intlist: List[Int] = stringlist.map( x => x.length )
The map method expects a function, here one of type String => Int. That function is applied to each element of the list, so the value of intlist is List( 2, 3, 1 ).

Here we created an anonymous function of type String => Int, namely x => x.length. The function can be written more explicitly as
stringlist.map( (x: String) => x.length )
Written explicitly like this, the function can also be given a name:
val stringLength : (String => Int) = {
  x => x.length
}
val intlist = stringlist.map( stringLength )
Here it is evident that stringLength is a function from String to Int.

Remark: In general, map is what makes something a so-called functor. Given a function A => B, the functor's map (here for List) lets you use that function to go from List[A] to List[B]. This is called lifting.
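The lifting remark can be made concrete with a small sketch in plain Scala. The helper name lift and the object name LiftingSketch are hypothetical, chosen only for illustration; the sketch assumes nothing beyond the standard List.map.

```scala
object LiftingSketch {
  // Hypothetical helper: turns a function on elements (A => B)
  // into a function on whole lists (List[A] => List[B]) by using map.
  def lift[A, B](f: A => B): List[A] => List[B] =
    (xs: List[A]) => xs.map(f)

  // The named function from the answer.
  val stringLength: String => Int = x => x.length

  // The lifted version works on lists instead of single strings.
  val liftedLength: List[String] => List[Int] = lift(stringLength)

  def main(args: Array[String]): Unit = {
    println(liftedLength(List("ab", "cde", "f"))) // List(2, 3, 1)
  }
}
```

Calling liftedLength on a list gives the same result as calling map with stringLength directly, which is the point of the remark.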

Answers to your questions

What is the "line" variable?

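As mentioned above, line is the input parameter of the function line => line.split(" ").size. It is not defined anywhere else: map calls the function once per element of the RDD and passes that element in as line.

More explicitly: (line: String) => line.split(" ").size

Example: If line is "hello world", the function returns 2.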
"hello world" 
=> Array("hello", "world")  // split 
=> 2                        // size of Array
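As a rough illustration of where line comes from, the sketch below gives the anonymous function a name and applies it to a plain List rather than an RDD. The names wordCount and LineParameterSketch are hypothetical, and using a local list is an assumption made so the example runs without Spark.

```scala
object LineParameterSketch {
  // The function from the quick start, given a (hypothetical) name.
  // `line` is just its parameter; map supplies the argument.
  val wordCount: String => Int = line => line.split(" ").size

  def main(args: Array[String]): Unit = {
    val lines = List("hello world", "goodbye")
    // map calls wordCount once per element, binding each element to `line`.
    println(lines.map(wordCount)) // List(2, 1)
  }
}
```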
How does a value get passed into a,b?

reduce also expects a function, this time of type (A, A) => A, where A is the element type of your RDD. Let's call this function op.

What does reduce do? An example:
List( 1, 2, 3, 4 ).reduce( (x,y) => x + y )
Step 1: op( 1, 2 ) is the first evaluation.
  Start with the first two elements:
    x is 1 and y is 2
Step 2: op( op( 1, 2 ), 3 )
  Take the next element 3:
    x is op( 1, 2 ) = 3 and y is 3
Step 3: op( op( op( 1, 2 ), 3 ), 4 )
  Take the next element 4:
    x is op( op( 1, 2 ), 3 ) = op( 3, 3 ) = 6 and y is 4
The result is op( 6, 4 ) = 10, the sum of the list elements.

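Remark: In general, for a List, reduce calculates
op( op( ... op(x_1, x_2) ..., x_{n-1}), x_n)
On an RDD, Spark reduces each partition and then combines the partial results, so the grouping can differ. op should therefore be associative and commutative, as sum and max are.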
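The left-to-right evaluation can be traced with a hand-written reduce. This is only a sketch for local lists: reduceLeftTraced is a hypothetical helper, not Spark's implementation, and it prints each call to op so that the values bound to the two parameters are visible.

```scala
object ReduceTraceSketch {
  // Hypothetical helper: a left-to-right reduce that logs every call to op.
  def reduceLeftTraced(xs: List[Int])(op: (Int, Int) => Int): Int = {
    require(xs.nonEmpty, "reduce needs at least one element")
    var acc = xs.head // the first element is the initial left argument
    for (y <- xs.tail) {
      val next = op(acc, y) // left argument: result so far, right argument: next element
      println(s"op($acc, $y) = $next")
      acc = next
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val sum = reduceLeftTraced(List(1, 2, 3, 4))((x, y) => x + y)
    println(sum) // 10
  }
}
```

For List( 1, 2, 3, 4 ) the trace shows op(1, 2), then op(3, 3), then op(6, 4), matching the steps listed for the sum example.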
Full example

First, suppose textFile is an RDD[String] containing these lines:
textFile
 "hello Tyth"
 "cool example, eh?"
 "goodbye"

textFile.map(line => line.split(" ").size)
 2
 3
 1
textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
 3
   Steps here, recalling that op is `(a, b) => if (a > b) a else b`:
   - op( op(2, 3), 1 ) evaluates to op(3, 1), since op(2, 3) = 3
   - op(3, 1) = 3
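The full example can be reproduced without a cluster. In the sketch below a local Seq stands in for the RDD[String]; that substitution, and the names LongestLineSketch, lines, wordsPerLine and maxWords, are assumptions made for illustration. With Spark available, the same map and reduce calls would be applied to textFile itself.

```scala
object LongestLineSketch {
  // Assumption: a local Seq stands in for the RDD[String] so the sketch runs without Spark.
  val lines: Seq[String] = Seq("hello Tyth", "cool example, eh?", "goodbye")

  // map: each line becomes its number of space-separated words.
  val wordsPerLine: Seq[Int] = lines.map(line => line.split(" ").size)

  // reduce: keep the larger of the two values at every step, which yields the maximum.
  val maxWords: Int = wordsPerLine.reduce((a, b) => if (a > b) a else b)

  def main(args: Array[String]): Unit = {
    println(wordsPerLine) // List(2, 3, 1)
    println(maxWords)     // 3
  }
}
```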