Converting sequential code into parallel
Problem Description
I'm trying to understand MapReduce with Spark.
When doing some simple exercises, I have no problems doing it the sequential way, but when it comes to parallelising my code, I experience difficulties.
Consider the following example:
var = "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts."
k = 10
# stop at len(var) - k + 1 so every shingle has the full k characters
for x in range(0, len(var) - k + 1):
    print(var[x:x + k])
It splits a text into shingles of 10 characters (w-shingling).
What would be the right way to "convert" it into parallel code using Spark?
How do I code a for loop? Does Spark offer loops at all?
I just need to understand the whole concept.
PS: I've already read the documentation and I know what RDDs are etc. I just don't know how to "convert" sequential code into parallel.
Answer
As you may have read already, Spark has a rich functional API. RDD operations are classified into transformations and actions, where transformations can be seen as functions that take an RDD and produce an RDD (f(RDD) => RDD), and actions as functions that take an RDD and produce some result (Array[T] in the case of collect, or Unit in the case of foreach).
The general idea of how to port an algorithm to Spark is to find a way to express that algorithm using the functional paradigm Spark supports, combining transformations and actions to achieve the desired outcome.
Algorithms that depend on a sequence, like the w-shingling above, are challenging to parallelize because there is an implicit dependency on the order of the elements that is sometimes difficult to express in such a way that it can be operated on in different partitions.
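To see why the order dependency matters, here is a small single-machine sketch (plain Python, with a hypothetical two-way split standing in for partitions): shingling each chunk separately loses the shingles that straddle the boundary.

```python
# Shingling each "partition" independently drops the boundary shingles.

def shingles(text, n):
    # zip the string with itself shifted by 1..n-1 positions
    return ["".join(t) for t in zip(*(text[i:] for i in range(n)))]

text = "Floppy Disk"
whole = shingles(text, 3)

left, right = text[:6], text[6:]            # pretend these are two partitions
partitioned = shingles(left, 3) + shingles(right, 3)

missing = [s for s in whole if s not in partitioned]
print(missing)                              # ['py ', 'y D'] straddle the split
```

This is why the answer below keeps a global index per character, so elements in different partitions can still be matched with their neighbours.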
In this case, I used indexing as a way to preserve the sequence while making it possible to express the algorithm in terms of transformations:
import org.apache.spark.rdd.RDD

def kShingle(rdd: RDD[Char], n: Int): RDD[Seq[Char]] = {
  // Grow the shingles one character at a time until they have length n
  def loop(base: RDD[(Long, Seq[Char])], cumm: RDD[(Long, Seq[Char])], i: Int): RDD[Seq[Char]] = {
    if (i <= 1) cumm.map(_._2) else {
      // Shift indices left by one so the join pairs each position with its successor
      val segment = base.map { case (idx, seq) => (idx - 1, seq) }
      loop(segment, cumm.join(segment).map { case (k, (v1, v2)) => (k, v1 ++ v2) }, i - 1)
    }
  }
  val seqRdd = rdd.map(char => Seq(char))
  // Pair each one-character Seq with its global position: (index, Seq(char))
  val indexed = seqRdd.zipWithIndex.map(_.swap)
  loop(indexed, indexed, n)
}
Spark shell example:
scala> val rdd = sc.parallelize("Floppy Disk")
scala> kShingle(rdd,3).collect
res23: Array[Seq[Char]] = Array(List(F, l, o), List(i, s, k), List(l, o, p), List(o, p, p), List(p, p, y), List(p, y, ), List(y, , D), List( , D, i), List(D, i, s))
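As a quick single-machine cross-check of what kShingle computes, the same shingles can be produced in plain Python (note that the collect output above is in partition order rather than positional order, so only the set of shingles matches):

```python
# Plain-Python w-shingling for comparison with the Spark result above.

def k_shingle(text, n):
    # zip the string with itself shifted by 1..n-1 positions;
    # each resulting tuple is one shingle of length n
    return ["".join(t) for t in zip(*(text[i:] for i in range(n)))]

print(k_shingle("Floppy Disk", 3))
# ['Flo', 'lop', 'opp', 'ppy', 'py ', 'y D', ' Di', 'Dis', 'isk']
```

Both versions yield len(text) - n + 1 = 9 shingles for "Floppy Disk" with n = 3.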