Problems with Spark when dealing with a list of Python objects

Question

I am learning Spark, and I ran into a problem when using it to process a list of Python objects. Here is my code:

import numpy as np    
from pyspark import SparkConf, SparkContext

### Definition of Class A
class A:
    def __init__(self, n):
        self.num = n

### Function "display"
def display(s):
    print(s.num)
    return s

def main():
    ### Initialize the Spark
    conf = SparkConf().setAppName("ruofan").setMaster("local")
    sc = SparkContext(conf = conf)

    ### Create a list of instances of Class A
    data = []
    for i in np.arange(5):
        x = A(i)
        data.append(x)

    ### Use Spark to parallelize the list of instances
    lines = sc.parallelize(data)

    ### Spark mapping
    lineLengths1 = lines.map(display)

if __name__ == "__main__":
    main()

When I run this code, it does not seem to print the number of each instance (I expected it to print 0, 1, 2, 3, 4). I have tried to find the reason, but I have no idea what is wrong. I would really appreciate any help.

Recommended answer

First of all, display is never executed. RDDs are lazily evaluated, so as long as you don't perform an action (like collect, count or saveAsTextFile), nothing really happens.
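To make the difference between a transformation and an action concrete, here is a minimal sketch. It is not the asker's code: it uses plain integers instead of instances of A to keep things simple, and the names describe, collect_descriptions and make_local_context are made up for illustration. It assumes sc is a SparkContext that has already been created.

```python
def describe(n):
    # Runs on the workers; returns a value instead of printing it.
    return "num=%d" % n


def collect_descriptions(sc, count=5):
    """Map lazily, then force evaluation with an action.

    `sc` is assumed to be an existing SparkContext.
    """
    rdd = sc.parallelize(range(count))
    # Transformation: Spark only records this step, nothing runs yet.
    described = rdd.map(describe)
    # Action: the job runs now and the results return to the driver.
    return described.collect()


def make_local_context(app_name="lazy-eval-demo"):
    # Hypothetical helper mirroring the setup in the question.
    # pyspark is imported here so the functions above do not need it at import time.
    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setAppName(app_name).setMaster("local")
    return SparkContext(conf=conf)
```

With a live context, something like `for line in collect_descriptions(make_local_context()): print(line)` would print on the driver, because collect brings the mapped values back before the loop runs.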

Another part of the problem requires an understanding of Spark's architecture. Simplifying things a little, the driver program is responsible for creating the SparkContext and sending tasks to the worker nodes. Everything that happens during transformations (in your case, map) is executed on the workers, so the output of the print statement goes to the worker stdout. If you want to obtain some kind of output, you should consider using logs instead.
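As a rough sketch of the logging suggestion, display could write through Python's standard logging module instead of print. The logger name "display" is an arbitrary choice for illustration, and where the records end up (worker stderr, a log file, and so on) depends on how logging is configured for the worker processes, which this sketch does not cover.

```python
import logging


def display(s):
    # Hypothetical logger name; this runs in the worker's Python process.
    logger = logging.getLogger("display")
    # WARNING is used because it passes Python's default log level.
    logger.warning("num=%s", s.num)
    return s
```

The function still has to be triggered by an action; logging changes where the message goes, not when the map runs.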

Finally, if your goal is to get some kind of side effect, it would be more idiomatic to use foreach instead of map.
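A small sketch of the foreach variant, assuming rdd is an RDD that already exists (for example, the result of sc.parallelize). The names show and show_all are illustrative only.

```python
def show(n):
    # Side effect only; foreach ignores whatever this returns.
    print(n)


def show_all(rdd):
    # foreach is an action, so the job runs right away and nothing is returned.
    rdd.foreach(show)
```

Since show still runs on the workers, its output lands in the worker stdout, as described above.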
