Problems on Spark dealing with a list of Python objects
Problem description
I am learning Spark, and I ran into a problem when I used Spark to process a list of Python objects. The following is my code:
import numpy as np
from pyspark import SparkConf, SparkContext

### Definition of Class A
class A:
    def __init__(self, n):
        self.num = n

### Function "display"
def display(s):
    print(s.num)
    return s

def main():
    ### Initialize the Spark
    conf = SparkConf().setAppName("ruofan").setMaster("local")
    sc = SparkContext(conf = conf)

    ### Create a list of instances of Class A
    data = []
    for i in np.arange(5):
        x = A(i)
        data.append(x)

    ### Use Spark to parallelize the list of instances
    lines = sc.parallelize(data)

    ### Spark mapping
    lineLengths1 = lines.map(display)

if __name__ == "__main__":
    main()
When I run my code, it does not seem to print the number of each instance (it should have printed 0, 1, 2, 3, 4). I tried to find the reason, but I have no idea what is wrong. I would really appreciate it if anyone could help me.
Recommended answer
First of all, display is never executed. RDDs are lazily evaluated, so as long as you don't perform an action (like collect, count or saveAsTextFile), nothing really happens.
Another part of the problem requires an understanding of the Spark architecture. Simplifying things a little, the driver program is responsible for creating the SparkContext and sending tasks to the worker nodes. Everything that happens during transformations (in your case map) is executed on the workers, so the output of the print statement goes to the workers' stdout. If you want to obtain some kind of output, you should consider using logs instead.
Finally, if your goal is to get some kind of side effect, it would be idiomatic to use foreach instead of map.