How does Spark interoperate with CPython
Question
I have an Akka system written in Scala that needs to call out to some Python code, relying on Pandas and NumPy, so I can't just use Jython. I noticed that Spark uses CPython on its worker nodes, so I'm curious how it executes Python code and whether that code exists in some reusable form.
Answer
The PySpark architecture is described at https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals.
As @Holden said, Spark uses Py4J to access Java objects in the JVM from Python. But this is only one case: when the driver program is written in Python (the left part of the diagram there).
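For that Python-driver case, the JVM side just runs a Py4J `GatewayServer` that a Python process connects to over a local socket. A minimal Scala sketch, assuming the py4j artifact is on the classpath (the `EntryPoint` class and its method are hypothetical, not part of Spark):

```scala
import py4j.GatewayServer

// Hypothetical object exposing some JVM functionality to Python.
class EntryPoint {
  def add(a: Int, b: Int): Int = a + b
}

object GatewayApp {
  def main(args: Array[String]): Unit = {
    // Starts a server thread; Py4J listens on port 25333 by default.
    new GatewayServer(new EntryPoint).start()
    // Python side (for illustration):
    //   from py4j.java_gateway import JavaGateway
    //   gateway = JavaGateway()
    //   gateway.entry_point.add(1, 2)  # -> 3
  }
}
```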
The other case (the right part of the diagram) is when a Spark worker starts a Python process, sends serialized Java objects to the Python program to be processed, and receives the output. The Java objects are serialized into pickle format so that Python can read them.
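That serialization step is easy to reproduce outside Spark. Here is a minimal sketch using Pyrolite directly (the same library Spark uses, though this is not Spark's code), assuming the `net.razorvine:pyrolite` artifact is on the classpath:

```scala
import net.razorvine.pickle.{Pickler, Unpickler}

object PickleRoundTrip {
  def main(args: Array[String]): Unit = {
    // java.util collections and boxed primitives map to natural
    // Python types (dict, list, int, str) in the pickle stream.
    val payload = new java.util.HashMap[String, Any]()
    payload.put("xs", java.util.Arrays.asList(1, 2, 3))

    val bytes: Array[Byte] = new Pickler().dumps(payload)
    // A CPython process could now pickle.loads() these bytes;
    // Unpickler does the reverse on the JVM side.
    println(new Unpickler().loads(bytes)) // {xs=[1, 2, 3]}
  }
}
```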
It looks like the latter case is what you are looking for. Here are some links into Spark's Scala core that could be useful to get started:
- The Pyrolite library, which provides a Java interface to Python's pickle protocol - used by Spark to serialize Java objects into pickle format. For example, such a conversion is required for accessing the Key part of the Key/Value pairs of a PairRDD.
- The Scala code that starts the Python process and interacts with it: api/python/PythonRDD.scala (a rough sketch of this pattern follows after this list).
- The SerDe utilities that do the pickling: api/python/SerDeUtil.scala
- The Python side: python/pyspark/worker.py
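Putting the pieces together, here is a rough sketch of the worker pattern referenced in PythonRDD.scala above: pickle the input with Pyrolite, pipe it to a spawned Python process over stdin with a length prefix, and unpickle the result from stdout. The framing and the inline script are my own simplifications, not Spark's actual worker protocol (which also handles batching, error reporting, and worker reuse):

```scala
import java.io.{DataInputStream, DataOutputStream}
import net.razorvine.pickle.{Pickler, Unpickler}

object PythonWorkerSketch {
  // Inline worker: read a 4-byte big-endian length plus a pickle from
  // stdin, double each number, and write the result back the same way.
  val script: String =
    """import sys, struct, pickle
      |n = struct.unpack(">i", sys.stdin.buffer.read(4))[0]
      |data = pickle.loads(sys.stdin.buffer.read(n))
      |out = pickle.dumps([x * 2 for x in data], protocol=2)
      |sys.stdout.buffer.write(struct.pack(">i", len(out)) + out)
      |sys.stdout.buffer.flush()
      |""".stripMargin

  def main(args: Array[String]): Unit = {
    val proc = new ProcessBuilder("python3", "-c", script).start()
    val toPy = new DataOutputStream(proc.getOutputStream)
    val fromPy = new DataInputStream(proc.getInputStream)

    // Send a length-prefixed pickle of a Java list.
    val bytes = new Pickler().dumps(java.util.Arrays.asList(1, 2, 3))
    toPy.writeInt(bytes.length) // writeInt is big-endian, matching ">i"
    toPy.write(bytes)
    toPy.flush()

    // Read back the length-prefixed pickled result.
    val buf = new Array[Byte](fromPy.readInt())
    fromPy.readFully(buf)
    println(new Unpickler().loads(buf)) // [2, 4, 6]
    proc.waitFor()
  }
}
```

In a long-running Akka system you would presumably keep the Python process alive and stream batches over the pipe instead of spawning one per call, which is essentially what Spark's worker reuse does.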