Exception: could not open socket on pyspark

Problem description

Whenever I try to execute some simple processing in pyspark, it fails to open the socket.

>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part])

The above throws an exception:

port 53554 , proto 6 , sa ('127.0.0.1', 53554)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Volumes/work/bigdata/spark-custom/python/pyspark/context.py", line 917, in runJob
    return list(_load_from_socket(port, mappedRDD._jrdd_deserializer))
  File "/Volumes/work/bigdata/spark-custom/python/pyspark/rdd.py", line 143, in _load_from_socket
    raise Exception("could not open socket")
Exception: could not open socket

>>> 15/08/30 19:03:05 ERROR PythonRDD: Error while sending iterator
java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:404)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:613)

I checked _load_from_socket in rdd.py and realised that it does get the port, but the server is never even started, or the JVM-side runJob might be the issue:

port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
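
For context, _load_from_socket in rdd.py does roughly the following in this Spark version (a simplified sketch from memory, not the exact source): it connects to the port returned by the JVM with a short timeout, and raises the "could not open socket" exception seen above if no connection succeeds.

import socket

def _load_from_socket(port, serializer):
    # Simplified sketch: try each address family returned for localhost,
    # connect with a short timeout, and give up with the exception above
    # if no connection could be established.
    sock = None
    for af, socktype, proto, _, sa in socket.getaddrinfo(
            "localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(af, socktype, proto)
            sock.settimeout(3)
            sock.connect(sa)
            break
        except socket.error:
            sock = None
    if not sock:
        raise Exception("could not open socket")
    # The JVM serves the job results over this socket;
    # deserialize them as a stream.
    return serializer.load_stream(sock.makefile("rb", 65536))

The matching "Accept timed out" in the JVM stack trace suggests the failure happens during this connect/accept handshake rather than in the job itself.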

Solution

It's not the ideal solution, but at least now I am aware of the cause. PySpark is unable to create the JVM socket with JDK 1.8 (64-bit), so I just set my Java path to JDK 1.7 and it worked.
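
If switching back to JDK 1.7 also works for you, one way to do it without changing the system default is to point JAVA_HOME at the 1.7 JDK before the SparkContext (and hence the JVM gateway) is created. A minimal sketch, assuming you start from a plain Python interpreter rather than the pyspark shell (where the context is already created for you); the JDK path below is only an example, use wherever your JDK 7 actually lives:

import os

# Example JDK 1.7 location on OS X -- adjust to your own install.
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home"

# JAVA_HOME must be set before pyspark launches the JVM gateway,
# i.e. before the SparkContext is constructed.
from pyspark import SparkContext

sc = SparkContext("local[*]", "socket-test")
myRDD = sc.parallelize(range(6), 3)
print(sc.runJob(myRDD, lambda part: [x * x for x in part]))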
