Exception: could not open socket on pyspark
Problem description
Whenever I try to execute a simple processing job in pyspark, it fails to open the socket.
>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part])
The above throws an exception -
port 53554 , proto 6 , sa ('127.0.0.1', 53554)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Volumes/work/bigdata/spark-custom/python/pyspark/context.py", line 917, in runJob
return list(_load_from_socket(port, mappedRDD._jrdd_deserializer))
File "/Volumes/work/bigdata/spark-custom/python/pyspark/rdd.py", line 143, in _load_from_socket
raise Exception("could not open socket")
Exception: could not open socket
>>> 15/08/30 19:03:05 ERROR PythonRDD: Error while sending iterator
java.net.SocketTimeoutException: Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:404)
at java.net.ServerSocket.implAccept(ServerSocket.java:545)
at java.net.ServerSocket.accept(ServerSocket.java:513)
at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:613)
I checked _load_from_socket in rdd.py and realised that it gets the port, but the server is never even started, or Spark's runJob might be the issue -
port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
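To see where the "could not open socket" message comes from, the connect logic in pyspark's _load_from_socket can be sketched roughly like this (a minimal sketch; the function name open_local_socket is hypothetical, and the real code in pyspark/rdd.py deserializes results from the returned socket):

```python
import socket

def open_local_socket(port, timeout=3):
    # Try every address family that "localhost" resolves to (IPv6 and
    # IPv4) and connect to the port the JVM reported. If no address
    # accepts the connection, raise the same exception seen above.
    for af, socktype, proto, _, sa in socket.getaddrinfo(
            "localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(af, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sa)
            return sock  # the caller reads serialized results from here
        except OSError:
            if sock is not None:
                sock.close()
    raise Exception("could not open socket")
```

This matches the observed failure mode: the JVM side returned a port number, but since its server socket never reached accept() in time, every connect attempt fails and the Python side raises the exception.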
Solution
It's not the ideal solution, but now I am aware of the cause. PySpark is unable to create a JVM socket with the JDK 1.8 (64-bit) version, so I just set my Java path to JDK 1.7 and it worked.
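On macOS (the paths in the traceback suggest a Mac), pointing Spark at JDK 1.7 can be done with an environment fragment like the following; this is a sketch of the workaround, assuming a 1.7 JDK is installed and that /usr/libexec/java_home is available:

```shell
# Select the installed JDK 1.7 for this shell session; Spark's
# launcher picks up JAVA_HOME when starting the JVM.
export JAVA_HOME="$(/usr/libexec/java_home -v 1.7)"

# Verify which Java Spark will now use before relaunching pyspark.
"$JAVA_HOME/bin/java" -version
```

Put the export in your shell profile (or in conf/spark-env.sh) to make it persistent rather than per-session.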