How to load a jar package (such as JDBC) in Kubernetes-Spark


Problem description


I am following the instructions laid out in Kubernetes' Spark example. I can get as far as launching the PySpark shell. However, I need to use PySpark with JDBC to connect to my Postgres database. Before I tried Kubernetes, I got JDBC working with Spark using the spark-defaults.conf file:

spark.driver.extraClassPath /spark/postgresql-9.4.1209.jre7.jar
spark.executor.extraClassPath /spark/postgresql-9.4.1209.jre7.jar
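
The same classpath can also be supplied per invocation instead of via spark-defaults.conf. A sketch, assuming the driver jar sits at the same path as above:

```shell
# Equivalent to the two spark-defaults.conf lines: put the Postgres
# driver on both the driver and executor classpaths for this session.
pyspark --driver-class-path /spark/postgresql-9.4.1209.jre7.jar \
        --jars /spark/postgresql-9.4.1209.jre7.jar
```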


I also had to download the driver into the location first. How do I achieve the same thing with Kubernetes? I don't think I can do

kubectl exec zeppelin-controller-xzlrf -it -- pyspark --jars /spark/postgresql-9.4.1209.jre7.jar


because the jar would have to be inside the container first. Therefore, maybe I can get it working if I can get the jar file inside the container, but how do I do that? Any thoughts or help are greatly appreciated.
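
One way to get the jar into a running container, sketched below, is `kubectl cp` (the pod name is taken from the question; the `/spark` target path is an assumption based on the classpath entries above, so adjust it to your image):

```shell
# Copy the locally downloaded driver jar into the running pod
# (pod name from the question; /spark path assumed to match the image).
kubectl cp postgresql-9.4.1209.jre7.jar \
  zeppelin-controller-xzlrf:/spark/postgresql-9.4.1209.jre7.jar

# Then launch PySpark inside the pod with the copied jar on the classpath.
kubectl exec zeppelin-controller-xzlrf -it -- \
  pyspark --jars /spark/postgresql-9.4.1209.jre7.jar
```

Note that anything copied this way is lost when the pod is rescheduled; baking the jar into the image is the more durable option.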


UPDATE: I tried following @LostInOverflow's solution but encountered the following:

kubectl exec zeppelin-controller-2p3ew -it -- pyspark --packages org.postgresql:postgresql:9.4.1209.jre7.jar


which appears to boot up and recognize the package argument, but still fails:

Python 2.7.9 (default, Mar  1 2015, 12:57:24) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.postgresql#postgresql added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
:: resolution report :: resolve 2294ms :: artifacts dl 0ms
    :: modules in use:
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        module not found: org.postgresql#postgresql;9.4.1209.jre7.jar

    ==== local-m2-cache: tried

      file:/root/.m2/repository/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom

      -- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:

      file:/root/.m2/repository/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar

    ==== local-ivy-cache: tried

      /root/.ivy2/local/org.postgresql/postgresql/9.4.1209.jre7.jar/ivys/ivy.xml

    ==== central: tried

      https://repo1.maven.org/maven2/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom

      -- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:

      https://repo1.maven.org/maven2/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar

    ==== spark-packages: tried

      http://dl.bintray.com/spark-packages/maven/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom

      -- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:

      http://dl.bintray.com/spark-packages/maven/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: org.postgresql#postgresql;9.4.1209.jre7.jar: not found

        ::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.postgresql#postgresql;9.4.1209.jre7.jar: not found]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
  File "/opt/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/spark/python/pyspark/context.py", line 110, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/spark/python/pyspark/context.py", line 234, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>> 

Recommended answer

You can use --packages with Maven coordinates in place of --jars:

--packages org.postgresql:postgresql:9.4.1209.jre7

Note that the coordinate is group:artifact:version with no .jar suffix. Appending .jar, as in the update above, makes Ivy look for a version literally named 9.4.1209.jre7.jar, which produces exactly the unresolved-dependency error shown in the log.
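
To see why the suffix matters, here is a small sketch of how Ivy maps a coordinate onto a repository path (the path pattern matches the central-repository URLs tried in the log above):

```python
# --packages takes a Maven coordinate of the form group:artifact:version.
# Appending ".jar" becomes part of the version string, so Ivy searches a
# nonexistent version directory -- the failure shown in the update.
group, artifact, version = "org.postgresql", "postgresql", "9.4.1209.jre7"

coordinate = f"{group}:{artifact}:{version}"

# Ivy derives the repository path from the coordinate like so:
path = f"{group.replace('.', '/')}/{artifact}/{version}/{artifact}-{version}.jar"

print(coordinate)  # org.postgresql:postgresql:9.4.1209.jre7
print(path)        # org/postgresql/postgresql/9.4.1209.jre7/postgresql-9.4.1209.jre7.jar
```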
