Trying to connect to Oracle from Spark
Problem description
I am trying to connect Spark to Oracle and pull data from some tables and SQL queries, but I am not able to connect. I have tried different workarounds, but with no luck. I have followed the steps below; please correct me if I need to make any changes.
I am on a Windows 7 machine, using PySpark from a Jupyter notebook, with Python 2.7 and Spark 2.1.0. I have set a Spark class path in my environment variables:
SPARK_CLASS_PATH = C:\Oracle\Product\11.2.0\client_1\jdbc\lib\ojdbc6.jar
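Note that `SPARK_CLASS_PATH` is deprecated in recent Spark versions and is often not picked up by a Jupyter-launched PySpark session. A common workaround (a sketch only, reusing the jar path from the question and assuming it is valid on your machine) is to pass the jar through `PYSPARK_SUBMIT_ARGS` before `pyspark` is first imported:

```python
import os

# Sketch: register the Oracle JDBC jar on the driver/executor classpath via
# PYSPARK_SUBMIT_ARGS. This must be set BEFORE `pyspark` is first imported.
# The jar path below is taken from the question and is an assumption about
# your local Oracle client install.
ojdbc_jar = r"C:\Oracle\Product\11.2.0\client_1\jdbc\lib\ojdbc6.jar"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars {0} pyspark-shell".format(ojdbc_jar)
```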
jdbcDF = sqlContext.read.format("jdbc") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .option("url", "jdbc:oracle://dbserver:port#/database") \
    .option("dbtable", "Table_name") \
    .option("user", "username") \
    .option("password", "password") \
    .load()
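The "Invalid Oracle URL specified" error below typically comes from the `jdbc:oracle://...` form of the URL, which the Oracle thin driver does not accept. The thin driver expects `jdbc:oracle:thin:@host:port:SID` or `jdbc:oracle:thin:@//host:port/service_name`. A minimal sketch of building such a URL (host, port, and service name here are placeholder values, not the real ones from the question):

```python
# Placeholder values; substitute your real host, listener port, and service name.
host, port, service = "dbserver", 1521, "database"

# The Oracle thin driver accepts "jdbc:oracle:thin:@//host:port/service_name"
# (service-name form) or "jdbc:oracle:thin:@host:port:SID" (SID form).
url = "jdbc:oracle:thin:@//{0}:{1}/{2}".format(host, port, service)
```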
Errors:
1. Py4JJavaError:
An error occurred while calling o148.load.
: java.sql.SQLException: Invalid Oracle URL specified
2. Py4JJavaError:
An error occurred while calling o114.load. : java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
Another case:
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

ORACLE_DRIVER_PATH = "C:\Oracle\Product\11.2.0\client_1\jdbc\lib\ojdbc7.jar"
Oracle_CONNECTION_URL = "jdbc:oracle:thin:username/password@servername:port#/dbservicename"

conf = SparkConf()
conf.setMaster("local")
conf.setAppName("Oracle_imp_exp")

# sc is the SparkContext already running in the notebook session
sqlContext = SQLContext(sc)
ora_tmp = sqlContext.read.format('jdbc').options(
    url=Oracle_CONNECTION_URL,
    dbtable="tablename",
    driver="oracle.jdbc.OracleDriver"
).load()
I get the following error:
Error: IllegalArgumentException: u"Error while instantiating org.apache.spark.sql.hive.HiveSessionState':"
Please help me with this.
Recommended answer
This change resolved the issue:
sqlContext = SQLContext(sc)

# Read through the SparkSession's reader (spark.read) instead of
# sqlContext.read, which avoids the HiveSessionState instantiation error.
ora_tmp = spark.read.format('jdbc').options(
    url=Oracle_CONNECTION_URL,
    dbtable="tablename",
    driver="oracle.jdbc.OracleDriver"
).load()
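For reference, the same read can also be expressed with `spark.read.jdbc`, passing the credentials and driver class in a properties dict. This is only a sketch: the URL, table name, and credentials below are placeholders matching the question's examples, not working values.

```python
# Placeholder URL and credentials, mirroring the question's examples.
url = "jdbc:oracle:thin:@//dbserver:1521/dbservicename"
props = {
    "user": "username",
    "password": "password",
    "driver": "oracle.jdbc.OracleDriver",
}

# With a live SparkSession this would be:
# df = spark.read.jdbc(url=url, table="tablename", properties=props)
```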