Import Data from Oracle using Spark

Problem description

In Databricks I am using the following code to extract data from Oracle.

%scala
val empDF = spark.read 
    .format("jdbc") 
    .option("url", "jdbc:oracle:thin:username/password//hostname:port/sid") 
    .option("dbtable", "EMP") 
    .option("user", "username") 
    .option("password", "password") 
    .option("driver", "oracle.jdbc.driver.OracleDriver") 
    .load()

I am getting the following error:

java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection

ojdbc6.jar is attached to the cluster as a library.

I need to connect to Oracle to read the table data. The table also has BLOB data.

Recommended answer

Firstly, you should double check that your Apache Spark™ cluster has network access to your Oracle Database by:

%sh
telnet <host> <port>
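
If `telnet` is not installed on the cluster, the same reachability check can be sketched in a Scala cell with a plain TCP socket. This is a minimal sketch, not part of the original answer: `NetCheck` and `canConnect` are hypothetical names, and in a notebook the check runs from the driver node only.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object NetCheck {
  // Returns true if a TCP connection to host:port opens within timeoutMs.
  // Any failure (refused, timed out, unknown host, invalid port) yields false.
  def canConnect(host: String, port: Int, timeoutMs: Int = 5000): Boolean = {
    val socket = new Socket()
    try {
      Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    } finally {
      socket.close()
    }
  }
}
```

Calling `NetCheck.canConnect("<host>", <port>)` with the database host and listener port returns `false` when the connection cannot be opened, which would point to a network problem rather than a Spark or driver problem.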

I assume that your Oracle instance is also running in your cloud account. You may need to set up VPC peering (if on AWS) to allow a private connection between Databricks' clusters and the database instance in another VPC. If private access is not a concern, you can instead open the database port to the outside world through the security group settings.

Secondly, your JDBC URL may not be correct. Please review this sample Oracle connection, as well as this JDBC connection guide.
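
For illustration, the Oracle thin driver expects an `@` before the host part of the URL, which the URL in the question lacks. The sketch below builds the two common forms, one addressed by service name and one by SID. The `OracleUrls` object and its method names are hypothetical helpers, and the host and service name in the example comment are placeholders, not values from the question.

```scala
object OracleUrls {
  // Service-name form: jdbc:oracle:thin:@//host:port/service_name
  def serviceUrl(host: String, port: Int, serviceName: String): String =
    s"jdbc:oracle:thin:@//$host:$port/$serviceName"

  // SID form: jdbc:oracle:thin:@host:port:SID
  def sidUrl(host: String, port: Int, sid: String): String =
    s"jdbc:oracle:thin:@$host:$port:$sid"

  // Example with placeholder values:
  //   serviceUrl("dbhost.example.com", 1521, "ORCLPDB1")
  //   gives "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCLPDB1"
}
```

The resulting string would be passed to `.option("url", ...)` in place of the URL in the question. Because `user` and `password` are already supplied as separate options there, the credentials do not also need to be embedded in the URL.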
