Reading data from SQL Server using Spark SQL

Problem description

Is it possible to read data from Microsoft SQL Server (and Oracle, MySQL, etc.) into an RDD in a Spark application? Or do we need to create an in-memory collection and parallelize that into an RDD?

Recommended answer

Found a solution to this on the mailing list: JdbcRDD can be used to accomplish it. I needed to get the MS SQL Server JDBC driver jar and add it to the lib for my project. I wanted to use integrated security, and so needed to put sqljdbc_auth.dll (available in the same download) in a location on java.library.path. Then the code looks like this:

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD

    val rdd = new JdbcRDD[SomeClass](
      sc,
      () => DriverManager.getConnection(
        "jdbc:sqlserver://omnimirror;databaseName=moneycorp;integratedSecurity=true;"),
      // The two ? placeholders receive each partition's inclusive bounds,
      // so use <= rather than strict < to avoid dropping boundary rows.
      "SELECT * FROM TABLE_NAME WHERE ? <= X AND X <= ?",
      1, 100000, 1000, // lower bound, upper bound, number of partitions
      (r: ResultSet) => SomeClass(r.getString("Col1"),
        r.getString("Col2"), r.getString("Col3")))

This gives an RDD of SomeClass. The second, third and fourth numeric parameters are required: the lower bound, the upper bound, and the number of partitions. In other words, the source data needs to be partitionable by longs for this to work.
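Those bounds drive how the work is split: JdbcRDD divides the inclusive [lower, upper] key range into numPartitions contiguous sub-ranges and binds each sub-range into the two `?` placeholders of the query, one partition per sub-range. The arithmetic can be sketched in plain Scala like this (an illustration of the splitting scheme, not the library's source; `partitionBounds` is a made-up helper name):

```scala
// Split an inclusive [lower, upper] key range into n contiguous
// sub-ranges, one per partition, the way JdbcRDD distributes the query.
def partitionBounds(lower: Long, upper: Long, n: Int): Seq[(Long, Long)] = {
  val length = BigInt(upper) - BigInt(lower) + 1
  (0 until n).map { i =>
    val start = lower + ((BigInt(i) * length) / n).toLong
    val end   = lower + ((BigInt(i + 1) * length) / n).toLong - 1
    (start, end) // bound as (?, ?) in the WHERE clause
  }
}

// With the bounds from the example above, four partitions would cover:
// partitionBounds(1L, 100000L, 4)
//   -> (1,25000), (25001,50000), (50001,75000), (75001,100000)
```

Note that because the sub-range endpoints are inclusive, the query's comparisons should be `<=` rather than strict `<`; otherwise rows sitting exactly on a partition boundary are silently skipped.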
