如何使用Spark SQL获取多个表 [英] how to fetch multiple tables using spark sql

查看:142
本文介绍了如何使用Spark SQL获取多个表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用pyspark从mysql获取数据,仅用于一个表.我想从mysql db获取所有表.不想一次又一次地调用jdbc连接.参见下面的代码

I am fetching data from mysql using pyspark which for only one table.I want to fetch all tables from mysql db. Don't want call jdbc connection again and again. see code below

是否可以简化我的代码?预先谢谢你

Is it possible to simplify my code? Thank you in advance

url = "jdbc:mysql://localhost:3306/dbname"
table_df=sqlContext.read.format("jdbc").option("url",url).option("dbtable","table_name").option("user","root").option("password", "root").load()
sqlContext.registerDataFrameAsTable(table_df, "table1")

table_df_1=sqlContext.read.format("jdbc").option("url",url).option("dbtable","table_name_1").option("user","root").option("password", "root").load()
sqlContext.registerDataFrameAsTable(table_df_1, "table2")

推荐答案

您需要以某种方式获取mysql中具有的表的列表.您可以找到一些sql命令来执行此操作,或者手动创建一个包含所有内容的文件.

you need somehow to acquire the list of the tables you have in mysql. Either you find some sql commands to do that, or you manually create a file containing everything.

然后,假设您可以在python tablename_list 中创建表名列表,则可以像这样简单地循环遍历它:

Then, assuming you can create a list of tablenames in python tablename_list, you can simply loop over it like this :

url = "jdbc:mysql://localhost:3306/dbname"
reader = (
    sqlContext.read.format("jdbc")
    .option("url", url)
    .option("user", "root")
    .option("password", "root")
)
for tablename in tablename_list:
    reader.option("dbtable", tablename).load().createTempView(tablename)

这将创建一个具有相同表名的临时视图.如果需要其他名称,则可以使用元组列表(表名_in_mysql,表名_inspark)来更改初始的 tablename_list .

This will create a temporary view with the same tablename. If you want another name, you can probably change the initial tablename_list with a list of tuples (tablename_in_mysql, tablename_in_spark).

这篇关于如何使用Spark SQL获取多个表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆