How to list all tables in database using Spark SQL?


Question

I have a SparkSQL connection to an external database:

from pyspark.sql import SparkSession

spark = SparkSession \
  .builder \
  .appName("Python Spark SQL basic example") \
  .getOrCreate()

If I know the name of a table, it's easy to query:

# db_config holds the remaining JDBC options (url, user, password, driver)
users_df = spark \
  .read.format("jdbc") \
  .options(dbtable="users", **db_config) \
  .load()

But is there a good way to list/discover tables?

I want the equivalent of SHOW TABLES in MySQL, or \dt in Postgres.

I'm using pyspark v2.1, in case that makes any difference.

Answer

The answer to this question isn't actually Spark-specific. You just need to load information_schema.tables.

The information schema consists of a set of views that contain information about the objects defined in the current database. The information schema is defined in the SQL standard and can therefore be expected to be portable and remain stable, unlike the system catalogs, which are specific to each RDBMS and are modelled after implementation concerns.

I'll use MySQL for my code snippet; it contains an enwiki database whose tables I want to list:

# read the information_schema.tables view over JDBC
(spark.read.format('jdbc')
     .options(
         url='jdbc:mysql://localhost:3306/',  # database url (local or remote)
         dbtable='information_schema.tables',
         user='root',
         password='root',
         driver='com.mysql.jdbc.Driver')
     .load()
     .filter("table_schema = 'enwiki'")  # filter on a specific database
     .show())
# +-------------+------------+----------+----------+------+-------+----------+----------+--------------+-----------+---------------+------------+----------+--------------+--------------------+-----------+----------+---------------+--------+--------------+-------------+
# |TABLE_CATALOG|TABLE_SCHEMA|TABLE_NAME|TABLE_TYPE|ENGINE|VERSION|ROW_FORMAT|TABLE_ROWS|AVG_ROW_LENGTH|DATA_LENGTH|MAX_DATA_LENGTH|INDEX_LENGTH| DATA_FREE|AUTO_INCREMENT|         CREATE_TIME|UPDATE_TIME|CHECK_TIME|TABLE_COLLATION|CHECKSUM|CREATE_OPTIONS|TABLE_COMMENT|
# +-------------+------------+----------+----------+------+-------+----------+----------+--------------+-----------+---------------+------------+----------+--------------+--------------------+-----------+----------+---------------+--------+--------------+-------------+
# |          def|      enwiki|      page|BASE TABLE|InnoDB|     10|   Compact|   7155190|           115|  828375040|              0|   975601664|1965031424|      11359093|2017-01-23 08:42:...|       null|      null|         binary|    null|              |             |
# +-------------+------------+----------+----------+------+-------+----------+----------+--------------+-----------+---------------+------------+----------+--------------+--------------------+-----------+----------+---------------+--------+--------------+-------------+

Note: the same approach works from Scala and Java, subject to each language's syntax.
