SQL parsing using pyparsing


Question

I have been learning pyparsing over the last few weeks, and I plan to use it to extract table names from SQL statements. I have looked at http://pyparsing.wikispaces.com/file/view/simpleSQL.py, but I intend to keep the grammar simple: I am not trying to parse every part of a SELECT statement, only to find the table names. Defining the complete grammar for a modern commercial database such as Teradata would also be quite involved.

#!/usr/bin/env python

from pyparsing import *
import sys

semicolon = Combine(Literal(';') + lineEnd)
comma = Literal(',')
lparen = Literal('(')
rparen = Literal(')')

# Keyword definition
update_kw, volatile_kw, create_kw, table_kw, as_kw, from_kw, \
where_kw, join_kw, left_kw, right_kw, cross_kw, outer_kw, \
on_kw , insert_kw , into_kw= \
    map(lambda x: Keyword(x, caseless=True), \
        ['UPDATE', 'VOLATILE', 'CREATE', 'TABLE', 'AS', 'FROM',
         'WHERE', 'JOIN' , 'LEFT', 'RIGHT' , \
         'CROSS', 'OUTER', 'ON', 'INSERT', 'INTO'])

# Teradata SQL allows SELECT as well as the SEL keyword
select_kw = Keyword('SELECT', caseless=True) | Keyword('SEL' , caseless=True)

# list of reserved keywords
reserved_words = (update_kw | volatile_kw | create_kw | table_kw | as_kw |
                  select_kw | from_kw | where_kw | join_kw |
                  left_kw | right_kw | cross_kw | outer_kw | on_kw |
                  insert_kw | into_kw)

# Identifiers can be used as table or column names; they can't be reserved words
ident = ~reserved_words + Word(alphas, alphanums + '_')

# Recursive definition for table
table = Forward()
# a simple table name can be an identifier or a qualified identifier, e.g. schema.table
simple_table = Combine(Optional(ident + Literal('.')) + ident)
# a table can also be a complete SELECT statement used as a table
nested_table = lparen.suppress() + select_kw.suppress() + SkipTo(from_kw).suppress() + \
               from_kw.suppress() + table + rparen.suppress()
# table can be simple table or nested table
table << (nested_table | simple_table)
# comma delimited list of tables
table_list = delimitedList(table)
# Build only the FROM clause, because table names always appear after it
from_clause = from_kw.suppress() + table_list


txt = """
SELECT p, (SELECT * FROM foo),e FROM a, d, (SELECT * FROM z), b
"""
for token, start, end in from_clause.scanString(txt):
    print(token)

One thing worth mentioning: I use "SkipTo(from_kw)" to jump over the column list in the SQL statement. This avoids defining a grammar for the column list, which can be a comma-delimited list of identifiers, function calls, DW analytical functions and so on. With this grammar I can parse the statement above, as well as any level of nesting in the SELECT column list or the table list. The script prints:

['foo']
['a', 'd', 'z', 'b']

I run into a problem when the SELECT has a WHERE clause:

nested_table = lparen.suppress() + select_kw.suppress() + SkipTo(from_kw).suppress() + \
               from_kw.suppress() + table + rparen.suppress()

When a WHERE clause is present, the same statement may look like this:

SELECT ... FROM a, d, (SELECT * FROM z WHERE (c1 = 1) and (c2 = 3)), p

I thought of changing the "nested_table" definition to:

nested_table = lparen.suppress() + select_kw.suppress() + SkipTo(from_kw).suppress() + \
               from_kw.suppress() + table + Optional(where_kw + SkipTo(rparen)) + rparen

But this does not work, because SkipTo(rparen) stops at the first right parenthesis, the one following "c1 = 1". What I would like to know is how to skip to the right parenthesis that matches the left parenthesis just before "SELECT * FROM z ...". I don't know how to do this with pyparsing.

On a different note, I would also welcome advice on the best way to get table names from complex nested SQL. Any help is really appreciated.

Thanks, Abhijit

Recommended answer

Considering that you are also trying to parse nested SELECTs, I don't think you'll be able to avoid writing a fairly complete SQL parser. Fortunately, there is a more complete example on the Pyparsing wiki Examples page, select_parser.py. I hope that gets you further along.
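
For the narrower question of skipping to the parenthesis that closes the subquery, one possible approach is pyparsing's nestedExpr helper, which matches a balanced pair of delimiters together with anything nested inside. The following is only a minimal sketch under stated assumptions, not the approach taken in select_parser.py: the names where_body and table_names are hypothetical, the keyword list is cut down to what the example needs, and the WHERE body is discarded rather than parsed.

```python
from pyparsing import (CaselessKeyword, Combine, Forward, Optional, Regex,
                       SkipTo, Suppress, Word, ZeroOrMore, alphanums, alphas,
                       nestedExpr)

SELECT = CaselessKeyword("SELECT") | CaselessKeyword("SEL")
FROM = CaselessKeyword("FROM")
WHERE = CaselessKeyword("WHERE")
reserved = SELECT | FROM | WHERE

# Identifiers may not be reserved words
ident = ~reserved + Word(alphas, alphanums + "_")

table = Forward()
simple_table = Combine(Optional(ident + ".") + ident)

# Hypothetical helper: a WHERE body is any mix of balanced (...) groups and
# runs of text without parentheses, so it ends at the unmatched ")" that
# closes the enclosing subquery.
where_body = ZeroOrMore(nestedExpr() | Regex(r"[^()]+"))

nested_table = (Suppress("(") + Suppress(SELECT) + Suppress(SkipTo(FROM))
                + Suppress(FROM) + table
                + Optional(Suppress(WHERE + where_body))
                + Suppress(")"))
table <<= nested_table | simple_table

table_list = table + ZeroOrMore(Suppress(",") + table)
from_clause = Suppress(FROM) + table_list


def table_names(sql):
    """Return one list of table names per FROM clause found in sql."""
    return [list(tokens) for tokens, _, _ in from_clause.scanString(sql)]


if __name__ == "__main__":
    sql = ("SELECT p FROM a, d, "
           "(SELECT * FROM z WHERE (c1 = 1) and (c2 = 3)), p")
    print(table_names(sql))  # [['a', 'd', 'z', 'p']]
```

The sketch still ignores joins, table aliases and subqueries inside the WHERE clause, and a stray parenthesis inside a string literal that sits outside any parenthesized group would confuse the Regex part, so it does not replace a fuller grammar.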
