How to extract table names and column names from a SQL query?
Problem description
Let's assume we have a simple query like this:
Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;
The result should look like this:
tb1 col1
tb1 col7
tb2 col2
tb2 col8
I've tried to solve this problem using some Python libraries:
1) Even extracting only the tables using sqlparse might be a huge problem. For example, this official book doesn't work properly at all.
2) Doing this with regular expressions seems to be really hard.
3) But then I found this, which might help. However, the problem is that I can't connect to any database and execute that query.
Any ideas?
Recommended answer
Really, this is no easy task. You could use a lexer (ply in this example) and define several rules to get several tokens out of a string. The following code defines these rules for the different parts of your SQL string and puts the pieces back together, since there could be aliases in the input string. As a result, you get a dictionary (result) with the different table names as keys.
import re

import ply.lex as lex

# Token names the lexer is allowed to emit.
tokens = (
    "TABLE",
    "JOIN",
    "COLUMN",
    "TRASH",
)

# Table names and the alias -> table mapping, filled in by the rules below.
tables = {"tables": {}, "alias": {}}
# Every qualified column reference ("qualifier.column") found in the input.
columns = []

# Everything we are not interested in: keywords, operators, punctuation, whitespace.
t_TRASH = r"Select|on|=|;|\s+|,|\t|\r"

def t_TABLE(t):
    r"from\s(\w+)\sas\s(\w+)"
    # ply uses the docstring as the token pattern; reuse it to pull out the groups.
    regex = re.compile(t_TABLE.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_JOIN(t):
    r"inner\s+join\s+(\w+)\s+as\s+(\w+)"
    regex = re.compile(t_JOIN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_COLUMN(t):
    r"(\w+\.\w+)"
    regex = re.compile(t_COLUMN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        t.value = m.group(1)
        columns.append(t.value)
    return t

def t_error(t):
    # Fail loudly on any text that none of the rules above can match.
    raise TypeError("Unknown text '%s'" % (t.value,))

# here is where the magic starts
def mylex(inp):
    lexer = lex.lex()
    lexer.input(inp)

    # Consuming the tokens runs the rules, which fill `tables` and `columns`.
    for token in lexer:
        pass

    result = {}
    for col in columns:
        tbl, c = col.split('.')
        # Replace an alias with the real table name; otherwise keep the name as is.
        key = tables["alias"].get(tbl, tbl)
        result.setdefault(key, []).append(c)

    print(result)
    # {'tb1': ['col1', 'col7'], 'tb2': ['col2', 'col8']}

string = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
mylex(string)
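The same alias-resolution idea can also be sketched with only the standard library, which may be handy if ply is not available. This is a minimal sketch, not a SQL parser: the names `TABLE_RE`, `COLUMN_RE`, `extract_columns` and `to_rows` are hypothetical, and it assumes every table appears as `<table> as <alias>` after `from` or `join` and every column is written as `<qualifier>.<column>`. It also flattens the dictionary into the "table column" rows asked for in the question.

```python
import re

# Assumption: tables always appear as "<table> as <alias>" after FROM or JOIN.
TABLE_RE = re.compile(r"\b(?:from|join)\s+(\w+)\s+as\s+(\w+)", re.IGNORECASE)
# Assumption: columns are always qualified, i.e. "<qualifier>.<column>".
COLUMN_RE = re.compile(r"\b(\w+)\.(\w+)\b")


def extract_columns(sql):
    """Map each table name to the columns referenced for it."""
    # alias -> table name, e.g. {"a": "tb1", "b": "tb2"}
    aliases = {alias: table for table, alias in TABLE_RE.findall(sql)}
    result = {}
    for qualifier, column in COLUMN_RE.findall(sql):
        # Resolve an alias to its table; a plain table name is kept as is.
        table = aliases.get(qualifier, qualifier)
        cols = result.setdefault(table, [])
        if column not in cols:
            cols.append(column)
    return result


def to_rows(result):
    """Flatten the dictionary into sorted (table, column) pairs."""
    return sorted((table, col) for table, cols in result.items() for col in cols)


query = ("Select a.col1, b.col2 from tb1 as a inner join tb2 as b "
         "on tb1.col7 = tb2.col8;")
for table, column in to_rows(extract_columns(query)):
    print(table, column)
```

Like the lexer above, this only covers the shape of the example query. Subqueries, unqualified columns, quoted identifiers, or aliases without `as` would need a real SQL grammar.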