是否有任何经验法则可以根据人类可读的描述构造 SQL 查询? [英] Is there any rule of thumb to construct SQL query from a human-readable description?

查看:29
本文介绍了是否有任何经验法则可以根据人类可读的描述构造 SQL 查询?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

每当我们面前有查询的任何描述时,我们都会尝试应用启发式和头脑风暴来构建查询.

是否有任何系统的分步或数学方法可以根据给定的人类可读描述构造 SQL 查询?

例如,如何确定 SQL 查询是否需要连接而不是子查询,是否需要 group by,是否需要 IN 子句等....

例如,任何学习过数字电子学的人都会知道卡诺图或昆麦克劳斯基方法等方法.这些是一些简化数字逻辑的系统方法.

有没有这样的方法可以手动分析sql查询,避免每次都头脑风暴?

解决方案

<块引用>

有没有系统的分步或数学方法来构造来自给定人类可读描述的 SQL 查询?

是的,有.

事实证明,自然语言表达式和逻辑表达式以及关系代数表达式和 SQL 表达式(后两者的混合)以一种相当直接的方式对应.(接下来是没有重复的行和没有空值.)

每个表(基本或查询结果)都有一个关联的谓词——一个由列名参数化的自然语言填充(命名)空白语句模板.

[liker] 喜欢 [liked]

一个表格包含每一行,使用该行的列值来填充(命名)空格,构成一个真实的陈述,又名命题.这是带有该谓词的表 &它的行的命题:

liker |喜欢--------------鲍勃 |Dex/* Bob 喜欢 Dex */鲍勃 |爱丽丝/* 鲍勃喜欢爱丽丝 */爱丽丝 |卡罗尔/* 爱丽丝喜欢卡罗尔 */

用表中一行的值填充谓词的每个命题都是真的.用表中not 行中的值填充谓词的每个命题都是假的.这是该表的内容:

/*爱丽丝喜欢卡罗尔而不是爱丽丝喜欢爱丽丝而不是爱丽丝喜欢鲍勃而不是 Alice 喜欢 Dex而不是爱丽丝喜欢埃德...并且鲍勃喜欢爱丽丝并且 Bob 喜欢 Dex而不是鲍勃喜欢鲍勃而不是鲍勃喜欢卡罗尔而不是鲍勃喜欢埃德...而不是卡罗尔喜欢爱丽丝...而不是 Dex 喜欢 Alice...而不是 Ed 喜欢 Alice...*/

DBA 为每个基表提供谓词.表声明的 SQL 语法很像给定谓词的自然语言版本的传统逻辑简写.这是用于保存我们的值的基表声明:

/* (person, liked) 行,其中 [liker] 喜欢 [liked] *//*(人,喜欢)行,其中喜欢(喜欢者,喜欢)*/创建表喜欢(喜欢...,喜欢...);

SQL 查询(子)表达式将参数表值转换为新表值,该值包含根据新谓词生成真语句的行.根据(子)表达式的关系/表运算符,可以根据参数表谓词来表达新的表谓词.查询是一个 SQL 表达式,其谓词是我们想要的行表的谓词.

当我们给一张桌子&(可能是隐含的)别名 A 要加入,操作符作用于一个值 &谓词类似于表,但列从 C,... 重命名为 A.C,....那么

  • R , S &R CROSS JOIN SR 的谓词和 S 的谓词

  • R INNER JOIN S ON conditionR 的谓词 AND S AND 条件的谓词

  • R LEFT JOIN S ON 条件 是行,其中(对于 S-only 列 S1,...)

     R 的谓词 AND S AND 条件的谓词或者R 的谓词AND NOT FOR S1 的某些值,... [S AND 条件的谓词]AND S1 为空 AND ...

  • R WHERE conditionR AND 条件的谓词

  • SELECT DISTINCT AC AS D,... FROM R(可能带有隐式 A. 和/或隐式 AS D) 是其中的行

    • FOR SOME values for A.*,... [AC=D AND ... AND the predicate of R](这可以不那么紧凑,但看起来更像 SQL.)
    • 如果没有删除的列,将R的谓词AC,...替换为D,...莉>
    • 如果有删除的列,FOR SOME values for 删除的列 [ R 的谓词 with AC,... 替换为 D,... ]
  • (X,...) IN (R) 表示

    • R 的谓词C,... 替换为X,...
    • (X,...) IN R

示例:(人物,喜欢的)行的自然语言,其中 [人物] 是鲍勃,鲍勃喜欢喜欢 [喜欢] 但不喜欢 Ed 的人:

/* (person, like) 行在哪里对于 x 的某些值,[人] 喜欢 [x]和 [x] 喜欢 [喜欢]和 [人] = '鲍勃'而不是 [x] 喜欢 'Ed'*/

使用速记谓词重写:

/* (person, like) 行在哪里对于 x 的某些值,喜欢(人,x)和喜欢(x,喜欢)AND 人 = '鲍勃'AND NOT Likes(x, 'Ed')*/

仅使用 base & 的速记谓词重写别名表:

/* (person, like) 行在哪里对于 l1.*、l2.* 的某些值,人 = l1.liker AND 喜欢 = l2.liked和喜欢(l1.liker,l1.liked)和喜欢(l2.liker,l2.liked)AND l1.liked = l2.likerAND 人 = '鲍勃'AND NOT (l1.liked, 'Ed') IN Likes*/

用 SQL 重写:

SELECT DISTINCT l1.liker AS 人,l2.liked AS 喜欢/* (l1.liker, l1.liked, l2.liker, l2.liked) 行在哪里喜欢(l1.liker,l1.liked)和喜欢(l2.liker,l2.liked)AND l1.liked = l2.likerAND l1.liker = '鲍勃'AND NOT (l1.liked, 'Ed') IN Likes*/来自喜欢 l1INNER JOIN 喜欢 l2ON l1.liked = l2.likerWHERE l1.liker = 'Bob'AND NOT (l1.liked, 'Ed') IN (SELECT * FROM Likes)

同样,

  • R UNION CORRESPONDING SR 的谓词或 R 的谓词

  • 的行
  • R UNION SR 的谓词或我们通过用 R 的谓词中的 R 的列替换 S 的列而得到的谓词的行

  • VALUES (X,...), ...C,...C = X AND 的行... 或 ...

示例:

/* (person) rows where对于喜欢的某些价值,喜欢(人,喜欢)或人 = '鲍勃'*/选择喜欢者为人来自喜欢联盟值('鲍勃')

因此,如果我们根据给定的基表自然语言语句模板表达我们想要的行,这些行使行为真或为假(返回或不返回),那么我们可以转换为 SQL 查询,这些查询是逻辑速记的嵌套 &运算符和/或表名 &运营商.然后 DBMS 可以完全转换为表来计算使我们的谓词为真的行.

请参阅如何从另一个 SQL 表中获取两个不同列的匹配数据:内部联接和/或联合?将此应用于 SQL.(另一个自加入.)
有关自然语言措辞的更多信息,请参阅银行场景的关系代数.(在关系代数上下文中.)
请参阅关系代数中的空,了解关系查询的另一个演示.

Whenever there is any description of query in front of us, we try to apply heuristics and brainstorming to construct the query.

Is there any systematic step-by-step or mathematical way to construct SQL query from a given human-readable description?

For instance, how to determine that, whether a SQL query would need a join rather than a subquery, whether it would require a group by, whether it would require a IN clause, etc....

For example, whoever studied Digital Electronics would be aware of the methods like Karnaugh Map or Quin McClausky method. These, are some systematic approaches to simplify digital logic.

If there any method like these to analyze sql queries manually to avoid brainstorming each time?

解决方案

Is there any systematic step-by-step or mathematical way to construct SQL query from a given human-readable description?

Yes, there is.

It turns out that natural language expressions and logical expressions and relational algebra expressions and SQL expressions (a hybrid of the last two) correspond in a rather direct way. (What follows is for no duplicate rows & no nulls.)

Each table (base or query result) has an associated predicate--a natural language fill-in-the-(named-)blanks statement template parameterized by column names.

[liker] likes [liked]

A table holds every row that, using the row's column values to fill in the (named) blanks, makes a true statement aka proposition. Here's a table with that predicate & its rows' propositions:

liker  | liked
--------------
Bob    | Dex    /* Bob likes Dex */
Bob    | Alice  /* Bob likes Alice */
Alice  | Carol  /* Alice likes Carol */

Each proposition from filling a predicate with the values from a row in a table is true. And each proposition from filling a predicate with the values from a row not in a table is false. Here's what that table says:

/*
    Alice likes Carol
AND NOT Alice likes Alice
AND NOT Alice likes Bob
AND NOT Alice likes Dex
AND NOT Alice likes Ed
...
AND Bob likes Alice
AND Bob likes Dex
AND NOT Bob likes Bob
AND NOT Bob likes Carol
AND NOT Bob likes Ed
...
AND NOT Carol likes Alice
...
AND NOT Dex likes Alice
...
AND NOT Ed likes Alice
...
*/   

The DBA gives the predicate for each base table. The SQL syntax for a table declaration is a lot like the traditional logic shorthand for the natural language version of a given predicate. Here's a declaration of a base table to hold our value:

/* (person, liked) rows where [liker] likes [liked] */
/* (person, liked) rows where Likes(liker, liked) */
CREATE TABLE Likes (
    liker ...,
    liked ...
);

An SQL query (sub)expression transforms argument table values to a new table value holding the rows that make a true statement from a new predicate. The new table predicate can be expressed in terms of the argument table predicate(s) according to the (sub)expression's relational/table operators. A query is an SQL expression whose predicate is the predicate for the table of rows we want.

When we give a table & (possibly implicit) alias A to be joined, the operator acts on a value & predicate like the table's but with columns renamed from C,... to A.C,.... Then

  • R , S & R CROSS JOIN S are rows where the predicate of R AND the predicate of S

  • R INNER JOIN S ON condition is rows where the predicate of R AND the predicate of S AND condition

  • R LEFT JOIN S ON condition is rows where (for S-only columns S1,...)

        the predicate of R AND the predicate of S AND condition
    OR
            the predicate of R
        AND NOT FOR SOME values for S1,... [the predicate of S AND condition]
        AND S1 IS NULL AND ...
    

  • R WHERE condition is rows where the predicate of R AND condition

  • SELECT DISTINCT A.C AS D,... FROM R (maybe with implicit A. and/or implicit AS D) is rows where

    • FOR SOME values for A.*,... [A.C=D AND ... AND the predicate of R] (This can be less compact but looks more like the SQL.)
    • if there are no dropped columns, the predicate of R with A.C,... replaced by D,...
    • if there are dropped columns, FOR SOME values for the dropped columns [ the predicate of R with A.C,... replaced by D,... ]
  • (X,...) IN (R) means

    • the predicate of R with columns C,... replaced by X,...
    • (X,...) IN R

Example: Natural language for (person, liked) rows where [person] is Bob and Bob likes someone who likes [liked] but who doesn't like Ed:

/* (person, liked) rows where
FOR SOME value for x,
        [person] likes [x]
    and [x] likes [liked]
    and [person] = 'Bob'
    and not [x] likes 'Ed'
*/

Rewrite using shorthand predicates:

/* (person, liked) rows where
FOR SOME value for x,
        Likes(person, x)
    AND Likes(x, liked)
    AND person = 'Bob'
    AND NOT Likes(x, 'Ed')
*/

Rewrite using only shorthand predicates of base & aliased tables:

/* (person, liked) rows where
FOR SOME values for l1.*, l2.*,
        person = l1.liker AND liked = l2.liked
    AND Likes(l1.liker, l1.liked)
    AND Likes(l2.liker, l2.liked)
    AND l1.liked = l2.liker
    AND person = 'Bob'
    AND NOT (l1.liked, 'Ed') IN Likes
*/

Rewrite in SQL:

SELECT DISTINCT l1.liker AS person, l2.liked AS liked
    /* (l1.liker, l1.liked, l2.liker, l2.liked) rows where
        Likes(l1.liker, l1.liked)
    AND Likes(l2.liker, l2.liked)
    AND l1.liked = l2.liker
    AND l1.liker = 'Bob'
    AND NOT (l1.liked, 'Ed') IN Likes
    */
FROM Likes l1
INNER JOIN Likes l2
ON l1.liked = l2.liker
WHERE l1.liker = 'Bob'
AND NOT (l1.liked, 'Ed') IN (SELECT * FROM Likes)

Similarly,

  • R UNION CORRESPONDING S is rows where the predicate of R OR the predicate of R

  • R UNION S is rows where the predicate of R OR the predicate we get by replacing the columns of S by the columns of R in the predicate of R

  • VALUES (X,...), ... with columns C,... is rows where C = X AND ... OR ...

Example:

/* (person) rows where
    FOR SOME value for liked, Likes(person, liked)
OR  person = 'Bob'
*/

    SELECT liker AS person
    FROM Likes
UNION
    VALUES ('Bob')

So if we express our desired rows in terms of given base table natural language statement templates that rows make true or false (to be returned or not) then we can translate to SQL queries that are nestings of logic shorthands & operators and/or table names & operators. And then the DBMS can convert totally to tables to calculate the rows making our predicate true.

See How to get matching data from another SQL table for two different columns: Inner Join and/or Union? re applying this to SQL. (Another self-join.)
See Relational algebra for banking scenario for more on natural language phrasings. (In a relational algebra context.)
See Null in Relational Algebra for another presentation of relational querying.

这篇关于是否有任何经验法则可以根据人类可读的描述构造 SQL 查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆