测试 SQL 查询的最佳方法 [英] Best way to test SQL queries

查看:34
本文介绍了测试 SQL 查询的最佳方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我遇到了一个问题,其中我们不断有复杂的 SQL 查询出错.本质上,这会导致向错误的客户发送邮件以及其他类似的问题".

I have run into a problem wherein we keep having complex SQL queries go out with errors. Essentially this results in sending mail to the incorrect customers and other 'problems' like that.

每个人创建这样的 SQL 查询的经验如何?我们每隔一周都会创建新的数据群组.

What is everyone's experience with creating SQL queries like that? We are creating new cohorts of data every other week.

以下是我的一些想法及其局限性:

So here are some of my thoughts and the limitations to them:

  • 创建测试数据 虽然这将证明我们拥有所有正确的数据,但并不能强制排除生产中的异常.这些数据在今天被认为是错误的,但在 10 年前可能是正确的;它没有记录在案,因此我们只有在提取数据后才知道它.

  • Creating test data Whilst this would prove that we have all the correct data it does not enforce the exclusion of anomalies in production. That is data that would be considered wrong today but may have been correct 10 years ago; it wasn't documented and therefore we only know about it after the data is extracted.

创建维恩图和数据映射 这似乎是测试查询设计的可靠方法,但它并不能保证实现是正确的.它让开发人员提前计划并在他们编写时思考正在发生的事情.

Create Venn diagrams and data maps This seems to be a solid way to test the design of a query, however it doesn't guarantee that the implementation is correct. It gets the developers planning ahead and thinking of what is happening as they write.

感谢您对我的问题提供的任何意见.

Thanks for any input you can give to my problem.

推荐答案

您不会编写具有 200 行长的函数的应用程序.您可以将这些长功能分解为更小的功能,每个功能都有一个明确定义的职责.

You wouldn't write an application with functions 200 lines long. You'd decompose those long functions into smaller functions, each with a single clearly defined responsibility.

为什么要这样写 SQL?

Why write your SQL like that?

分解查询,就像分解函数一样.这使得它们更短、更简单、更容易理解、更容易测试、更容易重构.它允许您在它们之间添加垫片",并在它们周围添加包装器",就像您在程序代码中所做的一样.

Decompose your queries, just like you decompose your functions. This makes them shorter, simpler, easier to comprehend, easier to test, easier to refactor. And it allows you to add "shims" between them, and "wrappers" around them, just as you do in procedural code.

你是怎么做到的?通过将查询所做的每件重要事情都放入视图中.然后你从这些简单的视图中组合更复杂的查询,就像你用更原始的函数组合更复杂的函数一样.

How do you do this? By making each significant thing a query does into a view. Then you compose more complex queries out of these simpler views, just as you compose more complex functions out of more primitive functions.

最重要的是,对于大多数视图组合,您将从 RDBMS 中获得完全相同的性能.(有些人不会;那又怎样?过早的优化是万恶之源.首先正确编码,然后根据需要进行优化.)

And the great thing is, for most compositions of views, you'll get exactly the same performance out of your RDBMS. (For some you won't; so what? Premature optimization is the root of all evil. Code correctly first, then optimize if you need to.)

以下是使用多个视图的示例分解复杂的查询.

在示例中,由于每个视图只添加了一个变换,每个视图都可以独立测试以发现错误,测试简单.

In the example, because each view adds only one transformation, each can be independently tested to find errors, and the tests are simple.

这是示例中的基表:

create table month_value( 
    eid int not null, month int, year int,  value int );

这个表有缺陷,因为它使用两列,月份和年份,来表示一个数据,一个绝对月份.这是我们对新的计算列的规范:

This table is flawed, because it uses two columns, month and year, to represent one datum, an absolute month. Here's our specification for the new, calculated column:

我们将其作为一个线性变换来进行,这样它的排序与 (year, month) 相同,并且对于任何 (year, month) 元组都有一个且唯一的值,并且所有值都是连续的:

We'll do that as a linear transform, such that it sorts the same as (year, month), and such that for any (year, month) tuple there is one and only value, and all values are consecutive:

create view cm_absolute_month as 
select *, year * 12 + month as absolute_month from month_value;

现在我们要测试的是我们的规范中固有的,即对于任何元组(年,月),只有一个(absolute_month),并且(absolute_month)是连续的.让我们编写一些测试.

Now what we have to test is inherent in our spec, namely that for any tuple (year, month), there is one and only one (absolute_month), and that (absolute_month)s are consecutive. Let's write some tests.

我们的测试将是一个 SQL select 查询,具有以下结构:连接在一起的测试名称和 case 语句.测试名称只是一个任意字符串.case语句就是case when测试语句then 'passed' else 'failed' end.

Our test will be a SQL select query, with the following structure: a test name and a case statement catenated together. The test name is just an arbitrary string. The case statement is just case when test statements then 'passed' else 'failed' end.

测试语句将只是 SQL 选择(子查询),必须为真才能通过测试.

The test statements will just be SQL selects (subqueries) that must be true for the test to pass.

这是我们的第一个测试:

Here's our first test:

--a select statement that catenates the test name and the case statement
select concat( 
-- the test name
'For every (year, month) there is one and only one (absolute_month): ', 
-- the case statement
   case when 
-- one or more subqueries
-- in this case, an expected value and an actual value 
-- that must be equal for the test to pass
  ( select count(distinct year, month) from month_value) 
  --expected value,
  = ( select count(distinct absolute_month) from cm_absolute_month)  
  -- actual value
  -- the then and else branches of the case statement
  then 'passed' else 'failed' end
  -- close the concat function and terminate the query 
  ); 
  -- test result.

运行该查询会产生以下结果:对于每个(年,月),只有一个(absolute_month):已通过

Running that query produces this result: For every (year, month) there is one and only one (absolute_month): passed

只要month_value中有足够的测试数据,这个测试就有效.

As long as there is sufficient test data in month_value, this test works.

我们也可以为足够的测试数据添加一个测试:

We can add a test for sufficient test data, too:

select concat( 'Sufficient and sufficiently varied month_value test data: ',
   case when 
      ( select count(distinct year, month) from month_value) > 10
  and ( select count(distinct year) from month_value) > 3
  and ... more tests 
  then 'passed' else 'failed' end );

现在让我们测试它是连续的:

Now let's test it's consecutive:

select concat( '(absolute_month)s are consecutive: ',
case when ( select count(*) from cm_absolute_month a join cm_absolute_month b 
on (     (a.month + 1 = b.month and a.year = b.year) 
      or (a.month = 12 and b.month = 1 and a.year + 1 = b.year) )  
where a.absolute_month + 1 <> b.absolute_month ) = 0 
then 'passed' else 'failed' end );

现在让我们将我们的测试(只是查询)放入一个文件中,然后针对数据库运行该脚本.确实,如果我们将视图定义存储在针对数据库运行的脚本(或脚本,我建议每个相关视图一个文件)中,我们可以将每个视图的测试添加到相同脚本中,这样(重新)创建我们的视图的行为也会运行视图的测试.这样,当我们重新创建视图时,我们都会得到回归测试,并且当视图创建针对生产运行时,视图也将在生产中进行测试.

Now let's put our tests, which are just queries, into a file, and run the that script against the database. Indeed, if we store our view definitions in a script (or scripts, I recommend one file per related views) to be run against the database, we can add our tests for each view to the same script, so that the act of (re-) creating our view also runs the view's tests. That way, we both get regression tests when we re-create views, and, when the view creation runs against production, the view will will also be tested in production.

这篇关于测试 SQL 查询的最佳方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆