SQL to find the first occurrence of sets of data in a table


Problem description


          Say if I have a table:

          CREATE TABLE T
          (
              TableDTM  TIMESTAMP  NOT NULL,
              Code      INT        NOT NULL
          );
          

          And I insert some rows:

          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:00:00', 5);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:10:00', 5);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:20:00', 5);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:30:00', 5);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:40:00', 0);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:50:00', 1);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:00:00', 1);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:10:00', 1);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:20:00', 0);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:30:00', 5);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:40:00', 5);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:50:00', 3);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:00:00', 3);
          INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:10:00', 3);
          

          So I end up with a table similar to:

          2011-01-13 10:00:00, 5
          2011-01-13 10:10:00, 5
          2011-01-13 10:20:00, 5
          2011-01-13 10:30:00, 5
          2011-01-13 10:40:00, 0
          2011-01-13 10:50:00, 1
          2011-01-13 11:00:00, 1
          2011-01-13 11:10:00, 1
          2011-01-13 11:20:00, 0
          2011-01-13 11:30:00, 5
          2011-01-13 11:40:00, 5
          2011-01-13 11:50:00, 3
          2011-01-13 12:00:00, 3
          2011-01-13 12:10:00, 3
          

          How can I select the first date of each set of identical numbers, so I end up with this:

          2011-01-13 10:00:00, 5
          2011-01-13 10:40:00, 0
          2011-01-13 10:50:00, 1
          2011-01-13 11:20:00, 0
          2011-01-13 11:30:00, 5
          2011-01-13 11:50:00, 3
          

          I've been messing about with sub queries and the like for most of the day and for some reason I can't seem to crack it. I'm sure there's a simple way somewhere!

          I would probably want to exclude the 0's from the results, but that's not important for now..

          Thanks.

          Solution

          Revised 15 Jan 11

          I'm sure there's a simple way somewhere!

          Yes, there is. But first, two Issues.

          1. The table is not a Relational Database table. It does not have a unique key, which is demanded by the RM and Normalisation (specifically, that each row must have a unique identifier; not necessarily a PK). Therefore SQL, a standard language for operating on Relational Database tables, cannot perform basic operations on it.

            • it is a Heap (data structure, inserted and deleted in chronological order), with records not rows.
            • any and all operations using SQL will be horribly slow, and will not be correct
            • SET ROWCOUNT to 1, perform row processing, and SQL will work on the Heap just fine
            • your best bet is to use any unix utility to operate on it (awk, cut, chop). They are blindingly fast. The awk script required to answer your requirement would take 3 mins to write, and it will run in seconds for millions of records (I wrote a few last week).

            So the question really is SQL to find the first occurrence of sets of data in a non-relational Heap.

            Now if your question was SQL to find the first occurrence of sets of data in a Relational table, implying of course some unique row identifier, that would be (a) easy in SQL, and (b) fast in any flavour of SQL ...

            • except Oracle, which is known to handle subqueries badly (see specifically Tony Andrews' comments; he is a well-known authority on Oracle). In which case, use Materialised Views.
          2. The question is very generic (no complaint). But many of these specific needs are usually applied within a larger context, and the context has requirements which are absent from the specification here. Generally the need is for a simple Subquery (but in Oracle use a Materialised View to avoid the subquery). And the subquery, too, depends on the outer context, the outer query. Therefore the answer to the small generic question will not contain the answer to the actual specific need.
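For the table T exactly as posted, the "first row of each run" requirement can be sketched with a correlated subquery: keep a row only if the row immediately before it (by TableDTM) does not carry the same Code. This is a minimal sketch, not the answer's own code. It assumes TableDTM is unique (declared here as a PRIMARY KEY, which the posted table lacks) and it runs on SQLite through Python's sqlite3 module purely so that it is self-contained; the SQL itself is plain.

```python
import sqlite3

# In-memory SQLite database. The PRIMARY KEY on TableDTM is an assumption:
# the table as posted has no key at all.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (TableDTM TIMESTAMP NOT NULL PRIMARY KEY, Code INT NOT NULL)"
)
rows = [
    ("2011-01-13 10:00:00", 5), ("2011-01-13 10:10:00", 5),
    ("2011-01-13 10:20:00", 5), ("2011-01-13 10:30:00", 5),
    ("2011-01-13 10:40:00", 0), ("2011-01-13 10:50:00", 1),
    ("2011-01-13 11:00:00", 1), ("2011-01-13 11:10:00", 1),
    ("2011-01-13 11:20:00", 0), ("2011-01-13 11:30:00", 5),
    ("2011-01-13 11:40:00", 5), ("2011-01-13 11:50:00", 3),
    ("2011-01-13 12:00:00", 3), ("2011-01-13 12:10:00", 3),
]
conn.executemany("INSERT INTO T (TableDTM, Code) VALUES (?, ?)", rows)

# A row starts a run when the immediately preceding row (the one with the
# greatest TableDTM earlier than this row) does NOT have the same Code.
# For the very first row the MAX() is NULL, nothing matches, and it is kept.
FIRST_OF_EACH_RUN = """
SELECT  T_outer.TableDTM, T_outer.Code
    FROM  T AS T_outer
    WHERE NOT EXISTS (
        SELECT 1
            FROM  T AS T_prev
            WHERE T_prev.Code     = T_outer.Code
            AND   T_prev.TableDTM = (
                SELECT MAX(T_before.TableDTM)
                    FROM  T AS T_before
                    WHERE T_before.TableDTM < T_outer.TableDTM
                )
        )
    ORDER BY T_outer.TableDTM
"""
first_of_each_run = conn.execute(FIRST_OF_EACH_RUN).fetchall()
for row in first_of_each_run:
    print(row)
```

To leave the 0 rows out of the result, as the question mentions, adding `AND T_outer.Code <> 0` to the outer WHERE clause should be enough, since the run boundaries are still computed against the full table.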


          Anyway, I do not wish to avoid the question. Why don't we use a real world example, rather than a simple generic one; and find the first or last occurrence, or minimum or maximum value, of a set of data, within another set of data, in a Relational table?

          Main Query

          Let's use the ▶Data Model◀ from your previous question.

          Report all Alerts since a certain date, with the peak Value for the duration, that are not Acknowledged

          Since you will be using exactly the same technique (with different table and column names) for all your temporal and History requirements, you need to fully understand the basic construct of a Subquery, and its different applications.

          Introduction

          Note that you have not only a pure 5NF Database with Relational Identifiers (composite keys), but also full Temporal capability throughout, and the temporal requirement is rendered without breaking 5NF (No Update Anomalies), which means the ValidToDateTime for periods and durations is derived, and not duplicated in data. The point is, that complicates things, hence this is not the best example for a tutorial on Subqueries.

          • Remember the SQL engine is a set-processor, so we approach the problem with a set-oriented mindset
            • do not dumb the engine down to row-processing; that is very slow
            • and more important, unnecessary
          • Subqueries are normal SQL. The syntax I am using is straight ISO/IEC/ANSI SQL.
            • if you cannot code subqueries in SQL, you will be very limited; and then need to introduce data duplication or use large result sets as Materialised Views or temporary tables or all manner of additional data and additional processing, which will be s.l.o.w to v.e.r.y s.l.o.w, not to mention completely unnecessary
            • if there is anything you cannot do in a truly Relational Database (and my Data Models always are) without switching to row-processing or inline views or temp tables, ask for help, which is what you have done here.
          • You need to fully understand the first Subquery (simpler) before attempting to understand the second; etc.

          Method

          First build the Outer query using minimum joins, etc, based on the structure of the result set that you need, and nothing more. It is very important that the structure of the outer query is resolved first; otherwise you will go back and forth trying to make the subquery fit the outer query, and vice versa.

          • That happens to require a Subquery as well. So leave that part out for now, and pick that up later. For now, the Outer query gets all (not only un-acknowledged) Alerts after a certain date

          The ▶SQL code◀ required is on page 1 (sorry, the SO edit features are horrible, it destroys the formatting, and the code is already formatted).

          Then build the Subquery to fill each cell.

          Subquery (1) Derive Alert.Value

          That is a simple derived datum, select the Value from the Reading that generated the Alert. The tables are related, the cardinality is 1::1, so it is a straight join on the PK.

          • The type of Subquery required here is a Correlated Subquery, we need to correlate a table in the Outer query to a table in the (inner) Subquery.
            • in order to do that, we need an Alias for the table in the Outer query, to correlate it to a table in the Subquery.
            • to make the distinction, I have used aliases only for such required correlation, and fully qualified names for plain joins
          • Subqueries are very fast in any engine (except Oracle)
          • SQL is a cumbersome language. But that's all we have. So get used to it.

          The ▶SQL code◀ required is on page 2.

          I have purposely given you a mix of joins in the Outer Query vs obtaining data via Subquery, so that you can learn (you could alternately obtain Alert.Value via a join, but that would be even more cumbersome).
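Since the linked code is not reproduced here, the following is a minimal sketch of what a correlated scalar subquery of this kind can look like. The schema is a hypothetical simplification (Reading and Alert keyed on SensorNo plus ReadingDtm), not the Data Model from the previous question, and SQLite via Python's sqlite3 is used only to make the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables: an Alert is 1::1 with the Reading that
# generated it and shares its composite key.
conn.executescript("""
CREATE TABLE Reading (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    Value      INT       NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
CREATE TABLE Alert (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
INSERT INTO Reading VALUES
    (1, '2011-01-13 10:00:00', 40),
    (1, '2011-01-13 10:10:00', 55),
    (1, '2011-01-13 10:50:00', 52);
INSERT INTO Alert VALUES
    (1, '2011-01-13 10:10:00'),
    (1, '2011-01-13 10:50:00');
""")

# The alias A on the outer table is what lets the inner query refer to the
# outer row: that reference is the correlation. The subquery returns exactly
# one value per outer row, filling the Value cell.
ALERT_WITH_VALUE = """
SELECT  A.SensorNo,
        A.ReadingDtm,
        ( SELECT Reading.Value
              FROM  Reading
              WHERE Reading.SensorNo   = A.SensorNo
              AND   Reading.ReadingDtm = A.ReadingDtm
        ) AS Value
    FROM  Alert AS A
    WHERE A.ReadingDtm >= '2011-01-13 00:00:00'
    ORDER BY A.SensorNo, A.ReadingDtm
"""
alerts_with_value = conn.execute(ALERT_WITH_VALUE).fetchall()
for row in alerts_with_value:
    print(row)
```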

          The next Subquery we need derives Alert.PeakValue. For that we need to determine the Temporal Duration of the Alert. We have the beginning of the Alert Duration; we need to determine the end of the Duration, which is the next (temporally) Reading.Value that is within range. That requires a Subquery as well, which we had better handle first.

          • Work the logic from the inside, outward. Good old BODMAS.

          Subquery (2) Derive Alert.EndDtm

          A slightly more complex Subquery to select the first Reading.ReadingDtm, that is greater than or equal to the Alert.ReadingDtm, that has a Reading.Value which is less than or equal to its Sensor.UpperLimit.

          Handling 5NF Temporal Data

          For handling temporal requirements in a 5NF Database (in which EndDateTime is not stored, since that would be duplicate data), we work on a StartDateTime only, and the EndDateTime is derived: it is the next StartDateTime. This is the Temporal notion of Duration.

          • Technically, it is one millisec (whatever the resolution for the Datatype used) less.
          • However, in order to be reasonable, we can speak of, and report, EndDateTime as simply the Next.StartDateTime, and ignore the one millisecond issue.
          • The code should always use >= This.StartDateTime and < Next.StartDateTime.
            • That eliminates a slew of avoidable bugs
            • Note that these comparison operators, which bracket the Temporal Duration, and should be used in a conventional manner throughout as per above, are quite independent of similar comparison operators related to business logic, eg. Sensor.UpperLimit (ie. watch for it, because both are often located in one WHERE clause, and it is easy to mix them up or get confused).
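The derivation of Alert.EndDtm described above can be sketched as follows. Sensor, Reading and Alert here are hypothetical, simplified tables (not the original Data Model), and SQLite through Python's sqlite3 is an assumption made only for runnability. Note the two kinds of comparison the bullets warn about: `>=` on ReadingDtm brackets the duration, while `<=` on UpperLimit is business logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema. No EndDtm column is stored anywhere.
conn.executescript("""
CREATE TABLE Sensor (
    SensorNo   INT NOT NULL PRIMARY KEY,
    UpperLimit INT NOT NULL
);
CREATE TABLE Reading (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    Value      INT       NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
CREATE TABLE Alert (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
INSERT INTO Sensor VALUES (1, 50);
INSERT INTO Reading VALUES
    (1, '2011-01-13 10:00:00', 40),
    (1, '2011-01-13 10:10:00', 55),
    (1, '2011-01-13 10:20:00', 70),
    (1, '2011-01-13 10:30:00', 60),
    (1, '2011-01-13 10:40:00', 45),
    (1, '2011-01-13 10:50:00', 52),
    (1, '2011-01-13 11:00:00', 48);
INSERT INTO Alert VALUES
    (1, '2011-01-13 10:10:00'),
    (1, '2011-01-13 10:50:00');
""")

# EndDtm is derived: the first Reading at or after the Alert whose Value is
# back within the Sensor's UpperLimit. It comes out NULL while the Alert is
# still open (no in-range Reading yet).
ALERT_WITH_END = """
SELECT  A.SensorNo,
        A.ReadingDtm,
        ( SELECT MIN(R.ReadingDtm)
              FROM  Reading AS R
              JOIN  Sensor  AS S ON S.SensorNo = R.SensorNo
              WHERE R.SensorNo   =  A.SensorNo
              AND   R.ReadingDtm >= A.ReadingDtm   -- temporal bracket
              AND   R.Value      <= S.UpperLimit   -- business rule
        ) AS EndDtm
    FROM  Alert AS A
    ORDER BY A.SensorNo, A.ReadingDtm
"""
alerts_with_end = conn.execute(ALERT_WITH_END).fetchall()
for row in alerts_with_end:
    print(row)
```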

          The ▶SQL code◀ required, along with test data used, is on page 3 (http://www.softwaregems.com.au/Documents/Student%20Resolutions/Mark%20Peak%20Plus.pdf).

          Subquery (3) Derive Alert.PeakValue

          Now it is easy. Select the MAX(Value) from Readings between Alert.ReadingDtm and Alert.EndDtm, the duration of the Alert.

          The ▶SQL code◀ required is on page 4.
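A minimal sketch of the PeakValue derivation, under the same assumptions as before: the tables are hypothetical simplifications of the Data Model, and SQLite via Python's sqlite3 is used only so the example runs as written. The duration is bracketed with `>=` the start and `<` the derived end, following the convention given earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema and test data.
conn.executescript("""
CREATE TABLE Sensor (
    SensorNo   INT NOT NULL PRIMARY KEY,
    UpperLimit INT NOT NULL
);
CREATE TABLE Reading (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    Value      INT       NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
CREATE TABLE Alert (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
INSERT INTO Sensor VALUES (1, 50);
INSERT INTO Reading VALUES
    (1, '2011-01-13 10:00:00', 40),
    (1, '2011-01-13 10:10:00', 55),
    (1, '2011-01-13 10:20:00', 70),
    (1, '2011-01-13 10:30:00', 60),
    (1, '2011-01-13 10:40:00', 45),
    (1, '2011-01-13 10:50:00', 52),
    (1, '2011-01-13 11:00:00', 48);
INSERT INTO Alert VALUES
    (1, '2011-01-13 10:10:00'),
    (1, '2011-01-13 10:50:00');
""")

# Work from the inside outward: the innermost subquery derives EndDtm,
# the middle one takes MAX(Value) over [Alert.ReadingDtm, EndDtm).
ALERT_WITH_PEAK = """
SELECT  A.SensorNo,
        A.ReadingDtm,
        ( SELECT MAX(R.Value)
              FROM  Reading AS R
              WHERE R.SensorNo   =  A.SensorNo
              AND   R.ReadingDtm >= A.ReadingDtm
              AND   R.ReadingDtm <  (
                  SELECT MIN(E.ReadingDtm)
                      FROM  Reading AS E
                      JOIN  Sensor  AS S ON S.SensorNo = E.SensorNo
                      WHERE E.SensorNo   =  A.SensorNo
                      AND   E.ReadingDtm >= A.ReadingDtm
                      AND   E.Value      <= S.UpperLimit
                  )
        ) AS PeakValue
    FROM  Alert AS A
    ORDER BY A.SensorNo, A.ReadingDtm
"""
alerts_with_peak = conn.execute(ALERT_WITH_PEAK).fetchall()
for row in alerts_with_peak:
    print(row)
```

One limitation of this sketch: for an Alert that is still open the derived end is NULL, so the comparison matches nothing and PeakValue comes out NULL. Handling open Alerts would need an extra rule that the source does not specify.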

          Scalar Subquery

          In addition to being Correlated Subqueries, the above are all Scalar Subqueries, as they return a single value; each cell in the grid can be filled with only one value. (Non-Scalar Subqueries, that return multiple values, are quite legal, but not for the above.)

          Subquery (4) Acknowledged Alerts

          Ok, now that you have a handle on the above Correlated Scalar Subqueries, those that fill cells in a set, a set that is defined by the Outer query, let's look at a Subquery that can be used to constrain the Outer query. We do not really want all Alerts (above), we want Un-Acknowledged Alerts: the Identifiers that exist in Alert, that do not exist in Acknowledgement. That is not filling cells, that is changing the content of the Outer set. Of course, that means changing the WHERE clause.

          • We are not changing the structure of the Outer set, so there is no change to the FROM and existing WHERE clauses.

          Simply add a WHERE condition to exclude the set of Acknowledged Alerts. 1::1 cardinality, straight Correlated join.

          The ▶SQL code◀ required is on page 5.

          The difference is, this is a non-Scalar Subquery, producing a set of rows (one column). We have an entire set of Alerts (the Outer set) matched against an entire set of Acknowledgements.

          • The matching is processed because we have told the engine that the Subquery is Correlated, by using an alias (no need for cumbersome joins to be identified)
          • Use 1, because we are performing an existence check. Visualise it as a column added onto the Alert set defined by the Outer query.
          • Never use * because we do not need the entire set of columns, and that will be slower
          • Likewise, failing to use a correlation, means a WHERE NOT IN () is required, but again, that constructs the defined column set, then compares the two sets. Much slower.
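The existence check described above can be sketched like this. Alert and Acknowledgement are hypothetical, simplified tables that share the same composite key (the original Data Model is not reproduced here), and SQLite via Python's sqlite3 is an assumption made for runnability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: an Acknowledgement is 1::1 with its Alert.
conn.executescript("""
CREATE TABLE Alert (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
CREATE TABLE Acknowledgement (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
INSERT INTO Alert VALUES
    (1, '2011-01-13 10:10:00'),
    (1, '2011-01-13 10:50:00'),
    (2, '2011-01-13 10:20:00');
INSERT INTO Acknowledgement VALUES
    (1, '2011-01-13 10:10:00');
""")

# The subquery constrains the outer set instead of filling a cell.
# SELECT 1 is enough because only existence matters, not column content.
UNACKNOWLEDGED = """
SELECT  A.SensorNo, A.ReadingDtm
    FROM  Alert AS A
    WHERE A.ReadingDtm >= '2011-01-13 00:00:00'
    AND   NOT EXISTS (
        SELECT 1
            FROM  Acknowledgement
            WHERE Acknowledgement.SensorNo   = A.SensorNo
            AND   Acknowledgement.ReadingDtm = A.ReadingDtm
        )
    ORDER BY A.SensorNo, A.ReadingDtm
"""
unacknowledged = conn.execute(UNACKNOWLEDGED).fetchall()
for row in unacknowledged:
    print(row)
```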

          Subquery (5) Actioned Alerts

          As an alternative constraint on the Outer query, for un-actioned Alerts, instead of (4), exclude the set of Actioned Alerts. Straight Correlated join.

          The ▶SQL code◀ required is on page 5.

          This code has been tested on Sybase ASE 15.0.3 using 1000 Alerts and 200 Acknowledgements, of different combinations; and the Readings and Alerts identified in the document. Zero milliseconds execution time (0.003 second resolution) for all executions.

          If you need it, here is the ▶SQL Code in Text Format◀.

          Response to Comments

          (6) ▶Register Alert from Reading◀
          This code executes in a loop (provided), selecting new Readings which are out-of-range, and creating Alerts, except where applicable Alerts already exist.

          (7) ▶Load Alert From Reading◀
          Given that you have a full set of test data for Reading, this code uses a modified form of (6) to load the applicable Alerts.
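The linked code for (6) and (7) is not reproduced here, so the following is only a guess at the shape of such a load, written as a single set-based INSERT rather than the loop the answer mentions. It assumes that an Alert is raised on the first out-of-range Reading of each excursion (the preceding Reading for that sensor is in range or absent), and that an Alert "already exists" when one with the same key is present. The tables are hypothetical simplifications, and SQLite via Python's sqlite3 is used for runnability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema and test data.
conn.executescript("""
CREATE TABLE Sensor (
    SensorNo   INT NOT NULL PRIMARY KEY,
    UpperLimit INT NOT NULL
);
CREATE TABLE Reading (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    Value      INT       NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
CREATE TABLE Alert (
    SensorNo   INT       NOT NULL,
    ReadingDtm TIMESTAMP NOT NULL,
    PRIMARY KEY (SensorNo, ReadingDtm)
);
INSERT INTO Sensor VALUES (1, 50);
INSERT INTO Reading VALUES
    (1, '2011-01-13 10:00:00', 40),
    (1, '2011-01-13 10:10:00', 55),
    (1, '2011-01-13 10:20:00', 70),
    (1, '2011-01-13 10:30:00', 60),
    (1, '2011-01-13 10:40:00', 45),
    (1, '2011-01-13 10:50:00', 52),
    (1, '2011-01-13 11:00:00', 48);
""")

LOAD_ALERTS = """
INSERT INTO Alert (SensorNo, ReadingDtm)
SELECT  R.SensorNo, R.ReadingDtm
    FROM  Reading AS R
    JOIN  Sensor  AS S ON S.SensorNo = R.SensorNo
    WHERE R.Value > S.UpperLimit
    AND   NOT EXISTS (      -- preceding Reading also out of range: same excursion
        SELECT 1
            FROM  Reading AS P
            WHERE P.SensorNo   = R.SensorNo
            AND   P.Value      > S.UpperLimit
            AND   P.ReadingDtm = (
                SELECT MAX(Q.ReadingDtm)
                    FROM  Reading AS Q
                    WHERE Q.SensorNo   = R.SensorNo
                    AND   Q.ReadingDtm < R.ReadingDtm
                )
        )
    AND   NOT EXISTS (      -- Alert with this key is already registered
        SELECT 1
            FROM  Alert
            WHERE Alert.SensorNo   = R.SensorNo
            AND   Alert.ReadingDtm = R.ReadingDtm
        )
"""
conn.execute(LOAD_ALERTS)
conn.execute(LOAD_ALERTS)  # second run finds nothing new to insert
loaded_alerts = conn.execute(
    "SELECT SensorNo, ReadingDtm FROM Alert ORDER BY SensorNo, ReadingDtm"
).fetchall()
for row in loaded_alerts:
    print(row)
```

The first NOT EXISTS is the same "previous row" technique that answers the original question about table T, applied per sensor.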

          Common Problem

          It is "simple" when you know how. I repeat, writing SQL without the ability to write Subqueries is very limiting; it is essential for handling Relational Databases, which is what SQL was designed for.

          • Half the reason developers implement unnormalised data heaps (massive data duplication) is because they cannot write the subqueries required for Normalised structures
            • it is not that they have "denormalised for performance"; it is that they cannot code for Normalised. I have seen it a hundred times.
            • Case in point here: you have a fully Normalised Relational Database, and the difficulty is coding for it, and you were contemplating duplicating tables for processing purposes.
          • And that is not counting the added complexity of a temporal database; or a 5NF temporal database.
          • Normalisation means Never Duplicate Anything, more recently known as Don't Repeat Yourself
          • Master Subqueries and you will be in the 98th percentile: Normalised, true Relational Databases; zero data duplication; very high performance.

          I think you can figure out the remaining queries you have.

          Relational Identifier

          Note, this example also happens to demonstrate the power of using Relational Identifiers, in that several tables in-between the ones we want do not have to be joined (yes! the truth is Relational Identifiers means less, not more, joins, than Id keys). Simply follow the solid lines.

          • Your temporal requirement demands keys containing DateTime. Imagine trying to code the above with Id PKs, there would be two levels of processing: one for the joins (and there would be far more of them), and another for the data processing.

          Label

          I try to stay away from colloquial labels ("nested", "inner", etc) because they are not specific, and stick to specific technical terms. For completeness and understanding:

          • a Subquery after the FROM clause, is a Materialised View, a result set derived in one query and then fed into the FROM clause of another query, as a "table".
            • The Oracle types call this Inline View (other products commonly call it a derived table).
            • In most cases, you can write Correlated Subqueries as Materialised Views, but that is massively more I/O and processing (since Oracle's handling of subqueries is abysmal, for Oracle only, Materialised Views are "faster").
          • A Subquery in the WHERE clause is a Predicate Subquery, because it changes the content of the result set (that which it is predicated upon). It can return either a Scalar (one value) or non-Scalar (many values).

            • for Scalars, use WHERE column =, or any scalar operator

            • for non-Scalars, use WHERE [NOT] EXISTS, or WHERE column [NOT] IN

          • A Subquery in the WHERE clause does not need to be Correlated; the following works just fine. Identify all superfluous appendages:

            SELECT  [Never] = FirstName,
                    [Acted] = LastName 
                FROM User 
                WHERE UserId NOT IN ( SELECT DISTINCT UserId
                    FROM Action
                    )
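To make the scalar versus non-scalar distinction concrete, here is a small sketch with two uncorrelated Predicate Subqueries: one compared with `=` (it must return a single value) and one used with `IN` (it may return many). The Reading and Alert tables are hypothetical single-sensor simplifications, and SQLite via Python's sqlite3 is an assumption made so the sketch runs as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified single-sensor tables.
conn.executescript("""
CREATE TABLE Reading (
    ReadingDtm TIMESTAMP NOT NULL PRIMARY KEY,
    Value      INT       NOT NULL
);
CREATE TABLE Alert (
    ReadingDtm TIMESTAMP NOT NULL PRIMARY KEY
);
INSERT INTO Reading VALUES
    ('2011-01-13 10:00:00', 40),
    ('2011-01-13 10:10:00', 55),
    ('2011-01-13 10:20:00', 70),
    ('2011-01-13 10:50:00', 52);
INSERT INTO Alert VALUES
    ('2011-01-13 10:10:00'),
    ('2011-01-13 10:50:00');
""")

# Scalar predicate: the subquery yields one value, so "=" is legal.
PEAK_READING = """
SELECT  ReadingDtm, Value
    FROM  Reading
    WHERE Value = ( SELECT MAX(Value) FROM Reading )
"""
peak_reading = conn.execute(PEAK_READING).fetchall()

# Non-scalar predicate: the subquery yields a set of values, so use IN.
ALERTING_READINGS = """
SELECT  ReadingDtm, Value
    FROM  Reading
    WHERE ReadingDtm IN ( SELECT ReadingDtm FROM Alert )
    ORDER BY ReadingDtm
"""
alerting_readings = conn.execute(ALERTING_READINGS).fetchall()

print(peak_reading)
print(alerting_readings)
```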
