SQL: Select Most Recent Sequentially Distinct Value w/ Grouping


Problem description

I am having trouble writing a query that selects the last "new" sequentially distinct value of one column (call it Col A) within each group defined by another column (Col B). Since this is a bit ambiguous, here is an example (assume the row number indicates the sequence inside each group; in my actual data the rows are ordered by date):

|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1      | A     | A     |
| 2      | B     | A     |
| 3      | C     | A     |
| 4      | B     | B     |
| 5      | A     | B     |
| 6      | B     | B     |

The query would select:

| 3      | C     | A     |
| 6      | B     | B     |

Note that although B also appears in row 4, the fact that row 5 contains A means that the B in row 6 is sequentially distinct. But if the table looked like this:

|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1      | A     | A     |
| 2      | B     | A     |
| 3      | C     | A     |
| 4      | B     | B     |
| 5      | A     | B     |
| 6      | A     | B     | <--

Then we would want to select:

| 3      | C     | A     |
| 5      | A     | B     |
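
To pin down the expected result outside SQL, here is a minimal Python sketch of the rule the two examples imply: within each Col B group, ordered by row number, take the first row of the final run of consecutive equal Col A values. The function name `last_new_value_rows` and the tuple layout are assumptions made for illustration only.

```python
from itertools import groupby


def last_new_value_rows(rows):
    """rows: iterable of (rownum, col_a, col_b) tuples.

    For each col_b group, ordered by rownum, return the first row of the
    final run of consecutive equal col_a values.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    for _, group in groupby(ordered, key=lambda r: r[2]):
        # Split the group into runs of consecutive equal col_a values.
        runs = [list(run) for _, run in groupby(group, key=lambda r: r[1])]
        result.append(runs[-1][0])  # first row of the last run
    return result


table1 = [(1, "A", "A"), (2, "B", "A"), (3, "C", "A"),
          (4, "B", "B"), (5, "A", "B"), (6, "B", "B")]
table2 = table1[:5] + [(6, "A", "B")]

print(last_new_value_rows(table1))  # [(3, 'C', 'A'), (6, 'B', 'B')]
print(last_new_value_rows(table2))  # [(3, 'C', 'A'), (5, 'A', 'B')]
```
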

I think this would be an easier problem if I only needed the values to be distinct, without regard to sequence. I'm not really sure how to take sequence into account when writing a query.

I have attempted to solve this by calculating the min/max row numbers at which each value of Col A appears within each Col B group. That calculation (using the second sample table) produces a result like this:

|--------|--------|--------|--------|
| ColA   | ColB   | MinRow | MaxRow |
|--------|--------|--------|--------|
| A      | A      | 1      | 1      |
| B      | A      | 2      | 2      |
| C      | A      | 3      | 3      | 
| A      | B      | 5      | 6      |
| B      | B      | 4      | 4      | 
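
As a small illustration of why min/max alone does not capture sequence, the Python sketch below runs the same calculation on the first sample table instead. The helper `min_max_rows` is a hypothetical name used only here. For value B in group B it reports rows 4 to 6, which hides the fact that row 5 holds a different value.

```python
def min_max_rows(rows):
    """Return {(col_a, col_b): (min_rownum, max_rownum)}."""
    spans = {}
    for rownum, col_a, col_b in rows:
        lo, hi = spans.get((col_a, col_b), (rownum, rownum))
        spans[(col_a, col_b)] = (min(lo, rownum), max(hi, rownum))
    return spans


first_table = [(1, "A", "A"), (2, "B", "A"), (3, "C", "A"),
               (4, "B", "B"), (5, "A", "B"), (6, "B", "B")]

spans = min_max_rows(first_table)
print(spans[("B", "B")])  # (4, 6): row 5 (value A) sits inside this range
print(spans[("A", "B")])  # (5, 5)
```
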

A solution raised in a related post (SQL: Select Row with Last New Sequentially Distinct Value) went down a similar path: it essentially takes the most recent RowNum whose ColA differs from the last ColA and then picks the next row. However, in that question I failed to mention that the query needs to work for multiple groups, hence this new post.

Any help with this problem, if it is possible to do in SQL at all, would be greatly appreciated. I am running SQL Server 2008 SP4.

Recommended answer

Hmmm . . . One method is to get the last value of colA in each colB group, then keep only the rows that come after every row with a different value, and aggregate:

select min(rownum), colA, colB
from (select t.*,
             first_value(colA) over (partition by colB order by rownum desc) as last_colA
      from t
     ) t
where rownum > all (select t2.rownum
                    from t t2
                    where t2.colB = t.colB and t2.colA <> t.last_colA
                   )
group by colA, colB;

Or, without aggregation:

select t.*
from (select t.*,
             first_value(colA) over (partition by colB order by rownum desc) as last_colA,
             lag(colA) over (partition by colB order by rownum) as prev_colA
      from t
     ) t
where rownum > all (select t2.rownum
                    from t t2
                    where t2.colB = t.colB and t2.colA <> t.last_colA
                   ) and
      (prev_colA is null or prev_colA <> colA);

But first_value() and lag() are not available in SQL Server 2008 (they were introduced in SQL Server 2012), so let's treat this as a gaps-and-islands problem:

select t.*
from (select t.*,
             min(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as min_rownum_group,
             max(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as max_rownum_group
      from (select t.*,
                   row_number() over (partition by colB order by rownum) as seqnum_b,
                   row_number() over (partition by colB, colA order by rownum) as seqnum_ab,
                   max(rownum) over (partition by colB) as max_rownum
            from t
           ) t
     ) t
where rownum = min_rownum_group and  -- first row of the island of adjacent rows sharing colA within colB
      max_rownum_group = max_rownum;  -- the island is the last one for its colB

This identifies each group of adjacent rows with the same colA using a difference of row numbers. It then calculates the maximum rownum for that group and for the whole colB partition. These two values are the same only for the last group.
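
For readers who want to try the row-number-difference idea without a SQL Server instance, here is a hedged sketch that runs a query of the same shape against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is an assumption made for portability (it needs version 3.25 or newer for window functions); the original question targets SQL Server 2008. In this sketch the overall maximum is computed with a plain `partition by colB` window, with no ORDER BY inside it, and the helper name `last_islands` is hypothetical.

```python
import sqlite3

QUERY = """
select rownum, colA, colB
from (select t.*,
             min(rownum) over (partition by colB, colA, seqnum_b - seqnum_ab) as min_rownum_group,
             max(rownum) over (partition by colB, colA, seqnum_b - seqnum_ab) as max_rownum_group
      from (select t.*,
                   row_number() over (partition by colB order by rownum) as seqnum_b,
                   row_number() over (partition by colB, colA order by rownum) as seqnum_ab,
                   max(rownum) over (partition by colB) as max_rownum
            from t
           ) t
     ) t
where rownum = min_rownum_group      -- first row of its island
  and max_rownum_group = max_rownum  -- island is the last one for its colB
order by rownum
"""


def last_islands(rows):
    """Load (rownum, colA, colB) rows into a scratch table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (rownum integer, colA text, colB text)")
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


sample1 = [(1, "A", "A"), (2, "B", "A"), (3, "C", "A"),
           (4, "B", "B"), (5, "A", "B"), (6, "B", "B")]
sample2 = sample1[:5] + [(6, "A", "B")]

print(last_islands(sample1))  # [(3, 'C', 'A'), (6, 'B', 'B')]
print(last_islands(sample2))  # [(3, 'C', 'A'), (5, 'A', 'B')]
```

In the second sample, rows 5 and 6 of group B get seqnum_b values 2 and 3 and seqnum_ab values 1 and 2, so both have a difference of 1 and fall into the same island, while row 4 has a difference of 0.
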
