varchar 字段的分组字符 [英] Group characters of varchar field

查看:24
本文介绍了varchar 字段的分组字符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在创建来自多个供应商的导入数据工具.不幸的是,数据不是由我生成的,所以我必须使用它.我遇到过以下情况.

I am creating an import data tool from several vendors. Unfortunately the data is not generated by me, so i have to work with it. I have come across the following situation.

我有一张如下所示的表格:

I have a table like the following:

ID    |SartDate    |Availability
========================================
H1    |20130728    |YYYYYYNNNNQQQQQ
H2    |20130728    |NNNNYYYYYYY
A3    |20130728    |NNQQQQNNNNNNNNYYYYYY
A2    |20130728    |NNNNNYYYYYYNNNNNN

解释这个数据的含义是:Availability 列中的每个字母都是特定日期的可用性标志,从 StartDate 列中注明的日期开始.

To explain what this data means is: Every letter in the Availability column is the availability flag for a specific date, starting from the date noted in the StartDate column.

  • 是:有空
  • N:不可用
  • 问:根据要求

例如,对于 ID H1,20130728 - 20130802 可用,然后从 20130803 - 20130806 不可用,从 20130807 - 20130811 可应要求提供.

For instance for ID H1 20130728 - 20130802 is available, then from 20130803 - 20130806 is not available and from 20130807 - 20130811 is available on request.

我需要做的是将此表转换为以下设置:

What i need to do is transform this table to the following setup:

ID    |Available   |SartDate    |EndDate     
========================================
H1    |Y           |20130728    |20130802    
H1    |N           |20130803    |20130806    
H1    |Q           |20130806    |20130811    
H2    |N           |20130728    |20130731
H2    |Y           |20130801    |20130807
A3    |N           |20130728    |20130729
A3    |Q           |20130730    |20130802
A3    |N           |20130803    |20130810
A3    |Y           |20130811    |20130816
A2    |Y           |20130728    |20130801
A2    |Y           |20130802    |20130807
A2    |Y           |20130808    |20130813

初始表大约有 40,000 行.可用性列可能有几天(我见过最多 800 天).

The initial table has approximately 40,000 rows. The Availability column may have several days (I've seen up to 800).

我尝试过的是将可用性转换为行,然后将连续天组合在一起,然后获取每个组的最小和最大日期.为此,我使用了三个或四个 CTE

What i have tried is turn the Availability into rows and then group consecutive days together and then get min and max date for each group. For this i have used three or four CTEs

这对一些 ID 来说效果很好,但是当我尝试将它应用到整个表时,它需要很长时间(我在傻瓜时间睡眠后停止了初始测试运行,但它没有完成,是的,我的意思是我是运行时睡觉!!!)

This works fine for a few IDs, but when i try to apply it to the whole table it take ages (I stopped the initial test run after a fool time sleep and it hadn't finish, and yes i mean i was sleeping while it was running!!!!)

我估计,如果我将每个字符放在一行中,那么我最终会得到大约 1450 万行.

I have estimated that if i turn each character in a single row then i end up with something like 14.5 million rows.

所以,我想问,有没有更有效的方法来做到这一点?(我知道有,但我需要你告诉我)

So, i am asking, is there a more efficient way of doing this? (I know there is, but i need you to tell me)

提前致谢.

推荐答案

这可以在 SQL Server 中使用递归 CTE 来完成.下面是一个例子:

This can be done in SQL Server, using recursive CTEs. Here is an example:

with t as (
    select 'H1' as id, cast('20130728' as date) as StartDate,
           'YYYYYYNNNNQQQQQ' as Availability union all
    select 'H2' as id, cast('20130728' as date) as StartDate,
           'NNNNYYYYYYY' as Availability union all
    select 'H3' as id, cast('20130728' as date) as StartDate,
           'NQ' as Availability 
   ),
   cte as (
     select id, left(Availability, 1) as Available,
            StartDate as thedate,
            substring(Availability, 2, 1000) as RestAvailability,
            1 as i,
            1 as periodcnt
     from t
     union all
     select t.id, left(RestAvailability, 1),
            dateadd(dd, 1, thedate),
            substring(RestAvailability, 2, 1000) as RestAvailability,
            1 + cte.i,
            (case when substring(t.Availability, i, 1) = substring(t.Availability, i+1, 1)
                  then periodcnt
                  else periodcnt + 1
             end)
     from t join
          cte
          on t.id = cte.id
     where len(RestAvailability) > 0

   )
select id, min(thedate), max(thedate), Available
from cte
group by id, periodcnt, Available;

它的工作方式是首先展开日期.这将是 CTE 的典型"用法.在此过程中,它还跟踪 Available 是否已从先前的值(在变量 periodcnt 中更改.为此使用字符串操作.

The way this works is that it first spreads out the dates. This would be a "typical" use of CTEs. In the process, it also keeps track of whether Available has changed from the previous value (in the variable periodcnt. It is using string manipulations for this.

有了这些信息,最终结果就是这个 CTE 的简单聚合.

With this information, the final result is simply an aggregation from this CTE.

这篇关于varchar 字段的分组字符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆