(SQL) 识别字段中字符串格式多次出现的位置 [英] (SQL) Identify positions of multiple occurrences of a string format within a field

查看:55
本文介绍了(SQL) 识别字段中字符串格式多次出现的位置的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要将叙述字段(自由文本)拆分为多行.目前的格式是:

I need to split a narrative field (free text) into multiple rows. Format is currently along the lines of:

Case_Reference | Narrative
```````````````|`````````````````````````````````````
XXXX/XX-123456 | [Endless_Text up to ~50k characters]

在作为文本的叙述字段中,单个条目(当各种代理对案例做了一些事情时)以条目日期开始,后跟两个空格(即 'dd/mm/yyyy'),日期值随同一字段中的每个条目而变化.

Within the narrative field as text, individual entries (when various agents have done something to the case) begin with the entry date followed by two spaces (i.e. 'dd/mm/yyyy '), with the values of the dates changing with each entry within that same field.

换句话说,在寻找更好的分隔符之后,我唯一可以使用的就是这种格式的字符串,所以我需要在 Narrative 文本中确定格式(掩码会是更好的词吗?)匹配的多个位置'dd/mm/yyyy'.

In other words, after trawling for a better delimiter, the only one I can use is this format of string, so I need to identify multiple positions within the Narrative text where the format (would mask be a better word?) matches 'dd/mm/yyyy '.

我可以识别多次出现的一致字符串没问题,但它在我主要寻找的地方识别它:

I can identify multiple occurrences of a consistent string no problem, but it's identifying it where I'm essentially looking for:

'%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%'

PATINDEX 当然返回第一个出现/位置,但据我所知,没有办法修改"这个(即创建的函数)以允许拾取其余的出现/位置我们可以使用 CHARINDEX(因为 PATINDEX 没有起始位置参数).

PATINDEX of course returns the first occurrence/position of this, but so far as I'm aware, there's no way to "modify" this (i.e. a created function) to allow for picking up the rest of the occurrences/positions of this they way we can with CHARINDEX (since PATINDEX doesn't have a starting position parameter).

为了清楚起见,我不是在寻找代码来直接对其进行分隔,因为我需要进一步操作每个条目,因此这纯粹是我正在寻找的 Narrative 文本中字符串多次出现的位置.

For clarity, I'm not looking for code to delimit this directly as I need to further manipulate each entry, so it's purely the positions of multiple occurrences of the string within the Narrative text I'm looking for.

非常感谢任何帮助.

为清楚起见,没有选项可以执行此预导入,因此需要在此登陆数据上执行此操作.

For clarity, there's no option to do this pre-import, so it needs to be done on this landed data.

期望的输出是

 Case_Reference1 | 1st_Position_of_Delimiter_String  
 Case_Reference1 | 2nd_Position_of_Delimiter_String  
 Case_Reference2 | 1st_Position_of_Delimiter_String  
 Case_Reference2 | 2nd_Position_of_Delimiter_String  
 Case_Reference2 | 3rd_Position_of_Delimiter_String  

推荐答案

你可以用递归 CTE 来解决这个问题

You might solve this with an recursive CTE

DECLARE @tbl TABLE (Case_Reference NVARCHAR(MAX),Narrative NVARCHAR(MAX));
INSERT INTO @tbl VALUES
 (N'C1',N'01/02/2000  Some text with     blanks 02/03/2000  More text 03/04/2000  An even more')
,(N'C2',N'01/02/2000  Test for C2 02/03/2000  One more for C2 03/04/2000  An even more 04/05/2000  Blah')
,(N'C3',N'01/02/2000  Test for C3 02/03/2000  One more for C3 03/04/2000  An even more')
 ;

WITH recCTE AS
(
    SELECT 1 AS Step,Case_Reference,Narrative,CAST(1 AS BIGINT) AS StartsAt,NewPos.EndsAt+10 AS EndsAt,LEN(Narrative) AS MaxLen
          ,SUBSTRING(Narrative,NewPos.EndsAt+10+1,999999) AS RestString
    FROM @tbl AS tbl
    CROSS APPLY(SELECT PATINDEX('%[0-3][0-9]/[0-1][0-9]/[1-2][0-9][0-9][0-9]  %',SUBSTRING(Narrative,12,9999999))) AS NewPos(EndsAt)

    UNION ALL

    SELECT r.Step+1,r.Case_Reference,r.Narrative,r.EndsAt+1,CASE WHEN NewPos.EndsAt>0 THEN r.EndsAt+NewPos.EndsAt+10 ELSE r.MaxLen END,r.MaxLen
          ,SUBSTRING(r.RestString,NewPos.EndsAt+10+1,999999) 
    FROM recCTE AS r
    CROSS APPLY(SELECT PATINDEX('%[0-3][0-9]/[0-1][0-9]/[1-2][0-9][0-9][0-9]  %',SUBSTRING(r.RestString,12,99999999))) AS NewPos(EndsAt)
    WHERE r.EndsAt<r.MaxLen
)
SELECT Step,Case_Reference,StartsAt,EndsAt
      ,SUBSTRING(Narrative,StartsAt,EndsAt-StartsAt+1) AS OutputString 
FROM recCTE

ORDER BY Case_Reference,Step

结果

+------+----------------+----------+--------+---------------------------------------+
| Step | Case_Reference | StartsAt | EndsAt | OutputString                          |
+------+----------------+----------+--------+---------------------------------------+
| 1    | C1             | 1        | 38     | 01/02/2000  Some text with     blanks |
+------+----------------+----------+--------+---------------------------------------+
| 2    | C1             | 39       | 60     | 02/03/2000  More text                 |
+------+----------------+----------+--------+---------------------------------------+
| 3    | C1             | 61       | 84     | 03/04/2000  An even more              |
+------+----------------+----------+--------+---------------------------------------+
| 1    | C2             | 1        | 24     | 01/02/2000  Test for C2               |
+------+----------------+----------+--------+---------------------------------------+
| 2    | C2             | 25       | 52     | 02/03/2000  One more for C2           |
+------+----------------+----------+--------+---------------------------------------+
| 3    | C2             | 53       | 77     | 03/04/2000  An even more              |
+------+----------------+----------+--------+---------------------------------------+
| 4    | C2             | 78       | 93     | 04/05/2000  Blah                      |
+------+----------------+----------+--------+---------------------------------------+
| 1    | C3             | 1        | 24     | 01/02/2000  Test for C3               |
+------+----------------+----------+--------+---------------------------------------+
| 2    | C3             | 25       | 52     | 02/03/2000  One more for C3           |
+------+----------------+----------+--------+---------------------------------------+
| 3    | C3             | 53       | 76     | 03/04/2000  An even more              |
+------+----------------+----------+--------+---------------------------------------+

这篇关于(SQL) 识别字段中字符串格式多次出现的位置的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆