Extracting the nth field of a string delimited by ":" stored in a SQL column


Question

I have a SQL table with the following two columns:

FORMAT  Sample
GT:AD:DP:GQ:PL  0/0:233,0:233:99:0,120,1800
GT:AD:DP:GQ:PL  0/1:101,61:220:99:835,0,1859
GT:AD:DP:GQ:PL  0/0:172,0:172:99:0,120,1800
GT:AD:DP:GQ:PL  0/0:216,0:216:99:0,120,1800
GT:AD:DP:GQ:PL  0/0:216,0:216:99:0,120,1800
GT:AD:DP:GQ:PGT:PID:PL  0/1:185,232:417:99:0|1:8029494_T_G:8670,0,6429
GT:AD:DP:GQ:PL  0/0:367,0:367:99:0,120,1800
GT:AD:DP:GQ:PGT:PID:PL  0/1:150,198:348:99:0|1:8029494_T_G:7930,0,5677
GT:AD:DP:GQ:PGT:PID:PL  0/1:148,196:344:99:0|1:8029494_T_G:7876,0,5652
GT:AD:DP:GQ:PGT:PID:PL  0/0:148,0:344:99:0|1:8029494_T_G:7876,8334,14591
GT:AD:DP:GQ:PGT:PID:PL  0/0:148,0:344:99:0|1:8029494_T_G:7876,8334,14591

The FORMAT column specifies the IDs of the fields given in the Sample column, where they are delimited by ":".

I would like to extract specific fields from the second column based on their ID/position in the FORMAT column, i.e. AD (2nd), DP (3rd) or GQ (4th).

I was able to extract the AD field with the following code:

SELECT SUBSTRING(Sample, CHARINDEX(':',Sample)+1, CHARINDEX(':',Sample,5)-5) FROM Table 1;

The problem is that I cannot extract the DP or GQ fields: the fields vary in length from row to row, so I cannot hard-code the starting position from which to search for the next ":".

I also tried to use the Split function from this website:

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648

The problem there is that I do not know how to pass a column to the function, rather than a declared variable, so that the required field is extracted for every row of the table.

The desired output for the [Sample] column should look like this:

GT  AD  DP  GQ
0/0 233,0   233 99
0/1 101,61  220 99
0/0 172,0   172 99
0/0 216,0   216 99
0/0 216,0   216 99
0/1 185,232 417 99
0/0 367,0   367 99
0/1 150,198 348 99
0/1 148,196 344 99
0/0 148,0   344 99
0/0 148,0   344 99

Any help would be greatly appreciated.

Thanks,

Recommended answer

Perhaps a little XML as the parser

Example

Select A.Format
      ,B.*
 From  YourTable A
 Cross Apply (
                Select Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
                      ,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
                      ,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
                From  (Select Cast('<x>' + replace((Select replace(A.Format,':','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A 
             ) B

Returns

Format                  Pos2    Pos3    Pos4
GT:AD:DP:GQ:PL          AD      DP      GQ
GT:AD:DP:GQ:PL          AD      DP      GQ
GT:AD:DP:GQ:PL          AD      DP      GQ
GT:AD:DP:GQ:PL          AD      DP      GQ
GT:AD:DP:GQ:PL          AD      DP      GQ
GT:AD:DP:GQ:PGT:PID:PL  AD      DP      GQ
GT:AD:DP:GQ:PL          AD      DP      GQ
GT:AD:DP:GQ:PGT:PID:PL  AD      DP      GQ
GT:AD:DP:GQ:PGT:PID:PL  AD      DP      GQ
GT:AD:DP:GQ:PGT:PID:PL  AD      DP      GQ
GT:AD:DP:GQ:PGT:PID:PL  AD      DP      GQ

Or the simple version

Select A.Format
      ,Pos2 = Cast('<x>' + replace(Format,':','</x><x>')+'</x>' as xml).value('/x[2]','varchar(max)')
      ,Pos3 = Cast('<x>' + replace(Format,':','</x><x>')+'</x>' as xml).value('/x[3]','varchar(max)')
      ,Pos4 = Cast('<x>' + replace(Format,':','</x><x>')+'</x>' as xml).value('/x[4]','varchar(max)')
 From  YourTable A

Or, if you are open to a UDF

Take a peek at "TSQL/SQL Server - table function to parse/split delimited string to multiple/separate columns"
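
For readers who prefer the UDF route, the XML technique shown above could be wrapped in an inline table-valued function along the following lines. This is only a sketch and not the function from the referenced post: the name dbo.tvf_ParseToColumns, its parameters, and the fixed set of four output columns are all assumptions made for illustration.

```sql
-- Hypothetical helper (name, parameters and column count are assumptions).
-- Splits @String on @Delimiter and returns the first four fields as columns.
CREATE FUNCTION dbo.tvf_ParseToColumns (@String varchar(max), @Delimiter varchar(10))
RETURNS TABLE
AS
RETURN (
    SELECT Pos1 = xDim.value('/x[1]', 'varchar(max)')
          ,Pos2 = xDim.value('/x[2]', 'varchar(max)')
          ,Pos3 = xDim.value('/x[3]', 'varchar(max)')
          ,Pos4 = xDim.value('/x[4]', 'varchar(max)')
    FROM (
        -- FOR XML PATH('') escapes characters such as & and < before the
        -- placeholder is swapped for the element boundaries.
        SELECT xDim = CAST('<x>'
                         + REPLACE((SELECT REPLACE(@String, @Delimiter, '§§Split§§') AS [*]
                                    FOR XML PATH('')), '§§Split§§', '</x><x>')
                         + '</x>' AS xml)
    ) AS P
);
GO

-- Usage: CROSS APPLY passes the column value of each row to the function.
SELECT A.[Format], B.*
FROM   YourTable A
CROSS APPLY dbo.tvf_ParseToColumns(A.[Sample], ':') B;
```

Because the function is applied with CROSS APPLY, no variable has to be declared per row; the column itself is the argument.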

EDIT - Updated for the Sample column

Select A.Format
      ,GT = Cast('<x>' + replace(Sample,':','</x><x>')+'</x>' as xml).value('/x[1]','varchar(max)')
      ,AD = Cast('<x>' + replace(Sample,':','</x><x>')+'</x>' as xml).value('/x[2]','varchar(max)')
      ,DP = Cast('<x>' + replace(Sample,':','</x><x>')+'</x>' as xml).value('/x[3]','varchar(max)')
      ,GQ = Cast('<x>' + replace(Sample,':','</x><x>')+'</x>' as xml).value('/x[4]','varchar(max)')
 From  YourTable A
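
The query above casts the same string to XML once per output column. A possible variation, sketched here under the assumption that the data never contains XML-special characters such as & or <, is to build the XML once per row in a CROSS APPLY and read all four fields from it. The table variable @YourTable and its two sample rows (copied from the question) are only there to make the sketch self-contained.

```sql
-- Stand-in for the real table; two rows taken from the question's data.
DECLARE @YourTable TABLE ([Format] varchar(100), [Sample] varchar(200));

INSERT INTO @YourTable ([Format], [Sample])
VALUES ('GT:AD:DP:GQ:PL',         '0/0:233,0:233:99:0,120,1800'),
       ('GT:AD:DP:GQ:PGT:PID:PL', '0/1:185,232:417:99:0|1:8029494_T_G:8670,0,6429');

SELECT A.[Format]
      ,GT = X.xDim.value('/x[1]', 'varchar(max)')
      ,AD = X.xDim.value('/x[2]', 'varchar(max)')
      ,DP = X.xDim.value('/x[3]', 'varchar(max)')
      ,GQ = X.xDim.value('/x[4]', 'varchar(max)')
FROM   @YourTable A
CROSS APPLY (
    -- Build the XML once per row instead of once per output column.
    SELECT xDim = CAST('<x>' + REPLACE(A.[Sample], ':', '</x><x>') + '</x>' AS xml)
) X;
```

If the Sample values could ever contain & or <, the FOR XML PATH('') form from the first example is the safer choice, since it escapes those characters before the cast.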
