使用SQL清理数据-取得列差异 [英] Clean Data Using SQL - Take Column Difference
问题描述
我在SQL中的数据如下:
I have data in SQL as follows:
实际表格
+-------------+--------+------+
| Id | Weight | Type |
+-------------+--------+------+
| 00011223344 | 35 | A |
| 00011223344 | 10 | A |
| 12311223344 | 100 | B |
| 00034343434 | 25 | A |
| 00034343434 | 25 | A |
| 99934343434 | 200 | C |
| 88855667788 | 100 | D |
+-------------+--------+------+
列 ID
的长度始终为11,数据类型为varchar.我需要在上表中创建一列 Actual Weight
和 Actual ID
.
Column ID
will always have length of 11 and has data type varchar. I need to create a column Actual Weight
and Actual ID
from the table above.
实际ID
取决于列 ID
.如果ID以 000
开头,则我们需要从 ID
列中找到ID,该ID不是以 000
开头,而是后面的字符(即8个字符)从右)相似.匹配的ID为实际ID
.例如,如果我们查看前3个ID,则前2个以 000
开头,而另一个以 000
开头且右边包含相似的8个字符的ID即 12311223344
,因此在派生列 Actual ID
中,前两行的 Actual Id
为 12311223344
.
Actual Id
is dependent on column ID
. If the ID starts with 000
than we need to find ID from column ID
that does not starts with 000
but characters after that (i.e. 8 characters from right) are similar. Matched ID would be the Actual Id
. For example if we look at first 3 ids first 2 starts with 000
and another ID that does not starts with 000
and contains similar 8 characters from right can be found in 3rd row i.e. 12311223344
therefore in derived column Actual ID
the first 2 rows would have Actual Id
as 12311223344
.
实际权重
取决于2列 ID
和 Weight
中的值.如果任何不是以 000
开头但包含另一个以 000
开头的 Id
,我们需要根据上述标准对列 Id
进行分组与 000
.然后,我们需要通过添加以 000
000 开头的 Id
的 Weight
>并以不以 000
开头的数字为准.
Actual Weight
is dependent on values in 2 columns ID
and Weight
. We need to group column Id
based on the criteria mentioned above if for any Id
that does not starts with 000
but contains another entry that does starts with 000
. Then we need to recalculate Weight
for Id
that does not starts with 000
by adding all Weights of ones starting with 000
and taking difference with one that does not starts with 000
.
例如,如果我们查看前三行,那么在第三行中,我们有以 123
开头的 Id
,并且具有从右到右8位数字的条目,除了它们以 000
而不是 123
(即第1行和第2行)开头.对于以 000
开头的情况,实际重量
与 Weight
相似,但对于以 123
开头的情况实际体重
为 100-(35 + 10)
Example if we look at first 3 rows, in 3rd row we have Id
starting with 123
and having entries that have 8 digits from right similar to this one except they start with 000
instead of 123
(i.e. row 1 and 2). For cases starting with 000
Actual Weight
would be similar to Weight
but for the one starting with 123
Actual Weight
would be 100-(35+10)
我正在寻找一个无需创建其他表/视图即可创建这2个派生列的查询.
I am looking for a query that can create these 2 derived column without need of creating any other table/view.
所需的输出
+-------------+-------------+--------+---------------+------+
| Id | Actual ID | Weight | Actual Weight | Type |
+-------------+-------------+--------+---------------+------+
| 00011223344 | 12311223344 | 35 | 35 | A |
| 00011223344 | 12311223344 | 10 | 10 | A |
| 12311223344 | 12311223344 | 100 | 55 | B |
| 00034343434 | 99934343434 | 25 | 25 | A |
| 00034343434 | 99934343434 | 25 | 25 | A |
| 99934343434 | 99934343434 | 200 | 150 | C |
| 88855667788 | 88855667788 | 100 | 100 | D |
+-------------+-------------+--------+---------------+------+
推荐答案
Hmmmm...如果我关注此事:
Hmmmm . . . If I'm following this:
select t.*,
(case when id like '000%' then weight
else weight - sum(case when id like '000%' then weight else 0 end) over (partition by actual_id)
end) as actual_weight
from (select t.*,
max(id) over (partition by stuff(id, 1, 3, '')) as actual_id
from t
) t;
这里是db小提琴.
这篇关于使用SQL清理数据-取得列差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!