使用SQL清理数据-取得列差异 [英] Clean Data Using SQL - Take Column Difference

查看:46
本文介绍了使用SQL清理数据-取得列差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在SQL中的数据如下:

I have data in SQL as follows:

实际表格

+-------------+--------+------+
|     Id      | Weight | Type |
+-------------+--------+------+
| 00011223344 |     35 | A    |
| 00011223344 |     10 | A    |
| 12311223344 |    100 | B    |
| 00034343434 |     25 | A    |
| 00034343434 |     25 | A    |
| 99934343434 |    200 | C    |
| 88855667788 |    100 | D    |
+-------------+--------+------+

ID 的长度始终为11,数据类型为varchar.我需要在上表中创建一列 Actual Weight Actual ID .

Column ID will always have length of 11 and has data type varchar. I need to create a column Actual Weight and Actual ID from the table above.

实际ID 取决于列 ID .如果ID以 000 开头,则我们需要从 ID 列中找到ID,该ID不是以 000 开头,而是后面的字符(即8个字符)从右)相似.匹配的ID为实际ID .例如,如果我们查看前3个ID,则前2个以 000 开头,而另一个以 000 开头且右边包含相似的8个字符的ID即 12311223344 ,因此在派生列 Actual ID 中,前两行的 Actual Id 12311223344 .

Actual Id is dependent on column ID. If the ID starts with 000 than we need to find ID from column ID that does not starts with 000 but characters after that (i.e. 8 characters from right) are similar. Matched ID would be the Actual Id. For example if we look at first 3 ids first 2 starts with 000 and another ID that does not starts with 000 and contains similar 8 characters from right can be found in 3rd row i.e. 12311223344 therefore in derived column Actual ID the first 2 rows would have Actual Id as 12311223344.

实际权重取决于2列 ID Weight 中的值.如果任何不是以 000 开头但包含另一个以 000 开头的 Id ,我们需要根据上述标准对列 Id 进行分组与 000 .然后,我们需要通过添加以 000 000 开头的 Id Weight >并以不以 000 开头的数字为准.

Actual Weight is dependent on values in 2 columns ID and Weight. We need to group column Id based on the criteria mentioned above if for any Id that does not starts with 000 but contains another entry that does starts with 000. Then we need to recalculate Weight for Id that does not starts with 000 by adding all Weights of ones starting with 000 and taking difference with one that does not starts with 000.

例如,如果我们查看前三行,那么在第三行中,我们有以 123 开头的 Id ,并且具有从右到右8位数字的条目,除了它们以 000 而不是 123 (即第1行和第2行)开头.对于以 000 开头的情况,实际重量 Weight 相似,但对于以 123 开头的情况实际体重 100-(35 + 10)

Example if we look at first 3 rows, in 3rd row we have Id starting with 123 and having entries that have 8 digits from right similar to this one except they start with 000 instead of 123 (i.e. row 1 and 2). For cases starting with 000 Actual Weight would be similar to Weight but for the one starting with 123 Actual Weight would be 100-(35+10)

我正在寻找一个无需创建其他表/视图即可创建这2个派生列的查询.

I am looking for a query that can create these 2 derived column without need of creating any other table/view.

所需的输出

+-------------+-------------+--------+---------------+------+
|     Id      |  Actual ID  | Weight | Actual Weight | Type |
+-------------+-------------+--------+---------------+------+
| 00011223344 | 12311223344 |     35 |            35 | A    |
| 00011223344 | 12311223344 |     10 |            10 | A    |
| 12311223344 | 12311223344 |    100 |            55 | B    |
| 00034343434 | 99934343434 |     25 |            25 | A    |
| 00034343434 | 99934343434 |     25 |            25 | A    |
| 99934343434 | 99934343434 |    200 |           150 | C    |
| 88855667788 | 88855667788 |    100 |           100 | D    |
+-------------+-------------+--------+---------------+------+

推荐答案

Hmmmm...如果我关注此事:

Hmmmm . . . If I'm following this:

select t.*,
       (case when id like '000%' then weight
             else weight - sum(case when id like '000%' then weight else 0 end) over (partition by actual_id)
        end) as actual_weight
from (select t.*,
             max(id) over (partition by stuff(id, 1, 3, '')) as actual_id
      from t
     ) t;

这里是db小提琴.

这篇关于使用SQL清理数据-取得列差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆