用给定的数据集使用组建立结果的并发症 [英] Complications building result using group by given data set

查看:69
本文介绍了用给定的数据集使用组建立结果的并发症的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

请参阅SQL小提琴以获取可用示例: http://sqlfiddle.com/#!3 / 6d04f / 1



我有4个表格,TableA,TableB,TableC和TableD。

TableA数据和结构:

  + ----- --------------- + 
| ID |名称|
+ -------------------- +
| 987 |密西沙加|
| 454 |霍博肯|
| 343 |伯克利|
+ -------------------- +

TableB数据和结构:

  + --------------- --------------------------------------------- + 
| City1ID | City1Name | City2ID | City2Name |因子|
+ --------------------------------------------- --------------- +
| 343 |伯克利| 19 |奥克兰| 0.5 |
| 987 |密西沙加| 23 |多伦多| 1.0 |
| 66 |雷德蒙德| 13 |西雅图| 1.0 |
| 343 |伯克利| 14 |圣何塞| 0.5 |
| 454 |霍博肯| 55 |纽约市| 0.9 |
| 454 |霍博肯| 44 |费城| 0.1
| 19 |奥克兰| 9 |旧金山| 1.0 |
...
+ -------------------------------------- ---------------------- +



< TableC:是每年每天都有小时数据的表格。并非所有的城市都在TableC中。表A代表TableB中存在映射的城市的一个小子集。在我的情况下,TableA有439个城市(行),TableB有7000行映射。



TableC的结构如下:

  + -------------------------------- ---------------------------------------- + 
| Date | CID |名称| Blah1 | Blah2 |小时1 |小时2 |小时3 | ... | Hour24 |
+ --------------------------------------------- --------------------------- +

以下是TableC一天数据量的示例:

  2006-08-01 00 :00:00 9旧金山Blah1 Blah2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
2006-08-01 00:00:00 23多伦多Blah1 Blah2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2006-08-01 00:00:00 13西雅图Blah1 Blah2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
2006-08-01 00:00:00 14圣何塞布拉1布拉2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
2006-08-01 00:00:00 55纽约市Blah1 Blah2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
2006-08-01 00:00:00 44费城Blah1 Blah2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6

例如,奥克兰市不在TableC中。然而,伯克利的小时观测数据是通过求和得到的(0.5 x的圣何塞数据和0.5 x的奥克兰数据)。

TableD结构:

  + ------- --------------------------- + 
|日期| ID |名称|小时|值|
+ ---------------------------------- +

TableD是最终结果,应该使用TableA中每个城市的每天每小时观察数据填充TableD。

在我的示例中,HOUR 1应该如下所示,其中Mississauga = 2.0(1 x Toronto = 1 x 2 = 2),Hoboken = 5.10(0.9 x纽约城市+ 0.1 x费城= 0.9 x 5 + 0.1 x 6 = 5.10)和伯克利= 2.50(0.5 x圣何塞+ 0.5 x奥克兰= 0.5 x 4 + 0.5 x旧金山= 0.5 x 4 + 0.5 x 1 = 2.50) :

  + ------------------------ -------------------------------- + 
|日期| ID |名称|小时|值|
+ --------------------------------------------- ----------- +
| 2006-08-01 00:00:00 | 987 |密西沙加| 1 | 2.00 |
| 2006-08-01 00:00:00 | 454 |霍博肯| 1 | 5.10 |
| 2006-08-01 00:00:00 | 343 |伯克利| 1 | 2.50 |
+ --------------------------------------------- ----------- +

现在,棘手的部分是奥克兰不是存在于TableC中,而是映射到旧金山,因此任何映射到奥克兰的城市都需要引用它映射到的城市。在我的情况下,有很多情况发生这种情况,所以我想确保我能正确处理这个问题。



我对TableC进行解包/规格化并将其与表B但不知道如何继续。我知道我需要使用Group By,但不知道如何处理,以及我所描述的警告。

  select * from 

选择日期,CID,名称,替换(细节,'小时','')为小时,ObservationValue
从TableC
作为结果
unpivot $ b ([Hour1],[Hour2],[Hour3],[Hour4],[Hour5],[Hour6],
[Hour7],[Hour8],[Hour9]中的详细信息的ObservationValue ],[小时10],[小时11],[小时12],[小时13],[小时14],
[小时15],[小时16],[小时17],[小时18],[小时19],[小时20], [小时21],[小时22],
[小时23],[小时24])

作为UnPvt
)作为结果X
连接TableB作为b on(b。 City2ID = resultX.CID)

如果您无法访问下面的SQL Fiddle链接,创建Schema:

pre code $ CREATE TABLE TableA
([Id] int,[name] varchar(20));

插入表格
([Id],[name])

(987,'Mississauga'),
(454,'Hoboken '),
(343,'伯克利');

CREATE TABLE TableB
([City1ID] int,[City1Name] varchar(20),
[City2ID] int,[City2Name] varchar(20),[Factor] varchar (20))

插入表B
([City1ID],[City1Name],[City2ID],[City2Name],[Factor])
VALUES
343,'伯克利',19,'奥克兰',0.5),
(987,'Mississauga',23,'Toronto',1.0),
(66,'Redmond',13,'Seattle ',1.0),
(343,'伯克利',14,'圣何塞',0.5),
(454,'Hoboken',55,'纽约市',0.9),
(454,'Hoboken',44,'Philadephia',0.1),
(19,'奥克兰',9,'旧金山',1.0);

CREATE TABLE TableC
([date] datetime,[CId] int,[name] varchar(50),[blah1] varchar(10),
[blah2] varchar (10),[hour1] int,[hour2] int,[hour3] int,[hour4] int,
[hour5] int,[hour6] int,[hour7] int,[hour8] int,[hour9 ] int,
[hour10] int,[hour11] int,[hour12] int,[hour13] int,[hour14] int,
[hour15] int,[hour16] int,[hour17] int ,[hour18] int,[hour19] int,
[hour20] int,[hour21] int,[hour22] int,[hour23] int,[hour24] int);

INSERT INTO TableC
([date],[CId],[name],[blah1],
[blah2],[hour1],[hour2],[hour3 ],[小时4],
[小时5],[小时6],[小时7],[小时8],[小时9],
[小时10],[小时11],[小时12],[小时13], [hour14],
[hour15],[hour16],[hour17],[hour18],[hour19],
[hour20],[hour21],[hour22],[hour23],[hour24 ])

VALUES
('2006-08-01 00:00:00',9,'旧金山','Blah1','Blah2',1,1,1, 1,1',1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
('2006 '08'00:00:00'','23','多伦多','Blah1','Blah2',2,2,2,2,2,2,2,2,2,2,2,2,2 ,2,2,2,2,2,2,2,2,2,2,2),
('2006-08-01 00:00:00',13,'Seattle','Blah1 ,3',3',3',3',3',3',3',3',3',3',3',3',3' 3,3),
('2006-08-01 00:00:00',14,'San Jose','Blah1','Blah2',4,4,4,4,4,4, 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4, 01 00:00:00',55'纽约市','Blah1','Blah2' 5,5,5,5,5,5,5,5,5,5),
('2006-08-01 00:00:00',44,'费城','Blah1' ,'Blah2',6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6, ,6);


解决方案

这会做与您的数据和结构

 选择日期,cityid,cityname,hr,sum(hour1)作为总


选择c.Date作为日期,
isnull(a.id,parent。 city1id)作为cityid,
isnull(a.name,parent.city1name)作为cityname,
isnull(parent.factor,1)* ---父因子
b.factor * - - 实际系数
isnull(c.hour1,0)为小时1,
1为小时

从表a a
右外部连接表b b上b.city1id = a。[id]
left parent.city2id = b.city1id
左外连接tablec父c.cid = isnull(b.city2id,b.city1id)
)final
其中date不为空且cityid不为空
group by cityid,cityname,date,hr


Please see SQL Fiddle here for usable example: http://sqlfiddle.com/#!3/6d04f/1

I have 4 tables, TableA, TableB, TableC and TableD.

TableA data and structure:

+--------------------+
| ID | Name          |
+--------------------+
| 987 | Mississauga  |
| 454 | Hoboken      |
| 343 | Berkeley     |
+--------------------+

TableB data and structure:

+------------------------------------------------------------+
| City1ID | City1Name     | City2ID | City2Name     | Factor |
+------------------------------------------------------------+
| 343     | Berkeley      | 19      | Oakland       | 0.5    |
| 987     | Mississauga   | 23      | Toronto       | 1.0    |
| 66      | Redmond       | 13      | Seattle       | 1.0    |
| 343     | Berkeley      | 14      | San Jose      | 0.5    |
| 454     | Hoboken       | 55      | New York City | 0.9    |
| 454     | Hoboken       | 44      | Philadelphia  | 0.1    
| 19      | Oakland       | 9       | San Francisco | 1.0    |
...
+------------------------------------------------------------+

TableC: is a table that has hourly data for cities for each day of the year. Not all cities are in TableC. TableA represents a small subset of the cities for which a mapping exists in TableB. In my case, TableA has 439 cities (rows), and TableB has 7000 rows of mappings.

The structure of TableC is as follows:

+------------------------------------------------------------------------+
|Date | CID | Name | Blah1 | Blah2 | Hour1 | Hour2 | Hour3 | ... | Hour24 |
+------------------------------------------------------------------------+

Here is an example of one day's worth of data for TableC:

2006-08-01 00:00:00 9 San Francisco Blah1 Blah2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2006-08-01 00:00:00 23 Toronto Blah1 Blah2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
2006-08-01 00:00:00 13 Seattle Blah1 Blah2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
2006-08-01 00:00:00 14 San Jose Blah1 Blah2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
2006-08-01 00:00:00 55 New York City Blah1 Blah2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
2006-08-01 00:00:00 44 Philadelphia Blah1 Blah2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6

For illustration, the city of Oakland is NOT present in TableC. However, hourly observation data for Berkeley is derived by summing ( 0.5 x data for San Jose AND 0.5 x data for Oakland).

TableD structure:

+----------------------------------+
| Date | ID | Name |  Hour | Value |
+----------------------------------+

TableD is the end result, which should be populated with hourly observation data for each day for each of the cities in TableA.

In my example HOUR 1 should look like the following, with Mississauga = 2.0 (1 x Toronto = 1 x 2 = 2), Hoboken = 5.10 (0.9 x New York City + 0.1 x Philadelphia = 0.9 x 5 + 0.1 x 6 = 5.10), and Berkeley = 2.50 (0.5 x San Jose + 0.5 x Oakland = 0.5 x 4 + 0.5 x San Francisco = 0.5 x 4 + 0.5 x 1 = 2.50):

+--------------------------------------------------------+
| Date                | ID  |     Name    | Hour | Value |
+--------------------------------------------------------+
| 2006-08-01 00:00:00 | 987 | Mississauga |  1   | 2.00  | 
| 2006-08-01 00:00:00 | 454 | Hoboken     |  1   | 5.10  |
| 2006-08-01 00:00:00 | 343 | Berkeley    |  1   | 2.50  | 
+--------------------------------------------------------+

Now, the tricky part is Oakland is not present in TableC and instead it is mapped to San Francisco so any city that is mapped to Oakland needs to refer to the city that it is mapped to. In my situation, there are many situations where this occurs so I want to make sure I handle this properly.

I got as far as unpacking/normalizing TableC and joining it with Table B but not sure how to proceed. I know I need to use Group By but not sure how to do so with the caveat that I have described.

select * from
(
    select Date, CID, Name, replace(Details, 'Hour', '') as Hour, ObservationValue
    from TableC
    as Result
    unpivot
    (
      ObservationValue for Details in ([Hour1], [Hour2], [Hour3], [Hour4], [Hour5], [Hour6],
      [Hour7],[Hour8],[Hour9],[Hour10],[Hour11],[Hour12],[Hour13],[Hour14],
      [Hour15],[Hour16],[Hour17],[Hour18],[Hour19],[Hour20],[Hour21],[Hour22],
      [Hour23],[Hour24])
)
as UnPvt
) as resultX
join TableB as b on (b.City2ID = resultX.CID)

If you can't access the SQL Fiddle link below is the code to create the Schema:

CREATE TABLE TableA
    ([Id] int, [name] varchar(20));

INSERT INTO TableA
    ([Id], [name])
VALUES
    (987, 'Mississauga'),
    (454, 'Hoboken'),
    (343, 'Berkeley');

CREATE TABLE TableB
    ([City1ID] int, [City1Name] varchar(20), 
     [City2ID] int, [City2Name] varchar(20), [Factor] varchar(20))

INSERT INTO TableB
     ([City1ID], [City1Name],[City2ID], [City2Name], [Factor])
VALUES
     (343, 'Berkeley', 19, 'Oakland', 0.5),
    (987, 'Mississauga', 23, 'Toronto', 1.0),
    (66, 'Redmond', 13, 'Seattle', 1.0),
    (343, 'Berkeley', 14, 'San Jose', 0.5),
    (454, 'Hoboken', 55, 'New York City', 0.9),
    (454, 'Hoboken', 44, 'Philadephia', 0.1),
    (19, 'Oakland', 9, 'San Francisco', 1.0);

CREATE TABLE TableC
    ([date] datetime, [CId] int, [name] varchar(50), [blah1] varchar(10), 
     [blah2] varchar(10), [hour1] int, [hour2] int, [hour3] int, [hour4] int,
    [hour5] int, [hour6] int, [hour7] int, [hour8] int, [hour9] int,
    [hour10] int,[hour11] int, [hour12] int, [hour13] int, [hour14] int,
    [hour15] int, [hour16] int, [hour17] int, [hour18] int, [hour19] int,
    [hour20] int, [hour21] int, [hour22] int, [hour23] int, [hour24] int);

INSERT INTO TableC
    ([date], [CId], [name], [blah1], 
     [blah2], [hour1], [hour2], [hour3], [hour4],
    [hour5], [hour6], [hour7], [hour8], [hour9],
    [hour10],[hour11], [hour12], [hour13], [hour14],
    [hour15], [hour16], [hour17], [hour18], [hour19],
    [hour20], [hour21], [hour22], [hour23], [hour24])

VALUES
('2006-08-01 00:00:00', 9, 'San Francisco', 'Blah1', 'Blah2', 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
('2006-08-01 00:00:00', 23,'Toronto', 'Blah1', 'Blah2', 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
('2006-08-01 00:00:00', 13,'Seattle', 'Blah1', 'Blah2', 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
('2006-08-01 00:00:00', 14,'San Jose', 'Blah1', 'Blah2', 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4),
('2006-08-01 00:00:00', 55,'New York City', 'Blah1', 'Blah2', 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5),
('2006-08-01 00:00:00', 44,'Philadelphia', 'Blah1', 'Blah2', 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6);

解决方案

this will do with your data and structure

   select       date,cityid, cityname,hr, sum(hour1) as total
from  
            (
                select  c.Date as date,
                        isnull(a.id,parent.city1id) as cityid,
                        isnull(a.name,parent.city1name) as cityname,
                        isnull(parent.factor,1) *            --- parent factor
                        b.factor *  --- actual factor
                        isnull(c.hour1,0) as hour1 ,
                        1 as hr

                from    tablea a 
                right outer join tableb b on b.city1id = a.[id]
                left  outer join tableb parent on parent.city2id=b.city1id
                left  outer join tablec c on c.cid = isnull(b.city2id,b.city1id)
            )   final 
where       date is not null and cityid is not null
group by    cityid,cityname,date,hr

这篇关于用给定的数据集使用组建立结果的并发症的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆