Postgres where子句超过两列 [英] Postgres where clause over two columns

查看:131
本文介绍了Postgres where子句超过两列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

数据库-我正在使用Postgres 9.6.5



我正在分析来自美国机场管理局(RITA)的有关航班起降的数据。
此链接( http://stat-computing.org/dataexpo /2009/the-data.html )列出了表格中的所有列。



表具有以下29列


无名称说明



1年1987-2008



2个月1-12



3 DayofMonth 1-31



4 DayOfWeek 1(星期一)-7(星期日)



5实际的DepTime出发时间(本地,hhmm)



6 CRSDepTime计划的出发时间(本地,hhmm)



7实际ArrTime到达时间(本地,hhmm)



8 CRSArr时间计划到达时间(本地,hhmm)



9 UniqueCarrier unique航空公司代码



10 FlightNum航班号



11 TailNum飞机机尾号



12分钟中的ActualElapsedTime



13 CRSElapsedTime中的分钟



14 AirTime中的分钟



15 ArrDelay到达延迟,以分钟为单位tes



16 DepDelay出发延误,以分钟为单位



17始发地IATA机场代码



18目的地国际航空运输协会机场代码



19英里距离



20出租车以分钟为单位的滑行时间



21出租车以分钟为单位的滑行时间



22取消了航班?



23 CancellationCode取消原因(A =运营商,B =天气,C = NAS,D =安全)



24已转移1 =是,0 =否



25分钟内的CarrierDelay



26分钟内的WeatherDelay



27分钟内NASDelay



28分钟内SecurityDelay



29 LateAircraftDelay in minutes


每年大约有一百万行。



我试图找出延迟超过15分钟时最繁忙的机场。
列DepDelay-具有延迟时间。
起点-是机场的起点代码。



所有数据均已加载到名为 ontime的表中



我正在按以下方式形成查询分阶段进行。


  1. 选择延迟时间超过15分钟的机场



    select origin,year,count(*)as depdelay_count from ontime
    其中
    depdelay> 15

    按年份分组,原始
    由depdelay_count desc
    排序) / p>


  2. 现在,我希望每年仅抽出前十名机场-我正在按照以下步骤进行操作



    从x中选择x.origin,x.year(子查询为(
    select origin,year,count(*)作为depdelay_count from ontime
    其中
    depdelay> 15
    group按年份,由$ debdelay_count desc
    产生的原始
    订单
    选择(根据子查询的排名,按depdelay_count desc按年顺序划分)origin,year,rank()作为x的x.rank< ; = 10;


  3. 现在,我拥有depdelay排名前10位的机场-我希望对这些机场的总航班数进行统计。



    从准时选择起点,count(),其中起点在
    中(选择x.origin于(子查询为(
    选择起点,year,count(
    )as depdelay_count from ontime
    where
    depdelay> 15
    group by year,起源的
    by depdelay_count desc

    选择origin,year,rank()作为子查询的排名(按depdelay_count desc按年级划分),其中x.rank< = 2)
    按原点分组
    按原点排序;


如果我通过在year子句中添加年份来修改第3步查询



----将是(1987年至2008年)的任何值。

 选择来源,计数( *)from ontime,其中year =(< YEAR>)起始于
(选择x.origin from为(子查询为(
select origin,year,count(*))为ontime $ b $的depdelay_count b其中
的延迟> 15
按年分组,起源
由depdelay_count desc
排序)
选择起源,年,rank()超过(由depdelay_count desc按年级划分)作为子查询的排名)x其中x.rank< = 2)
按产地分组
按产地排序;

但是我要手动避免从1987年到2008年的所有时间里这样做。



请帮助您优化查询,以便我可以选择所有年份的数据,而不必手动选择每年。

解决方案

我发现查询中间的CTE令人困惑。您基本上可以使用一个CTE /子查询来做到这一点:

 ,而oy为(
选择来源,年份,计数( *)作为numflights,
sum((depdelay> 15):: int)作为depdelay_count,
row_number()over(按年顺序划分sum((depdelay> 15):: int) desc)作为按时间顺序从始发地
按始发地分组的seqnum
,年份

选择oy。*
从oy
中,其中seqnum< = 10;

请注意使用条件聚合以及将窗口函数与聚合函数一起使用。


Database - I am working on in Postgres 9.6.5

I am analyzing the data from US Airport Authority (RITA) about the flights arrival and departures. This link (http://stat-computing.org/dataexpo/2009/the-data.html) lists all the columns in the table.

The table has following 29 columns

No Name Description

1 Year 1987-2008

2 Month 1-12

3 DayofMonth 1-31

4 DayOfWeek 1 (Monday) - 7 (Sunday)

5 DepTime actual departure time (local, hhmm)

6 CRSDepTime scheduled departure time (local, hhmm)

7 ArrTime actual arrival time (local, hhmm)

8 CRSArrTime scheduled arrival time (local, hhmm)

9 UniqueCarrier unique carrier code

10 FlightNum flight number

11 TailNum plane tail number

12 ActualElapsedTime in minutes

13 CRSElapsedTime in minutes

14 AirTime in minutes

15 ArrDelay arrival delay, in minutes

16 DepDelay departure delay, in minutes

17 Origin origin IATA airport code

18 Dest destination IATA airport code

19 Distance in miles

20 TaxiIn taxi in time, in minutes

21 TaxiOut taxi out time in minutes

22 Cancelled was the flight cancelled?

23 CancellationCode reason for cancellation (A = carrier, B = weather, C = NAS, D = security)

24 Diverted 1 = yes, 0 = no

25 CarrierDelay in minutes

26 WeatherDelay in minutes

27 NASDelay in minutes

28 SecurityDelay in minutes

29 LateAircraftDelay in minutes

There are about a million rows for each year.

I am trying to find out a count the most busy airports when delay is more than 15minutes. column DepDelay - has the delay time. origin - is the origin code for the airport.

All the data has been loaded into a table called 'ontime'

I am forming the query as follows in stages.

  1. select airports where delay is more than 15 minutes

    select origin,year,count(*) as depdelay_count from ontime where depdelay > 15
    group by year,origin order by depdelay_count desc )

  2. Now I wish to pull out only the top 10 airports per year - which I am doing as follows

    select x.origin,x.year from (with subquery as ( select origin,year,count(*) as depdelay_count from ontime where depdelay > 15 group by year,origin order by depdelay_count desc ) select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 10;

  3. Now that I have the top 10 airports by depdelay - I wish to get a count of the total flights out of these airports.

    select origin,count() from ontime where origin in (select x.origin from (with subquery as ( select origin,year,count() as depdelay_count from ontime where depdelay > 15 group by year,origin order by depdelay_count desc ) select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 2) group by origin order by origin;

If I modify the Step 3 query by adding the year in the year clause

---- will be any value from (1987 to 2008)

select origin,count(*) from ontime where year = (<YEAR>) origin in  
(select x.origin from (with subquery as (
    select origin,year,count(*) as depdelay_count from ontime 
    where 
    depdelay > 15
    group by year,origin 
    order by depdelay_count desc 
    )
    select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 2)
    group by origin
    order by origin;

But I have to do this manually for all years from 1987 to 2008 which I want to avoid.

Please can you help refine the query so that I can select the data for all the years without having to select each year manually.

解决方案

I find CTEs int he middle of queries to e confusing. You can basically do this with one CTE/subquery:

with oy as (
      select origin, year, count(*) as numflights,
             sum( (depdelay > 15)::int ) as depdelay_count,
             row_number() over (partition by year order by sum( (depdelay > 15)::int ) desc) as seqnum
      from ontime
      group by origin, year
     ) 
select oy.*
from oy
where seqnum <= 10;

Note the use of conditional aggregation and using window functions with aggregation functions.

这篇关于Postgres where子句超过两列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆