Arrange within a group with dplyr


Question


I am using the nycflights13 package (library(nycflights13)). I use the following command to group by month and day, select the top 3 rows within each group, and then sort in descending order of departure delay within each group:

flights %>% group_by(month, day)  %>% top_n(3, dep_delay) %>% arrange(desc(dep_delay))

Which returns the following output:

    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin  dest
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>  <chr> <chr>
1   2013     1     9      641            900      1301     1242           1530      1272      HA     51  N384HA    JFK   HNL
2   2013     6    15     1432           1935      1137     1607           2120      1127      MQ   3535  N504MQ    JFK   CMH
3   2013     1    10     1121           1635      1126     1239           1810      1109      MQ   3695  N517MQ    EWR   ORD
4   2013     9    20     1139           1845      1014     1457           2210      1007      AA    177  N338AA    JFK   SFO
5   2013     7    22      845           1600      1005     1044           1815       989      MQ   3075  N665MQ    JFK   CVG
6   2013     4    10     1100           1900       960     1342           2211       931      DL   2391  N959DL    JFK   TPA

The records are sorted in descending order but not within groups.

Why is that? What should be done to correct the code? Your advice will be appreciated.

Edit

Following the suggestions made in the comments, I still don't get what I am looking for, i.e. the top 3 records within each month-day group, sorted in descending order of departure delay:

flights %>% group_by(month, day)  %>% top_n(3, dep_delay) %>% arrange(desc(month, day,  dep_delay))

    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin  dest
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>  <chr> <chr>
1   2013    12     1      657           1930       687     1010           2249       681      DL   1091  N342NW    JFK   SAT
2   2013    12     1     1504           1056       248     1628           1230       238      EV   5309  N615QX    LGA   BGR
3   2013    12     1     2017           1455       322     2146           1609       337      DL   1164  N6704Z    JFK   BOS
4   2013    12     2     1139            745       234     1358           1012       226      DL    807  N935AT    EWR   ATL
5   2013    12     2     1823           1345       278     2123           1640       283      UA   1510  N75861    EWR   IAH
6   2013    12     2     1842           1428       254     1955           1545       250      EV   5712  N827AS    JFK   IAD
7   2013    12     3      920            600       200     1158            846       192      B6    583  N535JB    JFK   MCO
8   2013    12     3     1424           1114       190     1713           1347       206      UA    405  N437UA    LGA   DEN
9   2013    12     3     2300           1935       205      116           2203       193      FL   1346  N964AT    LGA   ATL
10  2013    12     4     1210            829       221     1440           1055       225      EV   4419  N23139    EWR   XNA

Solution

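I think the problem in your second example is that you are wrapping all the variables in a single desc() call. desc() works on one vector at a time, so in your output it was only applied to the month column. Sort by month and day first, then apply desc() to dep_delay on its own: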

# Sort by the grouping columns first, then by descending delay within each group
flights %>%
    group_by(month, day) %>%
    top_n(3, dep_delay) %>%
    arrange(month, day, desc(dep_delay))

Source: local data frame [1,108 x 19]
Groups: month, day [365]

    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>  <chr>
1   2013     1     1      848           1835       853     1001           1950       851      MQ   3944  N942MQ    JFK
2   2013     1     1     2343           1724       379      314           1938       456      EV   4321  N21197    EWR
3   2013     1     1     1815           1325       290     2120           1542       338      EV   4417  N17185    EWR
4   2013     1     2     2131           1512       379     2340           1741       359      UA    488  N593UA    LGA
5   2013     1     2     1607           1030       337     2003           1355       368      AA    179  N324AA    JFK
6   2013     1     2     1412            838       334     1710           1147       323      UA    468  N474UA    EWR
7   2013     1     3     2056           1605       291     2239           1754       285      9E   3459  N928XJ    JFK
8   2013     1     3     2008           1540       268     2339           1909       270      DL   2027  N338NW    JFK
9   2013     1     3     2012           1600       252     2314           1857       257      B6    369  N558JB    LGA
10  2013     1     4     2123           1635       288     2332           1856       276      EV   3805  N29917    EWR
# ... with 1,098 more rows, and 6 more variables: dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#   time_hour <dttm>
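
As a side note on the first attempt: arrange() ignores grouping by default, which is why the rows there were sorted across the whole table rather than within each month-day group. The following is a minimal sketch, assuming a dplyr version in which arrange() accepts the .by_group argument. It uses a small made-up data frame (the toy tibble and its values are illustrative only, not taken from flights) to show that .by_group = TRUE should give the same ordering as listing the grouping columns explicitly.

```r
library(dplyr)

# Hypothetical toy data standing in for nycflights13::flights
toy <- tibble(
  month     = c(1, 1, 1, 2, 2, 2),
  day       = c(1, 1, 1, 1, 1, 1),
  dep_delay = c(10, 30, 20, 5, 50, 15)
)

# Option 1: let arrange() sort by the grouping variables first
by_group <- toy %>%
  group_by(month, day) %>%
  arrange(desc(dep_delay), .by_group = TRUE)

# Option 2: name the grouping columns explicitly, as in the answer above
explicit <- toy %>%
  group_by(month, day) %>%
  arrange(month, day, desc(dep_delay))

print(by_group)
```

Either form can be dropped in after top_n(3, dep_delay) in the original pipeline; the explicit version has the advantage of not depending on the data still being grouped at that point.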
