dplyr arrange() function sort by missing values
Question
I am attempting to work through Hadley Wickham's R for Data Science and have gotten tripped up on the following question: "How could you use arrange() to sort all missing values to the start? (Hint: use is.na())" I am using the flights dataset included in the nycflights13 package. Given that arrange() sorts all unknown values to the bottom of the dataframe, I am not sure how one would do the opposite across the missing values of all variables. I realize that this question can be answered with base R code, but I am specifically interested in how this would be done using dplyr and a call to the arrange() and is.na() functions. Thanks.
Recommended answer
We can wrap is.na() with desc() to get the missing values at the start:
flights %>%
  arrange(desc(is.na(dep_time)),
          desc(is.na(dep_delay)),
          desc(is.na(arr_time)),
          desc(is.na(arr_delay)),
          desc(is.na(tailnum)),
          desc(is.na(air_time)))
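To see why this works: is.na() returns a logical vector, and FALSE sorts before TRUE, so arranging by is.na(x) alone would still leave the missing rows last; desc() reverses that order. A minimal sketch on a made-up data frame (toy and its columns are hypothetical and not part of flights):

```r
library(dplyr)

# Hypothetical toy data, not taken from nycflights13
toy <- tibble(id = 1:5, x = c(3, NA, 1, NA, 2))

# is.na(x) is FALSE/TRUE; desc() puts TRUE (missing) first.
# id is added as an explicit tie-breaker so the row order is fully determined.
na_first <- toy %>%
  arrange(desc(is.na(x)), id)

na_first$id
#[1] 2 4 1 3 5
```

The two rows with a missing x (ids 2 and 4) come first, followed by the complete rows.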
The NA values are found only in these columns, as shown by:
names(flights)[colSums(is.na(flights)) >0]
#[1] "dep_time" "dep_delay" "arr_time" "arr_delay" "tailnum" "air_time"
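We can also handle all of these columns at once with arrange_(), the standard-evaluation counterpart of arrange(), which takes the sort expressions as strings: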
nm1 <- paste0("desc(is.na(", names(flights)[colSums(is.na(flights)) >0], "))")
r1 <- flights %>%
  arrange_(.dots = nm1)
r1 %>%
  head()
#year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
# <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr>
#1 2013 1 2 NA 1545 NA NA 1910 NA AA 133 <NA>
#2 2013 1 2 NA 1601 NA NA 1735 NA UA 623 <NA>
#3 2013 1 3 NA 857 NA NA 1209 NA UA 714 <NA>
#4 2013 1 3 NA 645 NA NA 952 NA UA 719 <NA>
#5 2013 1 4 NA 845 NA NA 1015 NA 9E 3405 <NA>
#6 2013 1 4 NA 1830 NA NA 2044 NA 9E 3716 <NA>
#Variables not shown: origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
# time_hour <time>.
Update
With the newer versions of tidyverse (dplyr_0.7.3, rlang_0.1.2), we can also make use of arrange_at, arrange_all, arrange_if:
nm1 <- names(flights)[colSums(is.na(flights)) >0]
r2 <- flights %>%
  arrange_at(vars(nm1), funs(desc(is.na(.))))
Or use arrange_if:
f <- rlang::as_function(~ any(is.na(.)))
r3 <- flights %>%
  arrange_if(f, funs(desc(is.na(.))))
identical(r1, r2)
#[1] TRUE
identical(r1, r3)
#[1] TRUE
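In current dplyr releases, arrange_() and funs() are deprecated, and the scoped verbs arrange_at() and arrange_if() are superseded by across(). The following is a sketch of the same idea using across() with where(), assuming dplyr 1.0.0 or later; it is not part of the original answer, and toy is hypothetical data standing in for flights:

```r
library(dplyr)

# Hypothetical toy data with missing values in two columns
toy <- tibble(
  id = 1:5,
  x  = c(3, NA, 1, NA, 2),
  y  = c("a", "b", NA, "d", "e")
)

# where(~ anyNA(.x)) selects only the columns that contain an NA (x and y);
# each selected column is then sorted with its missing values first.
# id is an explicit tie-breaker so the row order is fully determined.
na_first <- toy %>%
  arrange(across(where(~ anyNA(.x)), ~ desc(is.na(.x))), id)

na_first$id
#[1] 2 4 3 1 5
```

Rows missing x (ids 2 and 4) come first, then the row missing y (id 3), then the complete rows. Replacing toy with flights should apply the same ordering to the six columns listed above.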