Why does sum() work in this dplyr expression while quantile() doesn't?

Question

I want to calculate the quantiles of each row of a data frame and return the result as a matrix. Since I want to calculate an arbitrary number of quantiles (and I imagine it is faster to calculate them all at once than to re-run the function for each one), I tried using an approach I found in this question:

library(dplyr)
df <- as.data.frame(matrix(rbinom(1000, 10, 0.5), nrow = 2))

interim_res <- df %>% 
              rowwise() %>% 
              do(out = sapply(min(df):max(df), function(i) sum(i==.)))

interim_res <- interim_res[[1]] %>% do.call(rbind,.) %>% as.data.frame(.)

This makes sense, but when I try to apply the same framework to the quantile() function, as coded here,

interim_res <- df %>% 
              rowwise() %>% 
              do(out = quantile(.,probs = c(0.1,0.5,0.9)))

interim_res <- interim_res[[1]] %>% do.call(rbind,.) %>% as.data.frame(.)

I get this error message:

Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
  'x' must be atomic

Why am I getting an error with quantile and not sum? How should I fix this issue?

Answer

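Inside do(), . is a data frame rather than an atomic vector, which is why you get the error: quantile() has to sort its input, and sorting requires an atomic vector. sum(i == .) works because the comparison returns logical values that sum() can add up. This works: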

df %>% 
  rowwise() %>% 
  do(data.frame(as.list(quantile(unlist(.),probs = c(0.1,0.5,0.9)))))

but risks being horrendously slow. Why not just:

apply(df, 1, quantile, probs = c(0.1,0.5,0.9))
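One detail worth checking with the apply() route is the orientation of the result: when the function returns a vector, apply() puts each row's result in a column, so a transpose is needed to get one row of quantiles per input row. The following is a minimal sketch on small illustrative data; the dimensions, the seed, and the names probs, q_by_col and q_by_row are chosen for this example and are not part of the original answer.

```r
# Illustrative data: 4 rows, 25 columns (dimensions chosen for this sketch).
set.seed(1)
df <- as.data.frame(matrix(rbinom(100, 10, 0.5), nrow = 4))

probs <- c(0.1, 0.5, 0.9)

# apply() over rows returns one COLUMN per input row: a 3 x 4 matrix here.
q_by_col <- apply(df, 1, quantile, probs = probs)

# Transpose to get one row per input row and one column per quantile.
q_by_row <- t(q_by_col)

dim(q_by_col)
dim(q_by_row)
colnames(q_by_row)
```

If a data frame is required afterwards, as.data.frame(q_by_row) should convert the transposed matrix.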

Here are some timings with larger data:

df <- as.data.frame(matrix(rbinom(100000,10,0.5),nrow = 1000))

library(microbenchmark)
microbenchmark(
  df %>% rowwise() %>% do(data.frame(as.list(quantile(unlist(.),probs = c(0.1,0.5,0.9))))),
  apply(df, 1, quantile, probs = c(0.1,0.5,0.9)),
  times=5
) 

Produces:

            min        lq      mean    median        uq       max neval
dplyr 2375.2319 2376.6658 2446.4070 2419.4561 2454.6017 2606.0794     5
apply  224.7869  231.7193  246.7137  233.4757  245.0718  298.5144     5    

If you go the apply route, you should probably stick with a matrix from the get-go.
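As for the original error, it can be reproduced without dplyr. The sketch below uses a plain named list as a stand-in for the non-atomic row object that do() passes as `.`; that stand-in, and the names row, probs and res, are assumptions made for illustration rather than an exact reproduction of what dplyr builds internally.

```r
# A named list standing in for one row (an assumption for illustration).
row <- list(V1 = 1, V2 = 2, V3 = 3)
probs <- c(0.1, 0.5, 0.9)

is.atomic(row)          # a list is not an atomic vector
is.atomic(unlist(row))  # unlist() flattens it into one

# Comparison works element by element on such a list, and sum() then
# only sees a logical vector, so the sum() version runs without error.
sum(2 == row)

# quantile() needs to sort its input, which fails for a non-atomic object.
# The error is caught here so that the script keeps running.
res <- tryCatch(quantile(row, probs = probs), error = function(e) e)
inherits(res, "error")

# Flattening first gives quantile() the atomic vector it needs.
quantile(unlist(row), probs = probs)
```

This is the same reason the working do() call wraps `.` in unlist() before passing it to quantile().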
