Using CUT and Quartile to generate breaks in R function


Question

Following some great advice from before, I'm now writing my second R function and using similar logic. However, I'm trying to automate a bit more and may be getting too smart for my own good.

I want to break the clients into quintiles based on the number of orders. Here's my code to do so:

# sample data
clientID <- round(runif(200,min=2000, max=3000),0)
orders <- round(runif(200,min=1, max=50),0)

df <- data.frame(clientID, orders)

#function to break them into quintiles
ApplyQuintiles <- function(x) {
  cut(x, breaks=c(quantile(df$orders, probs = seq(0, 1, by = 0.20))), 
      labels=c("0-20","20-40","40-60","60-80","80-100"))
}

#Add the quintile to the dataframe
df$Quintile <- sapply(df$orders, ApplyQuintiles)

table(df$Quintile)

0-20   20-40   40-60    60-80   80-100 
40     39      44       38      36

You'll see here that in my sample data I created 200 observations, yet only 197 are listed via table. The 3 left off are NA.

Now, there are some clientIDs that have an NA for quintile. It seems that if they were at the lowest break (in this case, 1), they were not included by the cut function.
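The behaviour described above can be reproduced on a tiny vector. This is only a minimal sketch: the five values below are hypothetical toy data, not the original sample, chosen so that the quantile breaks are distinct and the minimum sits exactly on the lowest break.

```r
# Hypothetical toy data (not the 200-row sample from the question)
orders <- c(1, 5, 10, 20, 50)

# Same break construction as in the question: quintile boundaries of the data
breaks <- quantile(orders, probs = seq(0, 1, by = 0.20))

# Same cut() call as in the question, without include.lowest
quintile <- cut(orders, breaks = breaks,
                labels = c("0-20", "20-40", "40-60", "60-80", "80-100"))

print(quintile)              # the first element (the minimum, 1) is NA
print(sum(is.na(quintile)))  # 1
```

Only the value equal to the lowest break is dropped; every other value lands in a bin.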

Is there a way to make cut inclusive of all observations?

Recommended answer

Try the following:

set.seed(700)

clientID <- round(runif(200,min=2000, max=3000),0)
orders <- round(runif(200,min=1, max=50),0)

df <- data.frame(clientID, orders)

ApplyQuintiles <- function(x) {
  cut(x, breaks=c(quantile(df$orders, probs = seq(0, 1, by = 0.20))), 
      labels=c("0-20","20-40","40-60","60-80","80-100"), include.lowest=TRUE)
}
df$Quintile <- sapply(df$orders, ApplyQuintiles)
table(df$Quintile)

0-20  20-40  40-60  60-80 80-100 
  40     41     39     40     40 

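I added include.lowest=TRUE to your cut call, which seems to make it work. By default cut uses intervals that are open on the left and closed on the right, so a value equal to the lowest break (here the minimum, 1) falls outside every interval and becomes NA. See ?cut for more details.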
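Since cut is vectorized, the sapply call is not strictly needed. Below is one possible sketch of a variant that computes the breaks from its own argument instead of reading the global df. The name apply_quintiles and the toy vector 1:50 are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper: name and signature are illustrative only
apply_quintiles <- function(x) {
  # Derive the quintile boundaries from the vector being binned
  breaks <- quantile(x, probs = seq(0, 1, by = 0.20), na.rm = TRUE)
  # include.lowest = TRUE closes the first interval on the left,
  # so the minimum value is assigned to the "0-20" bin
  cut(x, breaks = breaks,
      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
      include.lowest = TRUE)
}

# Toy input (an assumption): the integers 1 to 50
orders <- 1:50
quintile <- apply_quintiles(orders)

print(table(quintile))       # 10 values in each of the five bins
print(sum(is.na(quintile)))  # 0
```

With the data frame from the question, this would be called as df$Quintile <- apply_quintiles(df$orders). One caveat: if the data contain so many ties that two quantiles coincide, cut stops with an error about non-unique breaks, so this sketch assumes the five boundaries are distinct.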
