R: apply multiple functions using case_when when a large number of categories/types is present (R vectorization)

Question

Suppose I have a dataset of the following form:

City=c(1,2,2,1)
Business=c(2,1,1,2)
ExpectedRevenue=c(35,20,15,19)
zz=data.frame(City,Business,ExpectedRevenue)
zz_new=do.call("rbind", replicate(zz, n=30, simplify = FALSE))

My actual dataset contains about 200K rows. Furthermore, it contains information for over 100 cities. Suppose, for each city (which I also call "Type"), I have the following functions which need to be applied:

#Writing the custom functions for the categories here

Type1=function(full_data,observation){
  NewSet=full_data[which(!full_data$City==observation$City),]
  BusinessMax = max(NewSet$ExpectedRevenue)+10*rnorm(1)
  return(BusinessMax)
}

Type2=function(full_data,observation){
  NewSet=full_data[which(!full_data$City==observation$City),]
  BusinessMax = max(NewSet$ExpectedRevenue)-100*rnorm(1)
  return(BusinessMax)
}

Once again, the above two functions are extremely simple ones that I use for illustration. The idea is that each City (or "Type") has its own function, which I need to run for every row of that city in my dataset. In both functions I used rnorm to check that a different value is drawn for each row.

Now, for the entire dataset, I want to first divide the observations into their cities (or "Types"), which I can do with (zz_new[["City"]]==1) [also see below], and then run the respective function for each class. However, when I run the code below, I get -Inf.

Can someone help me understand why this is happening?

For the example data, I would expect to obtain 20 plus 10 times a random normal value (for Type = 1) and 35 minus 100 times a random normal value (for Type = 2). The values should also differ from row to row, since each one involves its own draw from a normal distribution.
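To make this expectation concrete, here is a minimal vectorized sketch. It assumes, as in the toy Type1/Type2, that the deterministic part for a row is simply the maximum ExpectedRevenue among rows of the other cities; the names other_max, city_max, base, noise and expected are illustrative only.

```r
# Rebuild the toy data from the question
City <- c(1, 2, 2, 1)
Business <- c(2, 1, 1, 2)
ExpectedRevenue <- c(35, 20, 15, 19)
zz <- data.frame(City, Business, ExpectedRevenue)
zz_new <- do.call("rbind", replicate(zz, n = 30, simplify = FALSE))

set.seed(1)

# Maximum ExpectedRevenue over rows that belong to a different city
other_max <- function(city) max(zz_new$ExpectedRevenue[zz_new$City != city])

# Compute that maximum once per city, then look it up for every row
cities <- unique(zz_new$City)
city_max <- setNames(vapply(cities, other_max, numeric(1)), as.character(cities))
base <- unname(city_max[as.character(zz_new$City)])  # 20 for City 1, 35 for City 2

# One independent standard normal draw per row
noise <- rnorm(nrow(zz_new))
expected <- ifelse(zz_new$City == 1, base + 10 * noise, base - 100 * noise)
```

Computing the maximum once per city rather than once per row also matters at 200K rows, because re-subsetting the full data for every row scales quadratically.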

library(dplyr) #I use dplyr here
zz_new[,"AdjustedRevenue"] = case_when(
  zz_new[["City"]]==1~Type1(full_data=zz_new,observation=zz_new[,]),
  zz_new[["City"]]==2~Type2(full_data=zz_new,observation=zz_new[,])
)

Many thanks.

Answer

Let's take a look at your code. I rewrote your code

library(dplyr)
zz_new[,"AdjustedRevenue"] = case_when(
  zz_new[["City"]]==1~Type1(full_data=zz_new,observation=zz_new[,]),
  zz_new[["City"]]==2~Type2(full_data=zz_new,observation=zz_new[,])
)

into

zz_new %>%
  mutate(AdjustedRevenue = case_when(City == 1 ~ Type1(zz_new,zz_new),
                                     City == 2 ~ Type2(zz_new,zz_new)))

because you are using dplyr but not the powerful tools this package provides.

Besides the use of mutate, one key change is that I replaced zz_new[,] with zz_new; the two are the same, since zz_new[,] selects every row and every column. Now we see that both arguments of your Type functions are the same data frame.
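A quick base-R check of that point, rebuilding the question's toy data: zz_new[,] has exactly the rows of zz_new, so inside a Type function the observation argument carries a whole City column (120 values here) rather than the single City value of one row. The variable names below are illustrative only.

```r
City <- c(1, 2, 2, 1)
Business <- c(2, 1, 1, 2)
ExpectedRevenue <- c(35, 20, 15, 19)
zz <- data.frame(City, Business, ExpectedRevenue)
zz_new <- do.call("rbind", replicate(zz, n = 30, simplify = FALSE))

# An empty row index and an empty column index select everything
observation <- zz_new[, ]
n_obs <- nrow(observation)            # same as nrow(zz_new)
n_city <- length(observation$City)    # a full column, not one value
same_city <- all(observation$City == zz_new$City)
```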

Next, look at your function:

Type1 <- function(full_data,observation){
  NewSet=full_data[which(!full_data$City==observation$City),]
  BusinessMax = max(NewSet$ExpectedRevenue)+10*rnorm(1)
  return(BusinessMax)
}

This function is called as Type1(zz_new, zz_new), so the definition of NewSet gives us

NewSet=full_data[which(!full_data$City==observation$City),]

# replace the arguments
NewSet <- zz_new[which(!zz_new$City==zz_new$City),]
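Running that substitution on the toy data reproduces the problem step by step: the comparison is FALSE everywhere, which() returns an empty index, NewSet has no rows, and max() of the empty column gives -Inf. In this sketch suppressWarnings() only silences the warning max() emits for an empty input.

```r
City <- c(1, 2, 2, 1)
Business <- c(2, 1, 1, 2)
ExpectedRevenue <- c(35, 20, 15, 19)
zz <- data.frame(City, Business, ExpectedRevenue)
zz_new <- do.call("rbind", replicate(zz, n = 30, simplify = FALSE))

# Each City value is compared with itself, so the negation is FALSE everywhere
keep <- which(!zz_new$City == zz_new$City)   # empty integer index
NewSet <- zz_new[keep, ]                      # data frame with zero rows
BusinessMax <- suppressWarnings(max(NewSet$ExpectedRevenue))  # -Inf
```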

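Every City value equals itself, so !zz_new$City == zz_new$City is FALSE for every row and which() returns an empty index. Thus NewSet is always a data frame with zero rows. Applying max to an empty column of a data frame returns -Inf (with a warning), and adding 10*rnorm(1) or subtracting 100*rnorm(1) leaves the result at -Inf.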
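Fixing the empty subset is not the whole story: each Type function returns a single number, and case_when() recycles a length-one result to every matching row, so all City 1 rows would still share one value. Below is a minimal base-R sketch that instead calls the matching function once per row, passing that single row as observation. It swaps the case_when() chain for a named list of functions keyed by City, an assumption that is easier to extend to 100+ cities; type_functions is a hypothetical name.

```r
City <- c(1, 2, 2, 1)
Business <- c(2, 1, 1, 2)
ExpectedRevenue <- c(35, 20, 15, 19)
zz <- data.frame(City, Business, ExpectedRevenue)
zz_new <- do.call("rbind", replicate(zz, n = 30, simplify = FALSE))

Type1 <- function(full_data, observation) {
  # Rows from every city except the observation's own city
  NewSet <- full_data[which(full_data$City != observation$City), ]
  max(NewSet$ExpectedRevenue) + 10 * rnorm(1)
}
Type2 <- function(full_data, observation) {
  NewSet <- full_data[which(full_data$City != observation$City), ]
  max(NewSet$ExpectedRevenue) - 100 * rnorm(1)
}

# Hypothetical dispatch table: City code (as a string) -> function
type_functions <- list("1" = Type1, "2" = Type2)

set.seed(1)
# Pass one row at a time as `observation`, so rnorm(1) is drawn per row
zz_new$AdjustedRevenue <- vapply(seq_len(nrow(zz_new)), function(i) {
  obs <- zz_new[i, ]
  type_functions[[as.character(obs$City)]](zz_new, obs)
}, numeric(1))

head(zz_new)
```

This row-by-row loop re-subsets the full data for every row, which is slow at 200K rows; when a city's function depends only on aggregates of the other cities, precomputing those once per city is much faster.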