R: how to vectorize a function with multiple if-else conditions

Question

Hi, I am new to vectorizing functions in R. I have code similar to the following.

library(truncnorm)
library(microbenchmark)

num_obs=10000
Observation=seq(1,num_obs)
Obs_Type=sample(1:4, num_obs, replace=T)
Upper_bound = runif(num_obs,0,1)
Lower_bound=runif(num_obs,2,4)
mean = runif(num_obs,10,15)

df1= data.frame(Observation,Obs_Type,Upper_bound,Lower_bound,mean)
df1$draw_value = 0

Trial_func=function(df1){
  for (i in 1:nrow(df1)){
    if (df1[i,"Obs_Type"] ==1){
      #If Type == 1; then a=-Inf, b = Upper_Bound
      df1[i,"draw_value"] = rtruncnorm(1,a=-Inf,b=df1[i,"Upper_bound"],mean= df1[i,"mean"],sd=1)
    } else if (df1[i,"Obs_Type"] ==2){
      #If Type == 2; then a=-10, b = Upper_Bound
      df1[i,"draw_value"] = rtruncnorm(1,a=-10,b=df1[i,"Upper_bound"],mean= df1[i,"mean"],sd=1)
    } else if(df1[i,"Obs_Type"] ==3){
      #If Type == 3; then a=Lower_bound, b = Inf
      df1[i,"draw_value"] = rtruncnorm(1,a=df1[i,"Lower_bound"],b=Inf,mean= df1[i,"mean"],sd=1)
    } else {
      #If Type == 4; then a=Lower_bound, b = 10
      df1[i,"draw_value"] = rtruncnorm(1,a=df1[i,"Lower_bound"],b=10,mean= df1[i,"mean"],sd=1)
    }
  }
  return(df1)
}

#Benchmarking
mbm=microbenchmark(Trial_func(df1=df1),times = 10)
summary(mbm)
#For obtaining the new data
New_data=Trial_func(df1=df1)

In the code above, I first create a data frame called df1. I then define a function that takes this data frame. Each observation in df1 is one of four types, given by df1$Obs_Type. Based on Obs_Type, I want to draw a value from a truncated normal distribution with the given lower and upper bounds.

The rules are:

a) When Obs_Type =1; a=-Inf, b = Upper_bound value of observation i.

b) When Obs_Type =2; a=-10, b = Upper_bound value of observation i.

c) When Obs_Type =3; a=Lower_bound value of observation i, b = Inf.

d) When Obs_Type =4; a=Lower_bound value of observation i, b = 10.

Here a is the lower bound and b is the upper bound. Additionally, the mean for observation i is given by df1$mean, and sd = 1.

I am not familiar with vectorizing and was wondering if someone could help me with this. I tried looking at other examples on SO (e.g., this) but could not figure out what to do when I have multiple conditions.

My original dataset has about 10 million observations and additional conditions (e.g., instead of 4 types, my data has 16 types, and the mean changes with each type), but I used a simpler example here.

Please let me know if any part of the question requires any additional clarification.

Recommended answer

Here is a vectorized way. It creates logical vectors i1, i2, i3 and i4 corresponding to the 4 conditions. Then it assigns the new values to the positions indexed by them.

Trial_func2 <- function(df1){
  i1 <- df1[["Obs_Type"]] == 1
  i2 <- df1[["Obs_Type"]] == 2
  i3 <- df1[["Obs_Type"]] == 3
  i4 <- df1[["Obs_Type"]] == 4

  #If Type == 1; then a=-Inf, b = Upper_Bound
  df1[i1, "draw_value"] <- rtruncnorm(sum(i1), a =-Inf, 
                                      b = df1[i1, "Upper_bound"], 
                                      mean = df1[i1, "mean"], sd = 1)
  #If Type == 2; then a=-10, b = Upper_Bound
  df1[i2, "draw_value"] <- rtruncnorm(sum(i2), a = -10,
                                      b = df1[i2 , "Upper_bound"],
                                      mean = df1[i2, "mean"], sd = 1)
  #If Type == 3; then a=Lower_bound, b = Inf
  df1[i3,"draw_value"] <- rtruncnorm(sum(i3), 
                                     a = df1[i3, "Lower_bound"],
                                     b = Inf, mean = df1[i3, "mean"], 
                                     sd = 1)
  #If Type == 4; then a=Lower_bound, b = 10
  df1[i4, "draw_value"] <- rtruncnorm(sum(i4), 
                                      a = df1[i4, "Lower_bound"],
                                      b = 10,
                                      mean = df1[i4,"mean"],
                                      sd = 1)
  df1
}
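
With 16 types, as the question mentions for the real data, one index vector and one call per type becomes repetitive. A possible variation, sketched here under assumptions, is to compute the per-row bounds first and then draw everything in a single call, relying on rtruncnorm accepting vectors for a, b and mean (the answer above already passes vectors for b and mean). The names make_bounds and Trial_func_onecall are hypothetical and not part of the original answer. Because the random numbers are consumed in a different order, the individual draws will not match Trial_func2 even with the same seed; only the distribution should.

```r
# Hypothetical helper: per-row truncation bounds for the four rules in the question.
make_bounds <- function(df1) {
  type <- df1[["Obs_Type"]]
  # Start from "no truncation" and overwrite only what each type constrains.
  a <- rep(-Inf, nrow(df1))
  b <- rep(Inf, nrow(df1))
  upper_rows <- type %in% c(1, 2)   # types 1 and 2 use Upper_bound as b
  lower_rows <- type %in% c(3, 4)   # types 3 and 4 use Lower_bound as a
  a[type == 2] <- -10
  a[lower_rows] <- df1[["Lower_bound"]][lower_rows]
  b[upper_rows] <- df1[["Upper_bound"]][upper_rows]
  b[type == 4] <- 10
  list(a = a, b = b)
}

# Hypothetical variant of Trial_func2: one rtruncnorm call for all rows.
Trial_func_onecall <- function(df1) {
  bounds <- make_bounds(df1)
  df1[["draw_value"]] <- truncnorm::rtruncnorm(
    nrow(df1), a = bounds$a, b = bounds$b, mean = df1[["mean"]], sd = 1
  )
  df1
}

# Small hand-made data frame, one row per type, to inspect the bounds.
small <- data.frame(
  Obs_Type    = 1:4,
  Upper_bound = c(0.5, 0.6, 0.7, 0.8),
  Lower_bound = c(2, 3, 2.5, 3.5),
  mean        = c(10, 11, 12, 13)
)
bounds <- make_bounds(small)
print(bounds)

# Only draw if the truncnorm package is installed.
if (requireNamespace("truncnorm", quietly = TRUE)) {
  print(Trial_func_onecall(small))
}
```

Adding a type then means adding one or two assignment lines in make_bounds rather than another block of indexing and sampling. Whether this is faster than Trial_func2 on the real data would need to be benchmarked.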

In the speed test below, Trial_func3 refers to @Dave2e's answer, which is not reproduced here.

mbm <- microbenchmark(
  loop = Trial_func(df1 = df1),
  vect = Trial_func2(df1 = df1),
  cwhen = Trial_func3(df1 = df1),
  times = 10)

print(mbm, order = "median")
#Unit: milliseconds
#  expr         min          lq       mean      median          uq         max neval cld
#  vect    4.349444    4.371169    4.40920    4.401384    4.450024    4.487453    10  a 
# cwhen   13.458946   13.484247   14.16045   13.528792   13.787951   19.363104    10  a 
#  loop 2125.665690 2138.792497 2211.20887 2157.185408 2201.391083 2453.658767    10   b
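
The timings show which version is faster, not whether a rewritten version still follows the rules from the question. A minimal check, assuming the column names used above, could confirm that every draw lies inside the bounds for its type. The helper name draws_within_bounds and the two small data frames are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: TRUE when every draw lies inside the bounds for its type.
draws_within_bounds <- function(df) {
  type <- df[["Obs_Type"]]
  # Lower bound: -Inf for type 1, -10 for type 2, Lower_bound for types 3 and 4.
  lower <- ifelse(type == 1, -Inf, ifelse(type == 2, -10, df[["Lower_bound"]]))
  # Upper bound: Inf for type 3, 10 for type 4, Upper_bound for types 1 and 2.
  upper <- ifelse(type == 3, Inf, ifelse(type == 4, 10, df[["Upper_bound"]]))
  all(df[["draw_value"]] >= lower & df[["draw_value"]] <= upper)
}

# Hand-made rows, one per type, with draws that respect the rules.
ok <- data.frame(
  Obs_Type    = 1:4,
  Upper_bound = c(0.5, 0.6, 0.7, 0.8),
  Lower_bound = c(2, 3, 2.5, 3.5),
  draw_value  = c(-1.2, 0.1, 11.4, 9.9)
)

# Same rows, but the type-1 draw is above its Upper_bound.
bad <- ok
bad$draw_value[1] <- 0.9

res_ok  <- draws_within_bounds(ok)
res_bad <- draws_within_bounds(bad)
print(c(ok = res_ok, bad = res_bad))
```

The same helper could be applied to the output of Trial_func, Trial_func2 or any other variant, for example draws_within_bounds(Trial_func2(df1)).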
