Indexing in nested loops

Problem description

I am new to R and this site. My aim with the following, assuredly unnecessarily arcane code is to create an R function that produces a special type of box plot in ggplot2. I first need to process potential input thereinto by calculating the variables that I shall later wish to have plotted.

I start by generating some random data, called datos:

c1=rnorm(98,47,23)
c2=rnorm(98,56,13)
c3=rnorm(98,52,7)
fila1=as.matrix(t(c(-2,15,30)))
colnames(fila1)=c("c1","c2","c3")
fila2=as.matrix(t(c(-20,5,20)))
colnames(fila2)=c("c1","c2","c3")
datos=rbind(data.frame(c1,c2,c3),fila1,fila2)
rm(c1,c2,c3,fila1,fila2)

Then, I calculate the variables to later be plotted, which include for each of the present columns in datos the mean (puntoMedio), the first and third quartiles (cuar1, cuar3), the interquartile range (iqr), the lower bound of potential submean whiskers (limInf), the upper bound of potential supermean whiskers (limSup) and outliers (submean outliers vAtInf and supermean outliers vAtSup, to be combined in vAt):

puntoMedio=apply(datos,MARGIN=2,FUN=mean)
cuar1=apply(datos,MARGIN=2,FUN=quantile,probs=.25)
cuar3=apply(datos,MARGIN=2,FUN=quantile,probs=.75)
cuar=rbind(cuar1,cuar3)
iqr=apply(cuar,MARGIN=2,FUN=diff)
cuar=rbind(cuar,iqr,puntoMedio)
limInf=array(dim=ncol(datos))
  for(i in 1:ncol(datos)){
    limInf0=as.matrix(t(cuar[1,]-1.5*cuar[3,]))
    if(length(datos[datos[,i]<limInf0[,i],i])>0){
      limInf[i]=limInf0[,i]
    }else{limInf[i]=min(datos[,i])}
  }
limSup=array(dim=ncol(datos))
  for(i in 1:ncol(datos)){
    limSup0=as.matrix(t(cuar[2,]+1.5*cuar[3,]))
    if(length(datos[datos[,i]>limSup0[,i],i])>0){
      limSup[i]=limSup0[,i]
    }else{limSup[i]=max(datos[,i])}
  }
d=data.frame(t(rbind(cuar,limInf,limSup)))
rm(cuar)
vAtInf=datos
  for(i in 1:ncol(vAtInf)){
    vAtInf[vAtInf[,i]>limInf0[,i],i]=NA
  }
  colnames(vAtInf)=c("vAtInfc1","vAtInfc2","vAtInfc3")
vAtSup=datos
  for(i in 1:ncol(vAtSup)){
    vAtSup[vAtSup[,i]<limSup0[,i],i]=NA
  }
  colnames(vAtSup)=c("vAtSupc1","vAtSupc2","vAtSupc3")
datos=cbind(datos,vAtInf,vAtSup)
rm(limInf0,limSup0,cuar1,cuar3,i,iqr,limInf,limSup,puntoMedio)
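As an aside, the quartile and whisker calculations above can also be written without explicit for loops. The sketch below assumes the same rule as the two loops (use the 1.5 * IQR fence when at least one value lies beyond it, otherwise the column minimum or maximum). The seed and the helper names lowerFence, upperFence, hasLow and hasHigh are illustrative additions, not part of the question.

```r
# Self-contained stand-in for the question's datos (seed added only so the
# example is reproducible; the question's data are unseeded)
set.seed(1)
datos <- data.frame(c1 = c(rnorm(98, 47, 23), -2, -20),
                    c2 = c(rnorm(98, 56, 13), 15, 5),
                    c3 = c(rnorm(98, 52, 7), 30, 20))

# Quartiles, IQR and mean for every column at once; names = FALSE keeps the
# results named c1, c2, c3 instead of "c1.25%", ...
cuar1 <- sapply(datos, quantile, probs = 0.25, names = FALSE)
cuar3 <- sapply(datos, quantile, probs = 0.75, names = FALSE)
iqr <- cuar3 - cuar1
puntoMedio <- colMeans(datos)

# 1.5 * IQR fences (what limInf0 / limSup0 hold in the question)
lowerFence <- cuar1 - 1.5 * iqr
upperFence <- cuar3 + 1.5 * iqr

# Same rule as the loops: keep the fence if at least one value lies beyond
# it, otherwise fall back to the column minimum / maximum
hasLow <- mapply(function(x, f) any(x < f), datos, lowerFence)
hasHigh <- mapply(function(x, f) any(x > f), datos, upperFence)
limInf <- ifelse(hasLow, lowerFence, sapply(datos, min))
limSup <- ifelse(hasHigh, upperFence, sapply(datos, max))

# Same columns as d in the question, one row per variable
d <- data.frame(cuar1, cuar3, iqr, puntoMedio, limInf, limSup)
print(d)
```

mapply() walks over the columns of datos and the matching fence in parallel, which is what the index i does in the original loops.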

Everything works as desired up until here. I have two data frames, d and datos; the former is of no interest here, and the latter, in this specific case, comprises nine columns: three of all values, three of the corresponding submean outliers and three of the corresponding supermean outliers (these latter six padded with NA). I now wish to extract all outliers by column, wherefore I have tried formulating the following loop. While it runs without giving either an error or a warning, it also does not give the desired output in vAt (again, the by-column [columns 4:9] outliers from datos). The problem, as far as I have been able to discern, occurs in the nested for-loop, upon attempting to input i into vAt: each iteration of the loop erases the last, such that upon completion of the entire loop, vAt only contains NA and the outliers from the last column/of the last iteration.

for(i in ((ncol(datos)/3)+1):ncol(datos)){
    vAt=matrix(nrow=.25*nrow(datos),ncol=ncol(datos)-(ncol(datos)/3))
    colnames(vAt)=c(((ncol(datos)/3)+1):ncol(datos))
    if(length(datos[,i][is.na(datos[,i])==F])>0){
        for(j in 1:(length(datos[,i][is.na(datos[,i])==F]))){
            nom=as.character(i)
            vAt[j,nom]=datos[,i][is.na(datos[,i])==F][j]
        }
    }else{next}
}

I have not been able to find any existent thread that answers my question. Thanks for any help.

Solution

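The problem is that you are initialising vAt inside the outer for loop, so every iteration replaces it with a fresh all-NA matrix and discards the outliers written by the previous iterations. Moving the two initialisation statements (the matrix() call and the colnames() assignment) outside the for loop will fix the problem that you are facing: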

vAt=matrix(nrow=.25*nrow(datos),ncol=ncol(datos)-(ncol(datos)/3))
colnames(vAt)=c(((ncol(datos)/3)+1):ncol(datos))
for(i in ((ncol(datos)/3)+1):ncol(datos)){
    if(length(datos[,i][is.na(datos[,i])==F])>0){
        for(j in 1:(length(datos[,i][is.na(datos[,i])==F]))){
            nom=as.character(i)
            vAt[j,nom]=datos[,i][is.na(datos[,i])==F][j]
        }
    }else{next}
}
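The effect of where the result matrix is created can be reproduced on a tiny example. The list cols and the matrices inside and outside are hypothetical, made up only for illustration: creating the matrix inside the loop keeps only what the last iteration wrote, while creating it once beforehand lets the results accumulate.

```r
# Toy data (hypothetical): two columns with some NA entries
cols <- list(a = c(1, NA, 3), b = c(NA, 5, NA))

# Result created INSIDE the loop: every iteration starts again from an
# all-NA matrix, so only the values from the last column survive
for (nm in names(cols)) {
  inside <- matrix(NA_real_, nrow = 3, ncol = 2,
                   dimnames = list(NULL, names(cols)))
  vals <- cols[[nm]][!is.na(cols[[nm]])]
  inside[seq_along(vals), nm] <- vals
}

# Result created ONCE, before the loop: values from every iteration accumulate
outside <- matrix(NA_real_, nrow = 3, ncol = 2,
                  dimnames = list(NULL, names(cols)))
for (nm in names(cols)) {
  vals <- cols[[nm]][!is.na(cols[[nm]])]
  outside[seq_along(vals), nm] <- vals
}

print(inside)   # column a is all NA
print(outside)  # column a holds 1 and 3
```

seq_along(vals) is used here rather than 1:length(vals) because the latter evaluates to c(1, 0) when vals is empty.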

However, there are various improvements which you can make to the code as it stands:

  1. Using vectorisation and *ply functions instead of for loops.
  2. Not comparing logical vectors with == F, but simply using !is.na(...).
  3. Using sum(!is.na(...)) instead of length(datos[,i][!is.na(...)]) to count the non-NA values.

There are others besides. None of these changes the correctness of the code, but they will make it more efficient and more idiomatic.
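These suggestions can be sketched on a small hypothetical stand-in for columns 4:9 of datos (the data frame outl and the names vAtList, nOut and padded are illustrative, not from the question). lapply() with !is.na() replaces the nested loop, sum(!is.na(x)) counts the non-NA entries of each column, and `length<-` pads each column with NA when a rectangular matrix like vAt is still wanted.

```r
# Hypothetical stand-in for datos[, 4:9]: outliers kept, NA elsewhere
outl <- data.frame(vAtInfc1 = c(NA, -20, NA, -35),
                   vAtSupc1 = c(120, NA, NA, NA),
                   vAtInfc2 = rep(NA_real_, 4))

# Non-NA values of every column, no explicit loops; a list allows the
# columns to have different numbers of outliers
vAtList <- lapply(outl, function(x) x[!is.na(x)])

# Number of outliers per column (colSums(!is.na(outl)) gives the same counts)
nOut <- sapply(outl, function(x) sum(!is.na(x)))

# Optional: an NA-padded matrix like vAt, sized to the longest column
padded <- lapply(vAtList, `length<-`, max(nOut))
vAt <- do.call(cbind, padded)

print(vAtList)
print(vAt)
```

Applied to the question's data, outl would simply be datos[, 4:9]. Keeping the result as a list, or sizing the matrix from max(nOut), avoids having to guess a row count such as .25*nrow(datos) in advance.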
