Using if else on a dataframe across multiple columns

Question

I have a large dataset of samples with descriptors of whether the sample is viable - it looks (kind of) like this, where 'desc' is the description column and 'blank' indicates the sample is not viable:

     desc        x        y        z
1   blank 4.529976 5.297952 5.581013
2   blank 5.906855 4.557389 4.901660
3  sample 4.322014 4.798248 4.995959
4  sample 3.997565 5.975604 7.160871
5   blank 4.898922 7.666193 5.551385
6   blank 5.667884 5.195825 5.232072
7   blank 5.524773 6.726074 4.767475
8  sample 4.382937 5.926217 5.203737
9  sample 4.976908 3.079191 4.614121
10  blank 4.572954 4.772373 6.077195

I want to use an if else statement to set the rows with unusable data to NA. The final data set should look like this:

     desc        x        y        z
1   blank       NA       NA       NA
2   blank       NA       NA       NA
3  sample 4.322014 4.798248 4.995959
4  sample 3.997565 5.975604 7.160871
5   blank       NA       NA       NA
6   blank       NA       NA       NA
7   blank       NA       NA       NA
8  sample 4.382937 5.926217 5.203737
9  sample 4.976908 3.079191 4.614121
10  blank       NA       NA       NA 

I have tried a for loop, but I'm having trouble getting the for-loop to change all the columns in one loop. My real dataset has 40 columns, so I'd rather not have to process it in separate loops! Here is the code to change one column at a time:

for (i in 1:length(desc)) {
    if (dat$desc[i] == "blank") {
        dat$x[i] <- NA
    } else {
        dat$x[i] <- dat$x[i]
    }
}

I made the sample data with this script:

desc <- c("blank", "blank", "sample", "sample", "blank", "blank", "blank", "sample", "sample", "blank")
x <-  rnorm(10, mean=5, sd=1)
y <-  rnorm(10, mean=5, sd=1)
z <-  rnorm(10, mean=5, sd=1)

dat <- data.frame(desc,x,y,z)

Sorry if this is a basic question, I've spent all morning looking at forums and haven't been able to find a solution.

Any help is much appreciated!

Solution

For your example dataset, any of the following options will work.

Option 1, name the columns to change:

dat[which(dat$desc == "blank"), c("x", "y", "z")] <- NA
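
As a minimal sketch of Option 1 on a small hand-made data frame: the values are the first four rows of the question's example, and the names `small` and `blank_rows` are only for illustration.

```r
# Small illustrative data frame (first four rows of the question's example)
small <- data.frame(
  desc = c("blank", "blank", "sample", "sample"),
  x = c(4.529976, 5.906855, 4.322014, 3.997565),
  y = c(5.297952, 4.557389, 4.798248, 5.975604),
  z = c(5.581013, 4.901660, 4.995959, 7.160871)
)

# which() converts the logical test into row numbers and drops NA results
blank_rows <- which(small$desc == "blank")

# Assign NA to the named columns in those rows only
small[blank_rows, c("x", "y", "z")] <- NA

print(small)
```

Wrapping the comparison in `which()` means that rows where `desc` is itself missing are simply skipped, rather than passing an `NA` into the row index of the assignment.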

In your actual data with 40 columns, if you just want to set the last 39 columns to NA, then the following options may be simpler than naming each of the columns to change.

Option 2, select columns using a range:

dat[which(dat$desc == "blank"), 2:40] <- NA
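
If the number of columns might change, the upper end of the range can be computed rather than hard-coded. This is a sketch that assumes the description column comes first and every remaining column should be blanked; the data frame `wide` and its column names are made up for illustration.

```r
# Hypothetical data frame with the description in column 1
wide <- data.frame(
  desc = c("blank", "sample", "blank"),
  v1 = c(1.5, 2.5, 3.5),
  v2 = c(10, 20, 30),
  v3 = c(0.1, 0.2, 0.3)
)

# Columns 2 through the last one, however wide the data frame is
value_cols <- 2:ncol(wide)

wide[which(wide$desc == "blank"), value_cols] <- NA

print(wide)
```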

Option 3, exclude the 1st column:

dat[which(dat$desc == "blank"), -1] <- NA

Option 4, exclude a named column:

dat[which(dat$desc == "blank"), !names(dat) %in% "desc"] <- NA
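
The same pattern extends to several columns that should be left alone. The sketch below assumes a hypothetical extra identifier column `id` that is not in the question's data; `samples`, `keep`, and `target` are illustrative names.

```r
# Hypothetical data with two columns that must not be overwritten
samples <- data.frame(
  id = c("s1", "s2", "s3"),
  desc = c("sample", "blank", "sample"),
  x = c(4.3, 5.9, 4.0),
  y = c(4.8, 4.6, 6.0)
)

keep <- c("id", "desc")

# %in% binds more tightly than !, so this is !(names(samples) %in% keep)
target <- !names(samples) %in% keep

samples[which(samples$desc == "blank"), target] <- NA

print(samples)
```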

As you can see, there are many ways to do this kind of operation (this is far from a complete list), and understanding how each of these options works will give you a better grasp of the language.
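
Since the question asked specifically about if else, here is one more possible approach, not part of the options above: the vectorised `ifelse()` applied to each value column with `lapply()`. This is only a sketch on made-up data; `dat2`, `value_cols`, and `is_blank` are illustrative names.

```r
# Made-up data in the same shape as the question's example
dat2 <- data.frame(
  desc = c("blank", "sample", "sample", "blank"),
  x = c(4.5, 4.3, 4.0, 4.6),
  y = c(5.3, 4.8, 6.0, 4.8)
)

# Every column except the description column
value_cols <- setdiff(names(dat2), "desc")
is_blank <- dat2$desc == "blank"

# ifelse() works on whole vectors, so each column needs only one call;
# NA_real_ keeps the result numeric even if every row is blank
dat2[value_cols] <- lapply(
  dat2[value_cols],
  function(col) ifelse(is_blank, NA_real_, col)
)

print(dat2)
```

Compared with the indexing options, this version builds a new vector for every column, so the direct assignment is usually the simpler choice; the `ifelse()` form is mainly useful when the replacement value depends on the data.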
