Shorter method to replace entries in R


Question

I started learning R recently. Here's the source file I am working with (https://github.com/cosname/art-r-translation/blob/master/data/Grades.txt). Is there any way to change the letter grades from, say, A to 4.0, A- to 3.7, etc. without using a loop?

I am asking because if there were 1M entries, a for loop might not be the most efficient way to modify the data. I would appreciate any help.

Since one of the posters told me to post my code, I tried writing a for loop to see whether I could do it. Here's my code:

mygrades<-read.table("grades.txt",header = TRUE)

i <- for (i in 1:nrow(mygrades))
{
  #print(i)  
  #for now, see whether As get replaced with 4.0.
  if(mygrades[i,1]=="A")
  {
    mygrades[i,1]=4.0
  }
  else if (mygrades[i,2]=="A")
  {
    mygrades[i,2]=4.0
  }
  else if (mygrades[i,3]=="A")
  {
    mygrades[i,3]=4.0
  }
  else
  {
    #do nothing...continues
  }

}

write.table(mygrades,"newgrades.txt")

However, the output is a little weird. For some "A"s I get NA, and others are left as they are. Can someone please help me with this code?

@alistaire, I did try Hadley's lookup table, and it works. I also looked at the dplyr code, and it works well. However, for the sake of my own understanding, I'm still trying to use for loops. Please note that it has been about two days since I first opened an R book. Here's the modified code.

# There was one mistake in my code: I didn't use stringsAsFactors=FALSE.
# Now this code doesn't work for all "A"s. It spits out 4.0 for some As and
# doesn't do so for others. Why would that be?

mygrades<-read.table("grades.txt",header = TRUE,stringsAsFactors=FALSE)

i <- for (i in 1:nrow(mygrades))
{
  #print(i)  
  if(mygrades[i,1]=="A")
  {
    mygrades[i,1]=4.0
  }
  else if (mygrades[i,2]=="A")
  {
    mygrades[i,2]=4.0
  }
  else if (mygrades[i,3]=="A")
  {
    mygrades[i,3]=4.0
  }
  else
  {
    #do nothing...continues
  }

}

write.table(mygrades,"newgrades.txt")

The output is:

"final_exam" "quiz_avg" "homework_avg"
"1" "C" "4" "A"
"2" "C-" "B-" "4"
"3" "D+" "B+" "4"
"4" "B+" "B+" "4"
"5" "F" "B+" "4"
"6" "B" "A-" "4"
"7" "D+" "B+" "A-"
"8" "D" "A-" "4"
"9" "F" "B+" "4"
"10" "4" "C-" "B+"
"11" "A+" "4" "A"
"12" "A-" "4" "A"
"13" "B" "4" "A"
"14" "D-" "A-" "4"
"15" "A+" "4" "A"
"16" "B" "A-" "4"
"17" "F" "D" "A-"
"18" "B" "4" "A"
"19" "B" "B+" "4"
"20" "A+" "A-" "4"
"21" "4" "A" "A"
"22" "B" "B+" "4"
"23" "D" "B+" "4"
"24" "A-" "A-" "4"
"25" "F" "4" "A"
"26" "B+" "B+" "4"
"27" "A-" "B+" "4"
"28" "A+" "4" "A"
"29" "4" "A-" "A"
"30" "A+" "A-" "4"
"31" "4" "B+" "A-"
"32" "B+" "B+" "4"
"33" "C" "4" "A"

As you can see in the first row, the first A got recoded as 4, but the second A didn't. Any idea why this is happening?

Thanks in advance.

Recommended answer

A typical way in base R would be to make a named vector as a lookup table, e.g.

# data with fewer levels for simplicity
df <- data.frame(x = rep(1:3, 2), y = rep(1:2, 3))

lookup <- c(`1` = "A", `2` = "B", `3` = "C")

and subset it with each column:

data.frame(lapply(df, function(x){lookup[x]}))
##   x y
## 1 A A
## 2 B B
## 3 C A
## 4 A B
## 5 B A
## 6 C B
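
Applied to letter grades, the same idea might look like the sketch below. The sample data frame is a hypothetical stand-in for Grades.txt, and only the A = 4.0 and A- = 3.7 values come from the question; the other grade points are assumed from the usual 4.0 scale. Indexing by `as.character(col)` matters: if the columns were read in as factors, indexing with the factor itself would use its integer codes rather than the grade labels.

```r
# Hypothetical three-row sample using the column names from Grades.txt
grades <- data.frame(
  final_exam   = c("C", "A", "A-"),
  quiz_avg     = c("A", "B+", "A-"),
  homework_avg = c("A", "A", "B+"),
  stringsAsFactors = FALSE
)

# Named numeric vector used as the lookup table.
# A and A- are from the question; B+ and C are assumed values.
points <- c("A" = 4.0, "A-" = 3.7, "B+" = 3.3, "C" = 2.0)

# Look up every column by grade label; no explicit loop over rows is needed
recoded <- data.frame(
  lapply(grades, function(col) unname(points[as.character(col)]))
)

print(recoded)
```

Any grade that is missing from the lookup table comes back as NA, so the table should cover every letter grade that appears in the data.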






Alternately, dplyr recently added a recode function that's useful for such a job:

library(dplyr)

df <- read.table('https://raw.githubusercontent.com/cosname/art-r-translation/master/data/Grades.txt', header = TRUE)

df %>% mutate_all(funs(recode(., A = '4.0', 
                              `A-` = '3.7'))) %>%    # etc.
    as_data_frame()    # for prettier printing

## # A tibble: 33 x 3
##    final_exam quiz_avg homework_avg
##        <fctr>   <fctr>       <fctr>
## 1           C      4.0          4.0
## 2          C-       B-          4.0
## 3          D+       B+          4.0
## 4          B+       B+          4.0
## 5           F       B+          4.0
## 6           B      3.7          4.0
## 7          D+       B+          3.7
## 8           D      3.7          4.0
## 9           F       B+          4.0
## 10        4.0       C-           B+
## # ... with 23 more rows
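
As for the loop in the question: the `if` / `else if` chain stops at the first column that matches, so at most one A per row gets replaced, which is consistent with the output shown there. The NAs in the first attempt most likely come from assigning 4.0 into factor columns, where 4 is not an existing level. A minimal base R sketch that avoids both problems, using a hypothetical sample in place of Grades.txt:

```r
# Hypothetical three-row sample using the column names from Grades.txt
mygrades <- data.frame(
  final_exam   = c("C", "A", "B"),
  quiz_avg     = c("A", "A", "A-"),
  homework_avg = c("A", "B+", "A"),
  stringsAsFactors = FALSE
)

# mygrades == "A" is a logical matrix with one entry per cell, so a single
# assignment replaces every match regardless of row or column
mygrades[mygrades == "A"] <- "4.0"

print(mygrades)
```

This handles one grade at a time, so the lookup table is still the shorter route when every letter grade needs a numeric value.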
