Long if else loop and recoding in R

Question

I know my problem is simple, but not for me. Here is a small dataset.

mark1 <- c("AB", "BB", "AB", "BB", "BB", "AB", "--", "BB")
mark2 <- c("AB", "AB", "AA", "BB", "BB", "AA", "--", "BB")
mark3 <- c("BB", "AB", "AA", "BB", "BB", "AA", "--", "BB")
mark4 <- c("AA", "AB", "AA", "BB", "BB", "AA", "--", "BB")
mark5 <- c("AB", "AB", "AA", "BB", "BB", "AA", "--", "BB")
mark6 <- c("--", "BB", "AA", "BB", "BB", "AA", "--", "BB")
mark7 <- c("AB", "--", "AA", "BB", "BB", "AA", "--", "BB")
mark8 <- c("BB", "AA", "AA", "BB", "BB", "AA", "--", "BB")
mymark <- data.frame (mark1, mark2, mark3, mark4, mark5, mark6, mark7, mark8)
tmymark <- data.frame (t(mymark))
names (tmymark) <- c("P1", "P2","I1", "I2", "I3", "I4", "KL", "MN")

So the dataset becomes:

      P1 P2 I1 I2 I3 I4 KL MN
mark1 AB BB AB BB BB AB -- BB
mark2 AB AB AA BB BB AA -- BB
mark3 BB AB AA BB BB AA -- BB
mark4 AA AB AA BB BB AA -- BB
mark5 AB AB AA BB BB AA -- BB
mark6 -- BB AA BB BB AA -- BB
mark7 AB -- AA BB BB AA -- BB
mark8 BB AA AA BB BB AA -- BB

I want to classify mark1 to mark8 based on a comparison of P1 and P2, with code that creates a new variable:

loctype <- NULL

if (tmymark$P1 == "AB" & tmymark$P2 == "AB") {
  loctype = "<hkxhk>"
} else if (tmymark$P1 == "AB" & tmymark$P2 == "BB") {
  loctype = "<lmxll>"
} else if (tmymark$P1 == "AA" & tmymark$P2 == "AB") {
  loctype = "<nnxnp>"
} else if (tmymark$P1 == "AA" & tmymark$P2 == "BB") {
  loctype = "MN"
} else if (tmymark$P1 == "BB" & tmymark$P2 == "AA") {
  loctype = "MN"
} else if (tmymark$P1 == "--" & tmymark$P2 == "AA") {
  loctype = "NR"
} else if (tmymark$P1 == "AA" & tmymark$P2 == "--") {
  loctype = "NR"
} else {
  cat("error wrong input in P1 or P2")
}

What I am trying to do is compare the P1 and P2 values and generate a new variable. For example, if tmymark$P1 == "AB" & tmymark$P2 == "AB", then loctype should be "<hkxhk>". If not, the second condition applies, and so on.

Here is the message I get:

Warning messages:
1: In if (tmymark$P1 == "AB" & tmymark$P2 == "AB") { :
  the condition has length > 1 and only the first element will be used
2: In if (tmymark$P1 == "AB" & tmymark$P2 == "BB") { :
  the condition has length > 1 and only the first element will be used
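
A note on the warning: `if` expects a single TRUE or FALSE, whereas `tmymark$P1 == "AB" & tmymark$P2 == "AB"` produces one value per row, so only the first row is tested (in R 4.2.0 and later this is an error rather than a warning). One possible vectorized alternative is `ifelse()`. The sketch below uses made-up vectors `p1` and `p2` standing in for the two parent columns and covers only three of the combinations, so treat it as an illustration rather than a full solution.

```r
# Hypothetical stand-ins for tmymark$P1 and tmymark$P2
p1 <- c("AB", "AB", "AA", "BB")
p2 <- c("AB", "BB", "AB", "AB")

# The comparison is element-wise: one logical value per row,
# which is why a plain if() cannot use it directly
cond <- p1 == "AB" & p2 == "AB"
length(cond)

# ifelse() applies the test to every element; unmatched rows become NA
loctype <- ifelse(p1 == "AB" & p2 == "AB", "<hkxhk>",
           ifelse(p1 == "AB" & p2 == "BB", "<lmxll>",
           ifelse(p1 == "AA" & p2 == "AB", "<nnxnp>", NA_character_)))
loctype
```

Nesting `ifelse()` calls gets hard to read as the number of combinations grows, which is one reason a lookup table can be preferable.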

Once the loctype vector is generated, I want to recode tmymark using the information in this variable:

tmymark1 <- data.frame(loctype, tmymark)
require(car)
for (i in 2:length(tmymark)) {
  if (loctype == "<hkxhk>") {
    tmymark[[i]] <- recode(x, "AB" = "hk", "BA" = "hk", "AA" = "hh", "BB" = "kk")
  } else if (loctype == "<lmxll>") {
    tmymark[[i]] <- recode(x, "AB" = "lm", "BA" = "lm", "AA" = "--", "BB" = "kk")
  } else if (loctype == "<nnxnp>") {
    tmymark[[i]] <- recode(x, "AB" = "np", "BA" = "np", "AA" = "nn", "BB" = "--")
  } else if (loctype == "MN") {
    tmymark[[i]] <- "--"
  } else if (loctype == "NR") {
    tmymark[[i]] <- "NA"
  } else {
    cat("error wrong input code")
  }
}

Am I on the right track?

Expected output:

      loctype P1 P2 I1 I2 I3 I4 KL MN
mark1 <lmxmm> lm mm lm mm mm lm -- mm
mark2 <hkxhk> hk hk hh kk kk hh -- kk
mark3 <nnxnp> nn np nn -- -- nn -- --
and so on

Recommended answer

match is definitely the way to go. I'd make two lookup keys (a data frame and a matrix), like this:

key <- data.frame(
             P1=c("AB", "AB", "AA", "AA", "BB", "--", "AA"),
             P2=c("AB", "BB", "AB", "BB", "AA", "AA", "--"),
             loctype=c("<hkxhk>", "<lmxll>", "<nnxnp>", "MN", "MN", "NR", "NR"))

key2 <- cbind(
  `<hkxhk>` = c("hk","hk","hh","kk"),
  `<lmxll>` = c("lm", "lm", "--", "kk"),
  `<nnxnp>` = c("np", "np", "nn", "--"),
  MN = rep("--", 4),
  NR = rep("NA", 4) )
rownames(key2) = c("AB","BA", "AA", "BB")

and then use match on key to get the loctype (as Justin also recommends), and also on both the row names and column names of key2 to get the desired substitution, using matrix indexing to pull the desired value from the key.

loctype <- key$loctype[match(with(tmymark, paste(P1, P2, sep="\b")), 
                             with(key, paste(P1, P2, sep="\b")))]
ii <- match(as.vector(as.matrix(tmymark)), rownames(key2))
# as.matrix() is flattened column by column, so repeat once per column
jj <- rep(match(loctype, colnames(key2)), ncol(tmymark))
out <- as.data.frame(matrix(key2[cbind(ii,jj)], nrow=nrow(tmymark)))
colnames(out) <- colnames(tmymark)
rownames(out) <- rownames(tmymark)
out$loctype <- loctype
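
The matrix-indexing step may be easier to follow on a toy example. The sketch below is only an illustration: the matrix `lookup` and the vectors `geno` and `ltype` are made-up names and values, not part of the question. It shows that indexing a matrix with a two-column matrix of (row, column) positions returns one value per pair.

```r
# Hypothetical 2 x 2 lookup matrix: rows are genotypes, columns are locus types
lookup <- cbind(typeA = c("hk", "kk"),
                typeB = c("lm", "ll"))
rownames(lookup) <- c("AB", "BB")

geno  <- c("AB", "BB", "BB", "AA")              # values to recode ("AA" has no row)
ltype <- c("typeA", "typeA", "typeB", "typeB")  # locus type of each value

ii <- match(geno, rownames(lookup))   # row position of each genotype (NA if absent)
jj <- match(ltype, colnames(lookup))  # column position of each locus type

# A two-column index matrix picks lookup[ii[k], jj[k]] for each k
recoded <- lookup[cbind(ii, jj)]
recoded
```

The last element comes back as NA because "AA" has no row in `lookup`, which is the same mechanism that produces missing values when a genotype or locus type is absent from the real keys.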

The result then looks like this. The missing values appear because my keys have no entries for those combinations.

> print(out, na="")
      P1 P2 I1 I2 I3 I4 KL MN loctype
mark1 lm kk lm kk kk lm    kk <lmxll>
mark2 hk hk hh kk kk hh    kk <hkxhk>
mark3                                
mark4 nn np nn -- -- nn    -- <nnxnp>
mark5 hk hk hh kk kk hh    kk <hkxhk>
mark6                                
mark7                                
mark8 -- -- -- -- -- --    --      MN
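
To find out which P1/P2 pairs fall outside the key, one option is to look for NA in the result of match(). This is a minimal sketch, assuming the parent genotypes from the question's table are copied into two plain vectors; `pos` and `unmatched` are names introduced here for illustration.

```r
# Parent genotypes for mark1..mark8, copied from the question's table
P1 <- c("AB", "AB", "BB", "AA", "AB", "--", "AB", "BB")
P2 <- c("BB", "AB", "AB", "AB", "AB", "BB", "--", "AA")

# Same P1/P2 combinations as in the key above
key <- data.frame(
  P1 = c("AB", "AB", "AA", "AA", "BB", "--", "AA"),
  P2 = c("AB", "BB", "AB", "BB", "AA", "AA", "--"),
  loctype = c("<hkxhk>", "<lmxll>", "<nnxnp>", "MN", "MN", "NR", "NR"),
  stringsAsFactors = FALSE)

# match() returns NA for any pair that has no row in the key
pos <- match(paste(P1, P2), paste(key$P1, key$P2))
unmatched <- which(is.na(pos))     # positions of markers without a loctype
unique(paste(P1, P2)[unmatched])   # combinations that would need new key rows
```

Here the unmatched positions are markers 3, 6 and 7, which correspond to the blank rows in the printed result. Adding rows for those combinations to the key would fill them in.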
