Giving a conditional to a function that works on factors


Question

This question ties into an earlier one, and at @Akrun's request I'm asking for something similar.

Essentially, if I pass a data frame through the following conditional:

if(length(weight) > 0) {weight %>% 
    select(where(negate(is.numeric))) %>% 
    map_dfc(~ model.matrix(~ .x -1) %>% 
              as_tibble) %>% 
    rename_all(~ str_remove(., "\\.x")) %>%
    bind_rows(weight, .)  
}

Assigning:

# Following @Akrun's suggestion for turning numeric columns into factors:
i1 <- sapply(weight, is.numeric); weight[i1] <- lapply(weight[i1], factor)
# and then use Filter(function(x) is.factor(x) | is.character(x), weight)

test = function(data) {
  x = as.data.frame(Reduce(cbind, lapply(data, function(col) model.matrix(~ . -1, data = data.frame(col)))))
  setNames(x, sub(pattern = "^col", replacement = "", names(x)))
  
}
  
test(weight)
#Missing column names                     
1  64 57  8 1 0 0 1 0
2  71 59 10 1 0 0 1 0
3  53 49  6 1 0 0 1 0
4  67 62 11 1 0 0 1 0
5  55 51  8 0 0 1 1 0
6  58 50  7 0 0 1 1 0
7  77 55 10 0 0 1 0 1
8  57 48  9 0 0 1 0 1
9  56 42 10 0 1 0 0 1
10 51 42  6 0 1 0 0 1
11 76 61 12 0 1 0 0 1
12 68 57  9 0 1 0 0 1

Then, if weight has factors, the factor columns are split into one column per level, with a 1 where that level appeared and 0 elsewhere.
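
For reference, the splitting described here is what model.matrix() does when a factor appears in a formula with the intercept removed. A minimal base-R illustration, using a made-up three-row sex factor rather than the question's data:

```r
# model.matrix(~ f - 1) turns a factor into one 0/1 indicator column per
# level; "- 1" drops the intercept so every level gets its own column.
sex <- factor(c("female", "female", "male"))
dummies <- model.matrix(~ sex - 1)
colnames(dummies)   # "sexfemale" "sexmale"
dummies[, "sexmale"]  # 0 0 1
```

The column names are the variable name pasted to each level, which is why the question's code later strips the variable prefix (.x) with str_remove.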

However, if I input a numeric-only data frame, it returns character(0). The question is how to give the function a conditional so that if the data frame (x, for example) is all numeric, it is returned as is, and if it contains factors, the requested output is returned.
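
One way to express the check itself, sketched in base R and assuming that "has factors" means at least one column of class factor: reuse the Filter() idea from above and count the columns that survive. needs_encoding is a hypothetical helper name, and the two small data frames are made up for illustration.

```r
# Hypothetical helper: Filter() keeps only the factor columns of a data
# frame; length() of the result is the number of such columns, so 0 means
# "numeric only, return the data frame unchanged".
needs_encoding <- function(x) length(Filter(is.factor, x)) > 0

num_only <- data.frame(Richness = c(20, 17, 18), pat = 1:3)
with_fac <- data.frame(wgt = c(64, 71), sex = factor(c("female", "male")))

needs_encoding(num_only)  # FALSE
needs_encoding(with_fac)  # TRUE
```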

I ask because I want to use this inside another function that will handle many data frames, some with only numeric columns and others that include factors. In that case, I can refer to the data frame as x within the function.

My attempt at the function:

fact_col <- function(x){
  if(length(x) > 0) {
    weight_sub <- x %>% 
      select(where(is.factor)) 
    weight_sub %>%
      map_dfc(~ model.matrix(~ .x -1) %>% 
                as_tibble) %>% 
      rename_all(~ str_remove(., "\\.x")) %>%
      bind_cols(weight_sub, .) -> x
    x <- x %>% select(!where(is.factor))
    x <- data.frame(sapply(x, as.numeric))
  }
}

Expected output:

#when x is numeric
function(x) { ... }
 Richness pat
1        20   1
2        17   2
3        18   3
4        19   4
5        11   5
6        15   6
7        17   7
8        15   8
9        15   9
10        9  10
11       13  11
12       14  12

#when x is a factor
function(x) { ... }

 wgt hgt age    id    sex black brown white female male
1   64  57   8 black female     1     0     0      1    0
2   71  59  10 black female     1     0     0      1    0
3   53  49   6 black female     1     0     0      1    0
4   67  62  11 black female     1     0     0      1    0
5   55  51   8 white female     0     0     1      1    0
6   58  50   7 white female     0     0     1      1    0
7   77  55  10 white   male     0     0     1      0    1
8   57  48   9 white   male     0     0     1      0    1
9   56  42  10 brown   male     0     1     0      0    1
10  51  42   6 brown   male     0     1     0      0    1
11  76  61  12 brown   male     0     1     0      0    1
12  68  57   9 brown   male     0     1     0      0    1

Reproducible data:

weight <- structure(list(wgt = c(64L, 71L, 53L, 67L, 55L, 58L, 77L, 57L, 
56L, 51L, 76L, 68L), hgt = c(57L, 59L, 49L, 62L, 51L, 50L, 55L, 
48L, 42L, 42L, 61L, 57L), age = c(8L, 10L, 6L, 11L, 8L, 7L, 10L, 
9L, 10L, 6L, 12L, 9L), id = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 
3L, 3L, 2L, 2L, 2L, 2L), .Label = c("black", "brown", "white"
), class = "factor"), sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("female", "male"), class = "factor")), class = "data.frame", row.names = c(NA, 
-12L))

Answer

One option is to split the code before the if: select the factor columns into a new object ('weight_sub'), check the length of 'weight_sub', and only if it is greater than 0 run the rest of the model.matrix step and assign the result back to 'weight'.

weight_sub <- weight %>% 
 select(where(is.factor)) 
 
if(length(weight_sub) > 0) {
  weight_sub %>%
   map_dfc(~ model.matrix(~ .x -1) %>% 
             as_tibble) %>% 
     rename_all(~ str_remove(., "\\.x")) %>%
      bind_cols(weight, .) -> weight

  }

-output

#   wgt hgt age    id    sex black brown white female male
#1   64  57   8 black female     1     0     0      1    0
#2   71  59  10 black female     1     0     0      1    0
#3   53  49   6 black female     1     0     0      1    0
#4   67  62  11 black female     1     0     0      1    0
#5   55  51   8 white female     0     0     1      1    0
#6   58  50   7 white female     0     0     1      1    0
#7   77  55  10 white   male     0     0     1      0    1
#8   57  48   9 white   male     0     0     1      0    1
#9   56  42  10 brown   male     0     1     0      0    1
#10  51  42   6 brown   male     0     1     0      0    1
#11  76  61  12 brown   male     0     1     0      0    1
#12  68  57   9 brown   male     0     1     0      0    1
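
If the same logic is wanted inside a larger function, as the OP describes, the split-then-check idea can be wrapped up so that numeric-only data frames come back untouched. The sketch below is a base-R variant, not the tidyverse code above: encode_factors is a hypothetical name, and it assumes the indicator columns should be named after the factor levels, as in the expected output.

```r
# Hypothetical helper: one-hot encode the factor columns of a data frame,
# or return it untouched when it has no factor columns.
encode_factors <- function(x) {
  fac <- Filter(is.factor, x)            # keep only the factor columns
  if (length(fac) == 0) return(x)        # numeric-only input: return as is
  dummies <- lapply(fac, function(col) {
    m <- model.matrix(~ col - 1)         # one 0/1 column per factor level
    colnames(m) <- levels(col)           # name columns by level, e.g. "black"
    as.data.frame(m)
  })
  # unname() stops cbind from prefixing the names with "id." / "sex.";
  # the indicator columns are appended after the original ones
  cbind(x, do.call(cbind, unname(dummies)))
}

# Usage on a small made-up slice shaped like the question's data
wt <- data.frame(
  wgt = c(64L, 56L, 55L),
  id  = factor(c("black", "brown", "white")),
  sex = factor(c("female", "male", "female"))
)
encode_factors(wt)

num_only <- data.frame(Richness = c(20, 17, 18), pat = 1:3)
identical(encode_factors(num_only), num_only)  # TRUE
```

Because the early return happens before model.matrix() is ever called, the same function can be applied with lapply() over a list of mixed data frames.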

As a negative test, select the character class columns instead; weight has none, so weight_sub has zero columns:

weight_sub <- weight %>% 
 select(where(is.character)) 
 
if(length(weight_sub) > 0) {
  weight_sub %>%
   map_dfc(~ model.matrix(~ .x -1) %>% 
             as_tibble) %>% 
     rename_all(~ str_remove(., "\\.x")) %>%
      bind_cols(weight, .) -> weight

  }

There is no output: the if condition is FALSE, so the 'weight' dataset stays the same and no new columns are added.

In the update, the OP also passes numeric columns to model.matrix. For a numeric column it just returns the same values as a single column (we loop over the columns one at a time with map), named .x after the variable in the model.matrix formula. rename_all with str_remove then strips .x, leaving a blank column name, which _dfc fills in with a default name. To prevent that, we can add an if condition before the renaming that appends the original column name as a suffix whenever the output is a single numeric column:
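
The naming problem is easy to reproduce in base R, where the formula variable is called x rather than .x. A small sketch; mm_names is a hypothetical helper that only returns the column names model.matrix() would produce for whatever vector is passed in as x:

```r
# Hypothetical helper: column names model.matrix() gives a vector named x
mm_names <- function(x) colnames(model.matrix(~ x - 1))

mm_names(factor(c("black", "white", "black")))  # "xblack" "xwhite"
mm_names(c(64, 71, 53))                         # "x": numeric gives one column
sub("^x", "", mm_names(c(64, 71, 53)))          # "": the blank name
# Appending the original column name first keeps it recoverable
sub("^x", "", paste0(mm_names(c(64, 71, 53)), "wgt"))  # "wgt"
```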

weight %>%
    imap_dfc(~ {
        nm1 <- .y
        tmp <- model.matrix(~ .x - 1) %>%
           as_tibble
       if(ncol(tmp) == 1 && class(tmp[[1]]) == 'numeric') {
          names(tmp) <- paste0(names(tmp), nm1)
       }
      tmp
     }) %>% 
      rename_all(~ str_remove(., "\\.x"))

-output

# A tibble: 12 x 8
#     wgt   hgt   age black brown white female  male
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
# 1    64    57     8     1     0     0      1     0
# 2    71    59    10     1     0     0      1     0
# 3    53    49     6     1     0     0      1     0
# 4    67    62    11     1     0     0      1     0
# 5    55    51     8     0     0     1      1     0
# 6    58    50     7     0     0     1      1     0
# 7    77    55    10     0     0     1      0     1
# 8    57    48     9     0     0     1      0     1
# 9    56    42    10     0     1     0      0     1
#10    51    42     6     0     1     0      0     1
#11    76    61    12     0     1     0      0     1
#12    68    57     9     0     1     0      0     1


Or, using base R:

out <- do.call(cbind, unname(Map(function(x, y) {
  tmp <- as.data.frame(model.matrix(~ x - 1))
  # a single numeric column comes back named "x"; suffix the original name
  if (ncol(tmp) == 1 && class(tmp[[1]]) == 'numeric') {
    names(tmp) <- paste0(names(tmp), y)
  }
  tmp
}, weight, names(weight))))
names(out) <- sub('^x', '', names(out))
out
#   wgt hgt age black brown white female male
#1   64  57   8     1     0     0      1    0
#2   71  59  10     1     0     0      1    0
#3   53  49   6     1     0     0      1    0
#4   67  62  11     1     0     0      1    0
#5   55  51   8     0     0     1      1    0
#6   58  50   7     0     0     1      1    0
#7   77  55  10     0     0     1      0    1
#8   57  48   9     0     0     1      0    1
#9   56  42  10     0     1     0      0    1
#10  51  42   6     0     1     0      0    1
#11  76  61  12     0     1     0      0    1
#12  68  57   9     0     1     0      0    1
