Recode/replace multiple values in a shared data column to a single value across data frames


Problem description

I hope I haven't missed it, but I haven't been able to find a working solution to this problem. I have a set of data frames with a shared column. These columns contain multiple, varying transcription errors for several values; some errors are shared between data frames, others are not. I would like to replace/recode the transcription errors (bad_values) with the correct values (good_values) across all data frames.

Among other things, I have tried nesting the map*() family of functions across lists of data frames, bad_values, and good_values to do this. Here is an example:

df1 = data.frame(grp = c("a1","a.","a.",rep("b",7)), measure = rnorm(10))

df2 = data.frame(grp = c(rep("as", 3), "b2",rep("a",22)), measure = rnorm(26))

df3 = data.frame(grp = c(rep("b-",3),rep("bq",2),"a", rep("a.", 3)), measure = 1:9)


df_list = list(df1, df2, df3)
bad_values = list(c("a1","a.","as"), c("b2","b-","bq"))
good_values = list("a", "b")

dfs = map(df_list, function(x) {
  x %>% mutate(grp = plyr::mapvalues(grp, bad_values, rep(good_values,length(bad_values))))
})

I didn't necessarily expect this to work beyond a single good-bad value pair. However, I thought nesting another call to map*() within it might work:

dfs = map(df_list, function(x) {
  x %>% mutate(grp = map2(bad_values, good_values, function(x, y) {
    recode(grp, bad_values = good_values)
  }))
})

I have tried a number of other approaches, none of which have worked.

Ultimately, I would like to go from a set of data frames with errors, like this:

[[1]]
  grp    measure
1  a1  0.5582253
2  a.  0.3400904
3  a. -0.2200824
4   b -0.7287385
5   b -0.2128275
6   b  1.9030766

[[2]]
  grp    measure
1  as  1.6148772
2  as  0.1090853
3  as -1.3714180
4  b2 -0.1606979
5   a  1.1726395
6   a -0.3201150

[[3]]
  grp measure
1  b-       1
2  b-       2
3  b-       3
4  bq       4
5  bq       5
6   a       6

to a list of 'fixed' data frames, like this:

[[1]]
  grp    measure
1   a -0.7671052
2   a  0.1781247
3   a -0.7565773
4   b -0.3606900
5   b  1.9264804
6   b  0.9506608

[[2]]
  grp     measure
1   a  1.45036125
2   a -2.16715639
3   a  0.80105611
4   b  0.24216723
5   a  1.33089426
6   a -0.08388404

[[3]]
  grp measure
1   b       1
2   b       2
3   b       3
4   b       4
5   b       5
6   a       6

Any help would be greatly appreciated.

Recommended answer

Here is an option using tidyverse with recode_factor. When there are multiple elements to change, create a named list of key/value pairs (bad value as the name, good value as the element) and use recode_factor to match the old values and replace them with the new levels.

library(tidyverse)
keyval <- setNames(rep(good_values, lengths(bad_values)), unlist(bad_values))
out <- map(df_list, ~ .x %>% 
                  mutate(grp = recode_factor(grp, !!! keyval)))

Output:

out
#[[1]]
#   grp     measure
#1    a -1.63295876
#2    a  0.03859976
#3    a -0.46541610
#4    b -0.72356671
#5    b -1.11552841
#6    b  0.99352861
#....

#[[2]]
#   grp     measure
#1    a  1.26536789
#2    a -0.48189740
#3    a  0.23041056
#4    b -1.01324689
#5    a -1.41586086
#6    a  0.59026463
#....


#[[3]]
#  grp measure
#1   b       1
#2   b       2
#3   b       3
#4   b       4
#5   b       5
#6   a       6
#....
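To make the lookup step concrete, the following base R sketch rebuilds keyval from the question's bad_values and good_values and inspects it. Nothing here is new API: it only shows that each good value is repeated once per bad value it replaces, and that the names carry the bad values. Splicing this list with !!! then passes each name/value pair to recode_factor as a separate named argument.

```r
# Same inputs as in the question
bad_values  <- list(c("a1", "a.", "as"), c("b2", "b-", "bq"))
good_values <- list("a", "b")

# Repeat each good value once per bad value it replaces, then name each
# repetition with the bad value it stands in for
keyval <- setNames(rep(good_values, lengths(bad_values)), unlist(bad_values))

names(keyval)                      # the bad values
unlist(keyval, use.names = FALSE)  # the replacement for each bad value
```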

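NOTE: recode_factor() always returns a factor. If grp was already a factor (the default for strings in data.frame() before R 4.0.0), as in the output below, the class of the column does not change; if it was a character vector, it becomes a factor.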

str(out)
#List of 3
# $ :'data.frame':  10 obs. of  2 variables:
#  ..$ grp    : Factor w/ 2 levels "a","b": 1 1 1 2 2 2 2 2 2 2
#  ..$ measure: num [1:10] -1.633 0.0386 -0.4654 -0.7236 -1.1155 ...
# $ :'data.frame':  26 obs. of  2 variables:
#  ..$ grp    : Factor w/ 2 levels "a","b": 1 1 1 2 1 1 1 1 1 1 ...
#  ..$ measure: num [1:26] 1.265 -0.482 0.23 -1.013 -1.416 ...
# $ :'data.frame':  9 obs. of  2 variables:
#  ..$ grp    : Factor w/ 2 levels "a","b": 2 2 2 2 2 1 1 1 1
#  ..$ measure: int [1:9] 1 2 3 4 5 6 7 8 9
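If the column should stay a character vector, one possible variation is to use dplyr's recode() with the same spliced lookup instead of recode_factor(). The sketch below assumes dplyr and purrr are installed and uses two small made-up data frames rather than the question's data; out_chr is a hypothetical name.

```r
library(dplyr)
library(purrr)

# Lookup built the same way as in the answer
bad_values  <- list(c("a1", "a.", "as"), c("b2", "b-", "bq"))
good_values <- list("a", "b")
keyval <- setNames(rep(good_values, lengths(bad_values)), unlist(bad_values))

# Small illustrative data frames (not the question's data)
df_list <- list(
  data.frame(grp = c("a1", "a.", "b"), measure = 1:3, stringsAsFactors = FALSE),
  data.frame(grp = c("b-", "bq", "a"), measure = 4:6, stringsAsFactors = FALSE)
)

# recode() keeps a character column as character; values that have no
# entry in keyval are left unchanged
out_chr <- map(df_list, ~ mutate(.x, grp = recode(grp, !!!keyval)))

sapply(out_chr, function(d) class(d$grp))
```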


Once we have the key/value list, it can also be used with base R functions:

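# Look up only the values that have an entry in keyval; leave all others as they are
out1 <- lapply(df_list, transform, grp = ifelse(grp %in% names(keyval), unlist(keyval)[as.character(grp)], as.character(grp)))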
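A more explicit base R variant can be sketched with a small helper function. It assumes that values without an entry in the lookup should pass through unchanged and that the result should be a character column. The names lookup, recode_grp and fixed are hypothetical and exist only for this illustration.

```r
bad_values  <- list(c("a1", "a.", "as"), c("b2", "b-", "bq"))
good_values <- list("a", "b")

# Named character vector: names are bad values, elements are replacements
lookup <- setNames(rep(unlist(good_values), lengths(bad_values)),
                   unlist(bad_values))

# Hypothetical helper: recode the grp column of one data frame
recode_grp <- function(d, lookup) {
  g <- as.character(d$grp)          # also handles factor columns
  hit <- g %in% names(lookup)       # rows holding a known bad value
  g[hit] <- unname(lookup[g[hit]])  # replace those, keep the rest
  d$grp <- g
  d
}

# df3 from the question
df3 <- data.frame(grp = c(rep("b-", 3), rep("bq", 2), "a", rep("a.", 3)),
                  measure = 1:9)

fixed <- lapply(list(df3), recode_grp, lookup = lookup)
fixed[[1]]$grp
```

Because the helper only touches rows whose value appears in names(lookup), already-correct values such as "a" and "b" never need their own entries in the lookup.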
