Recoding Similar Factor Levels Across Multiple Data Frames Using Purrr and Dplyr

Problem description

Below are two simple data frames. I would like to re-code (collapse) the Sat1 and Sat2 columns so that all degrees of satisfied are coded simply as Satisfied, and all degrees of Dissatisfied are coded as Dissatisfied. Neutral will remain as Neutral. These factors will therefore have three levels - Satisfied, Dissatisfied, and Neutral.

I would normally accomplish this by binding the data frames and using lapply along with recode from the car package, such as:

  DF1[2:3] <- lapply(DF1[2:3], recode, c('"Somewhat Satisfied"= "Satisfied","Satisfied"="Satisfied","Extremely Dissatisfied"="Dissatisfied"........etc, etc
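For reference, the lapply pattern described above can be sketched in base R with a named lookup vector in place of car's recode. The lookup vector, the helper name collapse_levels, the level order, and the small toy data frame are all assumptions made for this illustration; they are not part of the question.

```r
# Hypothetical lookup table: names are original labels, values are collapsed labels.
lookup <- c(
  "Extremely Satisfied"    = "Satisfied",
  "Very Satisfied"         = "Satisfied",
  "Somewhat Satisfied"     = "Satisfied",
  "Satisfied"              = "Satisfied",
  "Neutral"                = "Neutral",
  "Dissatisfied"           = "Dissatisfied",
  "Somewhat Dissatisfied"  = "Dissatisfied",
  "Very Dissatisfied"      = "Dissatisfied",
  "Extremely Dissatisfied" = "Dissatisfied"
)

# Hypothetical helper: look each value up by name and return a three-level factor.
collapse_levels <- function(x) {
  factor(unname(lookup[as.character(x)]),
         levels = c("Dissatisfied", "Neutral", "Satisfied"))
}

# Small stand-in data frame (not the question's data).
toy <- data.frame(
  Names = c("A", "B", "C"),
  Sat1  = c("Very Satisfied", "Neutral", "Somewhat Dissatisfied"),
  Sat2  = c("Satisfied", "Extremely Dissatisfied", "Neutral")
)

# Overwrite columns 2 and 3 in place, as in the lapply call above.
toy[2:3] <- lapply(toy[2:3], collapse_levels)
```

Any label missing from the lookup vector becomes NA, which makes unexpected values easy to spot.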

I would like to accomplish this using map functions from purrr, specifically at_map (to maintain the data frame, but I'm new to purrr, so feel free to suggest other versions of map), together with dplyr, tidyr, stringr, and ggplot2, so that everything can be easily pipelined.

The example at the link below is close to what I would like to accomplish, except that I need it for re-coding, and I was unable to make it work.

http://www.r-bloggers.com/using-purrr-with-dplyr/

I would like to use at_map or a similar map function so that I can keep the original columns of Sat1 and Sat2, so the re-coded columns will be added to the data frame and renamed. It would be great if this step could also be included within a function.

In reality, I will have many data frames, so I only want to recode the factor levels once, and then use a function from purrr to make the changes across all the data frames using the least amount of code.

Names<-c("James","Chris","Jessica","Tomoki","Anna","Gerald")
Sat1<-c("Satisfied","Very Satisfied","Dissatisfied","Somewhat Satisfied","Dissatisfied","Neutral")
Sat2<-c("Very Dissatisfied","Somewhat Satisfied","Neutral","Neutral","Satisfied","Satisfied")
Program<-c("A","B","A","C","B","D")
Pets<-c("Snake","Dog","Dog","Dog","Cat","None")

DF1<-data.frame(Names,Sat1,Sat2,Program,Pets)

Names<-c("Tim","John","Amy","Alberto","Desrahi","Francesca")
Sat1<-c("Extremely Satisfied","Satisfied","Satisfied","Somewhat Dissatisfied","Dissatisfied","Satisfied")
Sat2<-c("Dissatisfied","Somewhat Dissatisfied","Neutral","Extremely Dissatisfied","Somewhat Satisfied","Somewhat Dissatisfied")
Program<-c("A","B","A","C","B","D")


DF2<-data.frame(Names,Sat1,Sat2,Program)

Solution

One way to do this is to use mutate_each to do the work combined with one of the map functions to go through a list of data.frames. Using mutate_each or equivalent from dplyr_0.4.3.9001 allows you to rename the new columns.

You could use string manipulation instead of recoding in this case. I believe you want to pull out Satisfied, Dissatisfied, or Neutral from the current strings that you have. You can achieve this with sub using regular expressions. For example,

sub(".*(Satisfied|Dissatisfied|Neutral).*$", "\\1", DF2$Sat2)
"Dissatisfied" "Dissatisfied" "Neutral"      "Dissatisfied" "Satisfied"    "Dissatisfied"
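One property of sub worth keeping in mind: an element that does not match the pattern is returned unchanged rather than as NA, so misspelled labels can slip through silently. A small base R check on made-up input (the misspelled last value is deliberate):

```r
pattern <- ".*(Satisfied|Dissatisfied|Neutral).*$"

# Made-up input; the last value is deliberately misspelled.
x <- c("Very Dissatisfied", "Somewhat Satisfied", "Satisfed")

# Elements that match are replaced by the captured label;
# elements with no match are left exactly as they were.
collapsed <- sub(pattern, "\\1", x)

# Flag anything that did not collapse to one of the three expected labels.
unmatched <- !collapsed %in% c("Satisfied", "Dissatisfied", "Neutral")
x[unmatched]
```

Checking for leftovers this way before converting to a factor avoids ending up with more than three levels.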

Package stringr has a nice function for extracting specific strings, str_extract.

library(stringr)
str_extract(DF2$Sat2, "Satisfied|Neutral|Dissatisfied")
 "Dissatisfied" "Dissatisfied" "Neutral"      "Dissatisfied" "Satisfied"    "Dissatisfied"
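Because the goal is a factor with three levels, the extracted strings can be wrapped in factor(). This is a sketch assuming stringr is installed; the input values are made up, and the level order is an assumption since the question does not specify one. Unlike sub, str_extract returns NA when nothing matches.

```r
library(stringr)

# Made-up input; the last value matches none of the three labels.
x <- c("Dissatisfied", "Somewhat Dissatisfied", "Neutral", "Somewhat Satisfied", "n/a")

sat_levels <- c("Dissatisfied", "Neutral", "Satisfied")  # assumed ordering

# Extract the label, then fix the level set so every column shares the same levels.
collapsed <- factor(str_extract(x, "Satisfied|Neutral|Dissatisfied"),
                    levels = sat_levels)

levels(collapsed)
table(collapsed, useNA = "ifany")
```

Setting levels explicitly keeps the level set identical across data frames, even when one of them happens to lack a category.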

You can call either of these functions inside mutate_each to apply it to multiple columns. The name you give the function within funs is appended to the new column names; I used recode. For one of your datasets:

DF1 %>% 
    mutate_each( funs(recode = str_extract(., "Satisfied|Neutral|Dissatisfied") ), 
              starts_with("Sat") )

    Names               Sat1               Sat2 Program  Pets  Sat1_recode  Sat2_recode
1   James          Satisfied  Very Dissatisfied       A Snake    Satisfied Dissatisfied
2   Chris     Very Satisfied Somewhat Satisfied       B   Dog    Satisfied    Satisfied
3 Jessica       Dissatisfied            Neutral       A   Dog Dissatisfied      Neutral
4  Tomoki Somewhat Satisfied            Neutral       C   Dog    Satisfied      Neutral
5    Anna       Dissatisfied          Satisfied       B   Cat Dissatisfied    Satisfied
6  Gerald            Neutral          Satisfied       D  None      Neutral    Satisfied
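mutate_each and funs have since been deprecated in dplyr. With dplyr 1.0.0 or later, the same result can be written with across(); the sketch below assumes such a version is installed and rebuilds DF1 from the question so that it runs on its own. The name DF1_recoded is only for illustration.

```r
library(dplyr)
library(stringr)

DF1 <- data.frame(
  Names   = c("James", "Chris", "Jessica", "Tomoki", "Anna", "Gerald"),
  Sat1    = c("Satisfied", "Very Satisfied", "Dissatisfied",
              "Somewhat Satisfied", "Dissatisfied", "Neutral"),
  Sat2    = c("Very Dissatisfied", "Somewhat Satisfied", "Neutral",
              "Neutral", "Satisfied", "Satisfied"),
  Program = c("A", "B", "A", "C", "B", "D"),
  Pets    = c("Snake", "Dog", "Dog", "Dog", "Cat", "None")
)

# Apply str_extract to every column whose name starts with "Sat";
# .names keeps the originals and adds <column>_recode next to them.
DF1_recoded <- DF1 %>%
  mutate(across(starts_with("Sat"),
                ~ str_extract(as.character(.x), "Satisfied|Neutral|Dissatisfied"),
                .names = "{.col}_recode"))
```

The as.character() call is there so the code also works when the Sat columns are stored as factors.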

To go through many datasets stored in a list, you can use a map function from purrr to perform a function on every element in the list.

list(DF1, DF2) %>%
    map(~mutate_each(.x, 
                  funs(recode = str_extract(., "Satisfied|Neutral|Dissatisfied") ), 
                  starts_with("Sat")) )

[[1]]
    Names               Sat1               Sat2 Program  Pets  Sat1_recode  Sat2_recode
1   James          Satisfied  Very Dissatisfied       A Snake    Satisfied Dissatisfied
2   Chris     Very Satisfied Somewhat Satisfied       B   Dog    Satisfied    Satisfied
...
[[2]]
      Names                  Sat1                   Sat2 Program  Sat1_recode  Sat2_recode
1       Tim   Extremely Satisfied           Dissatisfied       A    Satisfied Dissatisfied
2      John             Satisfied  Somewhat Dissatisfied       B    Satisfied Dissatisfied
...
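Since the question asks for the step to live inside a function, the recoding can be wrapped once and then mapped over a named list. This is a sketch assuming dplyr 1.0.0 or later; the helper name add_recoded, the list names, and the two-row stand-in data frames are hypothetical.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical helper: adds a <column>_recode column for every Sat* column.
add_recoded <- function(df, pattern = "Satisfied|Neutral|Dissatisfied") {
  df %>%
    mutate(across(starts_with("Sat"),
                  ~ str_extract(as.character(.x), pattern),
                  .names = "{.col}_recode"))
}

# Stand-in data: the first two rows of each data frame in the question.
surveys <- list(
  first  = data.frame(Names = c("James", "Chris"),
                      Sat1  = c("Satisfied", "Very Satisfied"),
                      Sat2  = c("Very Dissatisfied", "Somewhat Satisfied")),
  second = data.frame(Names = c("Tim", "John"),
                      Sat1  = c("Extremely Satisfied", "Satisfied"),
                      Sat2  = c("Dissatisfied", "Somewhat Dissatisfied"))
)

# map() returns a list of the same length and keeps the element names.
recoded <- map(surveys, add_recoded)
```

Keeping the pattern as an argument means the recoding rule is defined in one place and reused for every data frame.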

Using map_df instead will bind all of the elements in your list into a data.frame, which may or may not be what you want. Using the .id argument adds a name for each original dataset.

list(DF1, DF2) %>%
    map_df(~mutate_each(.x, 
                  funs(recode = str_extract(., "Satisfied|Neutral|Dissatisfied")), 
                  starts_with("Sat")), .id = "Group")

   Group     Names                  Sat1                   Sat2 Program  Pets  Sat1_recode
1      1     James             Satisfied      Very Dissatisfied       A Snake    Satisfied
2      1     Chris        Very Satisfied     Somewhat Satisfied       B   Dog    Satisfied
3      1   Jessica          Dissatisfied                Neutral       A   Dog Dissatisfied
4      1    Tomoki    Somewhat Satisfied                Neutral       C   Dog    Satisfied
5      1      Anna          Dissatisfied              Satisfied       B   Cat Dissatisfied
6      1    Gerald               Neutral              Satisfied       D  None      Neutral
7      2       Tim   Extremely Satisfied           Dissatisfied       A  <NA>    Satisfied
8      2      John             Satisfied  Somewhat Dissatisfied       B  <NA>    Satisfied
...
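With an unnamed list, the Group column holds list positions (1, 2), as in the output above. Naming the list elements gives more readable labels. The sketch below uses dplyr's bind_rows on small stand-in data frames (not the full question data) to show this, and also shows that a column missing from one data frame, such as Pets, is filled with NA.

```r
library(dplyr)

# Stand-in for a list of already recoded data frames; names become the Group labels.
recoded <- list(
  DF1 = data.frame(Names = c("James", "Chris"),
                   Pets  = c("Snake", "Dog"),
                   Sat1_recode = c("Satisfied", "Satisfied"),
                   stringsAsFactors = FALSE),
  DF2 = data.frame(Names = c("Tim", "John"),
                   Sat1_recode = c("Satisfied", "Satisfied"),
                   stringsAsFactors = FALSE)
)

# .id adds a column holding the name of the list element each row came from.
combined <- bind_rows(recoded, .id = "Group")
```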
