Recoding Similar Factor Levels Across Multiple Data Frames Using Purrr and Dplyr
Problem description
Below are two simple data frames. I would like to re-code (collapse) the Sat1 and Sat2 columns so that all degrees of satisfaction are coded simply as Satisfied, and all degrees of dissatisfaction are coded as Dissatisfied. Neutral will remain Neutral. These factors will therefore have three levels: Satisfied, Dissatisfied, and Neutral.
I would normally accomplish this by binding the data frames and using lapply along with recode from the car package, such as:
DF1[2:3] <- lapply(DF1[2:3], recode, c('"Somewhat Satisfied"= "Satisfied","Satisfied"="Satisfied","Extremely Dissatisfied"="Dissatisfied"........etc, etc
I would like to accomplish this using map functions, specifically at_map (to maintain the data frame, but I'm new to purrr, so feel free to suggest other versions of map) from purrr, as well as dplyr, tidyr, stringr and ggplot2, so everything can be easily pipelined.
The example at the link below is what I would like to accomplish, but for re-coding; I was unable to make it work.
http://www.r-bloggers.com/using-purrr-with-dplyr/
I would like to use at_map or a similar map function so that I can keep the original Sat1 and Sat2 columns, so the re-coded columns will be added to the data frame and renamed. It would be great if this step could also be included within a function.
In reality, I will have many data frames, so I only want to recode the factor levels once, and then use a function from purrr to make the changes across all the data frames using the least amount of code.
Names<-c("James","Chris","Jessica","Tomoki","Anna","Gerald")
Sat1<-c("Satisfied","Very Satisfied","Dissatisfied","Somewhat Satisfied","Dissatisfied","Neutral")
Sat2<-c("Very Dissatisfied","Somewhat Satisfied","Neutral","Neutral","Satisfied","Satisfied")
Program<-c("A","B","A","C","B","D")
Pets<-c("Snake","Dog","Dog","Dog","Cat","None")
DF1<-data.frame(Names,Sat1,Sat2,Program,Pets)
Names<-c("Tim","John","Amy","Alberto","Desrahi","Francesca")
Sat1<-c("Extremely Satisfied","Satisfied","Satisfied","Somewhat Dissatisfied","Dissatisfied","Satisfied")
Sat2<-c("Dissatisfied","Somewhat Dissatisfied","Neutral","Extremely Dissatisfied","Somewhat Satisfied","Somewhat Dissatisfied")
Program<-c("A","B","A","C","B","D")
DF2<-data.frame(Names,Sat1,Sat2,Program)
One way to do this is to use mutate_each to do the work, combined with one of the map functions to go through a list of data.frames. Using mutate_each or equivalent from dplyr_0.4.3.9001 allows you to rename the new columns.
You could use string manipulation instead of recoding in this case. I believe you want to pull out Satisfied, Dissatisfied, or Neutral from the current strings that you have. You can achieve this with sub using regular expressions. For example,
sub(".*(Satisfied|Dissatisfied|Neutral).*$", "\\1", DF2$Sat2)
"Dissatisfied" "Dissatisfied" "Neutral" "Dissatisfied" "Satisfied" "Dissatisfied"
Package stringr has a nice function for extracting specific strings, str_extract.
library(stringr)
str_extract(DF2$Sat2, "Satisfied|Neutral|Dissatisfied")
"Dissatisfied" "Dissatisfied" "Neutral" "Dissatisfied" "Satisfied" "Dissatisfied"
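Both approaches above return plain character vectors, while the question asks for factors with three levels. A minimal base R sketch of that last step follows; the variable names and the level order are assumptions made for illustration, not part of the original answer.

```r
# Values copied from DF2$Sat2 in the question.
sat2 <- c("Dissatisfied", "Somewhat Dissatisfied", "Neutral",
          "Extremely Dissatisfied", "Somewhat Satisfied", "Somewhat Dissatisfied")

# The three target levels; this ordering is an assumption.
sat_levels <- c("Satisfied", "Neutral", "Dissatisfied")

# Collapse each response to its core label, as in the sub() example.
collapsed <- sub(".*(Satisfied|Dissatisfied|Neutral).*$", "\\1", sat2)

# Store the result as a factor with a fixed set of levels.
sat2_factor <- factor(collapsed, levels = sat_levels)

levels(sat2_factor)
table(sat2_factor)
```

Note that sub leaves a string unchanged when the pattern does not match, so a misspelled response would fall outside the declared levels and become NA in the factor, which makes such values easy to spot.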
You can use this within mutate_each to use one of these functions on multiple columns. The name you give for the function within funs is what will be added on to the new column names. I used recode. For one of your datasets:
DF1 %>%
  mutate_each(funs(recode = str_extract(., "Satisfied|Neutral|Dissatisfied")),
              starts_with("Sat"))
Names Sat1 Sat2 Program Pets Sat1_recode Sat2_recode
1 James Satisfied Very Dissatisfied A Snake Satisfied Dissatisfied
2 Chris Very Satisfied Somewhat Satisfied B Dog Satisfied Satisfied
3 Jessica Dissatisfied Neutral A Dog Dissatisfied Neutral
4 Tomoki Somewhat Satisfied Neutral C Dog Satisfied Neutral
5 Anna Dissatisfied Satisfied B Cat Dissatisfied Satisfied
6 Gerald Neutral Satisfied D None Neutral Satisfied
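In more recent dplyr releases, mutate_each and funs are deprecated, and the same idea is usually written with mutate and across. The following is a sketch assuming dplyr 1.0.0 or later; the name DF1_recoded and the explicit stringsAsFactors = FALSE are choices made here, not part of the original answer.

```r
library(dplyr)
library(stringr)

# DF1 rebuilt from the question's data; strings kept as character on purpose.
DF1 <- data.frame(
  Names   = c("James", "Chris", "Jessica", "Tomoki", "Anna", "Gerald"),
  Sat1    = c("Satisfied", "Very Satisfied", "Dissatisfied",
              "Somewhat Satisfied", "Dissatisfied", "Neutral"),
  Sat2    = c("Very Dissatisfied", "Somewhat Satisfied", "Neutral",
              "Neutral", "Satisfied", "Satisfied"),
  Program = c("A", "B", "A", "C", "B", "D"),
  Pets    = c("Snake", "Dog", "Dog", "Dog", "Cat", "None"),
  stringsAsFactors = FALSE
)

# Apply str_extract() to every column whose name starts with "Sat".
# .names keeps the originals and appends "_recode" to each new column.
DF1_recoded <- DF1 %>%
  mutate(across(starts_with("Sat"),
                ~ str_extract(.x, "Satisfied|Neutral|Dissatisfied"),
                .names = "{.col}_recode"))

DF1_recoded
```

The .names argument plays the role that the name inside funs plays in the mutate_each version.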
To go through many datasets stored in a list, you can use a map function from purrr to perform a function on every element in the list.
list(DF1, DF2) %>%
  map(~mutate_each(.x,
                   funs(recode = str_extract(., "Satisfied|Neutral|Dissatisfied")),
                   starts_with("Sat")))
[[1]]
Names Sat1 Sat2 Program Pets Sat1_recode Sat2_recode
1 James Satisfied Very Dissatisfied A Snake Satisfied Dissatisfied
2 Chris Very Satisfied Somewhat Satisfied B Dog Satisfied Satisfied
...
[[2]]
Names Sat1 Sat2 Program Sat1_recode Sat2_recode
1 Tim Extremely Satisfied Dissatisfied A Satisfied Dissatisfied
2 John Satisfied Somewhat Dissatisfied B Satisfied Dissatisfied
...
Using map_df instead will bind all of the elements in your list into a data.frame, which may or may not be what you want. Using the .id argument adds a name for each original dataset.
list(DF1, DF2) %>%
  map_df(~mutate_each(.x,
                      funs(recode = str_extract(., "Satisfied|Neutral|Dissatisfied")),
                      starts_with("Sat")),
         .id = "Group")
Group Names Sat1 Sat2 Program Pets Sat1_recode
1 1 James Satisfied Very Dissatisfied A Snake Satisfied
2 1 Chris Very Satisfied Somewhat Satisfied B Dog Satisfied
3 1 Jessica Dissatisfied Neutral A Dog Dissatisfied
4 1 Tomoki Somewhat Satisfied Neutral C Dog Satisfied
5 1 Anna Dissatisfied Satisfied B Cat Dissatisfied
6 1 Gerald Neutral Satisfied D None Neutral
7 2 Tim Extremely Satisfied Dissatisfied A <NA> Satisfied
8 2 John Satisfied Somewhat Dissatisfied B <NA> Satisfied
...
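The question also asks whether the recoding step can be wrapped in a function and applied to many data frames. One possible sketch follows, assuming dplyr 1.0.0 or later. The helper collapse_sat and the small data frames survey_a and survey_b are hypothetical names introduced for illustration, with a few rows borrowed from the question's data.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical helper: adds a "<column>_recode" column for every column
# whose name starts with "Sat", keeping the original columns.
collapse_sat <- function(df, pattern = "Satisfied|Neutral|Dissatisfied") {
  df %>%
    mutate(across(starts_with("Sat"),
                  ~ str_extract(.x, pattern),
                  .names = "{.col}_recode"))
}

# Small illustrative data frames (a subset of the question's rows).
survey_a <- data.frame(
  Names = c("James", "Chris", "Jessica"),
  Sat1  = c("Satisfied", "Very Satisfied", "Dissatisfied"),
  Sat2  = c("Very Dissatisfied", "Somewhat Satisfied", "Neutral"),
  Pets  = c("Snake", "Dog", "Dog"),
  stringsAsFactors = FALSE
)
survey_b <- data.frame(
  Names = c("Tim", "John"),
  Sat1  = c("Extremely Satisfied", "Satisfied"),
  Sat2  = c("Dissatisfied", "Somewhat Dissatisfied"),
  stringsAsFactors = FALSE
)

# Naming the list elements gives .id meaningful labels.
recoded <- list(survey_a = survey_a, survey_b = survey_b) %>%
  map(collapse_sat)

# Optionally stack the results; columns missing from a frame become NA.
combined <- bind_rows(recoded, .id = "Group")
combined
```

Keeping the list form (recoded) preserves each data frame separately, while bind_rows gives a single table with a Group column identifying the source; which is preferable depends on what happens downstream.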