R: Creating a new column using another dataframe


Problem description

I have two data frames:

1) data1:
data1 <- data.frame(Group = c(1, 2, 3), Region = c("Southeast Med, Southeast Low, Southwest Low, Northeast Med", "Northeast High, East Med, Midwest Med High", "Midwest Low, California and HI, West High"), stringsAsFactors = FALSE)

2) data2:
data2 <- data.frame(Region = c('California and HI', 'California and HI', 'Northeast High', 'California and HI', 'West High', 'Midwest Med High', 'California and HI', 'California and HI', 'California and HI', 'Southwest Low', 'Midwest Med High', 'California and HI', 'East Med', 'Southeast Low', 'Southeast Med', 'Midwest Med High', 'Southeast Med', 'West High', 'Northeast High', 'California and HI', 'West High', 'California and HI', 'California and HI', 'West High', 'California and HI', 'West High', 'California and HI', 'California and HI'))


I want to create a new column in data2, say data2$Group, using data1: for each region in data2, the new column should look up in data1 which group that region belongs to and store that group. How can I do that? Also, if data1 were a list instead of a data frame, what would be a possible approach?

Recommended answer


Using the datasets you posted, you can do this:

library(tidyverse)

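# update data1: split each comma-separated Region string so that
# every region gets its own row, keeping its Group
data1_upd <- data1 %>% separate_rows(Region, sep = ", ")

# join datasets: add Group to every row of data2 by matching on Region
data2_upd <- data2 %>% left_join(data1_upd, by = "Region")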
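Because left_join() keeps every row of data2 and returns one row per match (and an NA row when there is none), two things are worth checking after the join: that no region appears under more than one group in data1_upd (otherwise rows of data2 would be silently duplicated), and which regions, if any, found no group. The sketch below reruns the answer's pipeline on the question's data, loading dplyr and tidyr (the tidyverse packages these functions come from) instead of the full tidyverse, and adds those checks; the name unmatched is only for illustration.

```r
library(dplyr)  # left_join(), %>%
library(tidyr)  # separate_rows()

# The question's data
data1 <- data.frame(Group = c(1, 2, 3),
                    Region = c("Southeast Med, Southeast Low, Southwest Low, Northeast Med",
                               "Northeast High, East Med, Midwest Med High",
                               "Midwest Low, California and HI, West High"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Region = c("California and HI", "California and HI", "Northeast High",
                               "California and HI", "West High", "Midwest Med High",
                               "California and HI", "California and HI", "California and HI",
                               "Southwest Low", "Midwest Med High", "California and HI",
                               "East Med", "Southeast Low", "Southeast Med", "Midwest Med High",
                               "Southeast Med", "West High", "Northeast High", "California and HI",
                               "West High", "California and HI", "California and HI", "West High",
                               "California and HI", "West High", "California and HI",
                               "California and HI"),
                    stringsAsFactors = FALSE)

# The answer's two steps
data1_upd <- data1 %>% separate_rows(Region, sep = ", ")
data2_upd <- data2 %>% left_join(data1_upd, by = "Region")

# Each region must belong to exactly one group, so the join cannot add rows
stopifnot(anyDuplicated(data1_upd$Region) == 0)
stopifnot(nrow(data2_upd) == nrow(data2))

# Regions in data2 that got no group (empty for the question's data)
unmatched <- unique(data2_upd$Region[is.na(data2_upd$Group)])
print(unmatched)
```

On dplyr 1.1.0 or later, data2 %>% left_join(data1_upd, by = "Region", relationship = "many-to-one") turns the duplicate-region case into an error at join time instead.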

The new dataset data2_upd looks like this:

#               Region Group
# 1  California and HI     3
# 2  California and HI     3
# 3     Northeast High     2
# 4  California and HI     3
# 5          West High     3
# 6   Midwest Med High     2
# 7  California and HI     3
# 8  California and HI     3
# 9  California and HI     3
# 10     Southwest Low     1
# 11  Midwest Med High     2
# 12 California and HI     3
# 13          East Med     2
# 14     Southeast Low     1
# 15     Southeast Med     1
# 16  Midwest Med High     2
# 17     Southeast Med     1
# 18         West High     3
# 19    Northeast High     2
# 20 California and HI     3
# 21         West High     3
# 22 California and HI     3
# 23 California and HI     3
# 24         West High     3
# 25 California and HI     3
# 26         West High     3
# 27 California and HI     3
# 28 California and HI     3
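The answer doesn't cover the question's second part, where data1 is a list. One possible approach, assuming a named list with one character vector of regions per group and the group number as the element name (the question doesn't show what the list would look like), is to flatten it into a lookup table with base R's stack() and then look up each region with match(). The names data1_list, lookup and data2_sample (a few rows taken from data2) are illustrative.

```r
# Assumed list form of data1: element name = group, element = its regions
data1_list <- list(
  "1" = c("Southeast Med", "Southeast Low", "Southwest Low", "Northeast Med"),
  "2" = c("Northeast High", "East Med", "Midwest Med High"),
  "3" = c("Midwest Low", "California and HI", "West High")
)

# stack() flattens a named list into a data frame with two columns:
# `values` (the regions) and `ind` (a factor holding the element names)
lookup <- stack(data1_list)

# A few rows taken from the question's data2
data2_sample <- data.frame(Region = c("California and HI", "Northeast High",
                                      "Southwest Low", "Midwest Med High"),
                           stringsAsFactors = FALSE)

# match() gives the position of each region in lookup$values (NA if absent);
# use it to pull the group name and convert it back to a number
idx <- match(data2_sample$Region, lookup$values)
data2_sample$Group <- as.integer(as.character(lookup$ind[idx]))
print(data2_sample)
```

Since stack() already returns a data frame, you could instead rename its columns to Region and Group (converting ind to a number first) and reuse the left_join() call from the answer unchanged.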


Note that this approach uses exact string matching to join the two datasets. It is therefore case-sensitive, and any leading or trailing spaces around a region name will "break" the join for that row, leaving its Group as NA. If your data are not as "clean" as in your example, you may need some pre-processing first (e.g. converting regions to lowercase and removing leading/trailing whitespace).
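The pre-processing mentioned above can be sketched by building a normalized key on both sides with tolower() and trimws() and matching on that key, so the original Region text stays untouched. This uses base R's match() rather than left_join() only to keep the sketch dependency-free; the helper normalize_key() and the small region_groups and messy data frames are hypothetical.

```r
# Hypothetical helper: lowercase and strip leading/trailing whitespace,
# so "  West High" and "west high" produce the same key
normalize_key <- function(x) tolower(trimws(x))

# Small illustrative lookup (already one region per row) and messy input
region_groups <- data.frame(Group = c(2, 3),
                            Region = c("Northeast High", "West High"),
                            stringsAsFactors = FALSE)
messy <- data.frame(Region = c("  West High", "northeast high ", "West High"),
                    stringsAsFactors = FALSE)

# Add a normalized key column on both sides; Region keeps its original text
region_groups$key <- normalize_key(region_groups$Region)
messy$key <- normalize_key(messy$Region)

# Look up each key in the lookup table; unmatched keys would give NA
messy$Group <- region_groups$Group[match(messy$key, region_groups$key)]
print(messy)
```

The same key column also works with the answer's left_join(by = "key"); drop Region from one side first so the result doesn't end up with Region.x and Region.y columns.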

