How to do vlookup and fill down (like in Excel) in R?

Problem description

I have a dataset of about 105,000 rows and 30 columns. It contains a categorical variable that I would like to map to a number. In Excel, I would probably use VLOOKUP and fill down.

How would I go about doing the same thing in R?

Essentially, I have a HouseType variable, and I need to calculate HouseTypeNo from it. Here are some sample data:

HouseType HouseTypeNo
Semi            1
Single          2
Row             3
Single          2
Apartment       4
Apartment       4
Row             3

Recommended answer

If I understand your question correctly, here are four methods to do the equivalent of Excel's VLOOKUP and fill down in R:

# load sample data from Q
hous <- read.table(header = TRUE, 
                   stringsAsFactors = FALSE, 
text="HouseType HouseTypeNo
Semi            1
Single          2
Row             3
Single          2
Apartment       4
Apartment       4
Row             3")

# create a toy large table with a 'HouseType' column 
# but no 'HouseTypeNo' column (yet)
largetable <- data.frame(HouseType = as.character(sample(unique(hous$HouseType), 1000, replace = TRUE)), stringsAsFactors = FALSE)

# create a lookup table to get the numbers to fill
# the large table
lookup <- unique(hous)
lookup
#   HouseType HouseTypeNo
# 1      Semi           1
# 2    Single           2
# 3       Row           3
# 5 Apartment           4

Here are four methods to fill HouseTypeNo in largetable using the values in the lookup table:

First, with merge in base:

# 1. using base 
base1 <- (merge(lookup, largetable, by = 'HouseType'))
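
One thing to keep in mind with this method: by default, merge() returns its result sorted by the key column, so the original row order of largetable is not preserved. If the order matters, a minimal sketch of one way to restore it follows. The toy data and the row_id helper column are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical toy data standing in for the lookup and largetable objects above
lookup <- data.frame(HouseType = c("Semi", "Single", "Row", "Apartment"),
                     HouseTypeNo = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)
largetable <- data.frame(HouseType = c("Row", "Semi", "Apartment", "Row", "Single"),
                         stringsAsFactors = FALSE)

# Tag each row with its original position (row_id is a hypothetical helper column)
largetable$row_id <- seq_len(nrow(largetable))

# merge() sorts the result by the key, so reorder by row_id afterwards
merged <- merge(largetable, lookup, by = "HouseType")
ordered <- merged[order(merged$row_id), c("HouseType", "HouseTypeNo")]
rownames(ordered) <- NULL
```

After this, ordered has the same row order as the original largetable, with HouseTypeNo filled in.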

A second method, with named vectors in base:

# 2. using base and a named vector
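# take the numbers from the lookup table rather than assuming they run 1..n
# in order of first appearance
housenames <- lookup$HouseTypeNo
names(housenames) <- lookup$HouseType

# index the named vector by house type; unname() drops the leftover names
base2 <- data.frame(HouseType = largetable$HouseType,
                    HouseTypeNo = unname(housenames[largetable$HouseType]))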
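
A close relative of the named-vector approach, and arguably the most direct analogue of VLOOKUP in base R, is match(). It returns, for each value in largetable, the row position of the first matching key in the lookup table. The sketch below uses hypothetical toy data in place of the objects defined above.

```r
# Hypothetical toy data standing in for the lookup and largetable objects above
lookup <- data.frame(HouseType = c("Semi", "Single", "Row", "Apartment"),
                     HouseTypeNo = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)
largetable <- data.frame(HouseType = c("Row", "Semi", "Apartment", "Row", "Single"),
                         stringsAsFactors = FALSE)

# For each house type in largetable, find its row position in the lookup table
idx <- match(largetable$HouseType, lookup$HouseType)

# Use those positions to pull the numbers; row order of largetable is unchanged
largetable$HouseTypeNo <- lookup$HouseTypeNo[idx]
```

Types missing from the lookup table produce NA in idx, and therefore NA in HouseTypeNo.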

Third, using the plyr package:

# 3. using the plyr package
library(plyr)
plyr1 <- join(largetable, lookup, by = "HouseType")

Fourth, using the sqldf package:

# 4. using the sqldf package
library(sqldf)
sqldf1 <- sqldf("SELECT largetable.HouseType, lookup.HouseTypeNo
FROM largetable
INNER JOIN lookup
ON largetable.HouseType = lookup.HouseType")

If some house types in largetable might not exist in lookup, use a left join instead:

sqldf("select * from largetable left join lookup using (HouseType)")

Corresponding changes to the other solutions would be needed too.
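
As a rough sketch of what those changes might look like in base R: merge() accepts all.x = TRUE to keep every row of the left table, and indexing a named vector by a name it does not contain yields NA. (plyr's join() already defaults to type = "left".) The toy data below, including the unmatched "Condo" type, is hypothetical.

```r
# Hypothetical toy data; "Condo" deliberately has no entry in the lookup table
lookup <- data.frame(HouseType = c("Semi", "Single", "Row", "Apartment"),
                     HouseTypeNo = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)
largetable <- data.frame(HouseType = c("Row", "Condo", "Semi"),
                         stringsAsFactors = FALSE)

# Method 1 as a left join: all.x = TRUE keeps unmatched rows, filled with NA
base_left <- merge(largetable, lookup, by = "HouseType", all.x = TRUE)

# Method 2 needs no change: a name absent from the vector returns NA
housenames <- setNames(lookup$HouseTypeNo, lookup$HouseType)
vec_left <- unname(housenames[largetable$HouseType])
```

In both results the "Condo" row is kept and its HouseTypeNo is NA, whereas an inner join would drop it.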

Is that what you wanted to do? Let me know which method you like and I'll add commentary.
