Replace multiple values using a reference table

Question

I'm cleaning a database. One of the fields is "country", but the country names in my database do not match the output I need.

I thought of using the str_replace function, but I have over 50 countries that need to be fixed, so that is not the most efficient way. I have already prepared a CSV file with the original country input and the output I need, for reference.

Here is what I have so far:

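library(stringr)
library(dplyr)
library(tidyr)
library(readxl)
# Read the workbook that holds the raw country names
database1 <- read_excel("database.xlsx")
# One str_replace() call per country; the result is assigned back with `<-`
database1$country <- str_replace(database1$country, "USA", "United States")
database1$country <- str_replace(database1$country, "UK", "United Kingdom")
database1$country <- str_replace(database1$country, "Bolivia", "Bolivia,Plurinational State of")
# Write the cleaned data; the encoding name is "UTF-8" (with a hyphen)
write.csv(database1, "test.csv", row.names = FALSE, fileEncoding = "UTF-8", na = "")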

Recommended answer

Note: the levels and labels passed to factor() must be unique; neither vector should contain duplicates.

# database1 <- read_excel("database.xlsx")  ## read database excel book
old_names <- c("USA", "UGA", "CHL") ## country abbreviations
new_names <- c("United States", "Uganda", "Chile")  ## country full form
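The uniqueness requirement from the note can be checked before building the factor. This is a minimal sketch using base R's anyDuplicated(), which returns 0 when a vector has no repeated elements. The variable names levels_ok and lengths_ok are made up for illustration.

```r
old_names <- c("USA", "UGA", "CHL")
new_names <- c("United States", "Uganda", "Chile")

# TRUE when no abbreviation appears twice in the lookup
levels_ok <- anyDuplicated(old_names) == 0

# Each old name needs exactly one replacement
lengths_ok <- length(old_names) == length(new_names)

# Stop early with an error if the lookup is malformed
stopifnot(levels_ok, lengths_ok)
```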

Base R

database1 <- within( database1, country <- factor( country, levels = old_names, labels = new_names ))

data.table

library('data.table')
setDT(database1)
database1[, country := factor(country, levels = old_names, labels = new_names)]

database1
#          country
# 1: United States
# 2:        Uganda
# 3:         Chile
# 4: United States
# 5:        Uganda
# 6:         Chile
# 7: United States
# 8:        Uganda
# 9:         Chile
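One thing to keep in mind with the factor() approach: any value of country that is not listed in old_names becomes NA. If only some of the countries need fixing, a lookup with match() that falls back to the original value may be safer. The following is a minimal sketch, not part of the original answer; the input vector containing "France" is made up for illustration.

```r
old_names <- c("USA", "UGA", "CHL")
new_names <- c("United States", "Uganda", "Chile")

# Hypothetical input: "France" has no entry in old_names
country <- c("USA", "France", "CHL")

# factor() turns the unmatched value into NA
via_factor <- as.character(factor(country, levels = old_names, labels = new_names))

# match() gives the position in old_names, or NA when there is no match
idx <- match(country, old_names)

# Keep the original value wherever no replacement was found
fixed <- ifelse(is.na(idx), country, new_names[idx])
fixed
```

Here via_factor holds NA in the second position, while fixed keeps "France" unchanged.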

Data

database1 <- data.frame(country = c("USA", "UGA", "CHL", "USA", "UGA", "CHL", "USA", "UGA", "CHL"))
#    country
# 1     USA
# 2     UGA
# 3     CHL
# 4     USA
# 5     UGA
# 6     CHL
# 7     USA
# 8     UGA
# 9     CHL

Instead of two separate variables such as old_names and new_names, you can create a single named vector, countries.

countries <- c("USA", "UGA", "CHL")
names(countries) <- c("United States", "Uganda", "Chile")
within( database1, country <- factor( country, levels = countries, labels = names(countries) ))
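Since the question mentions a CSV file that already pairs each original name with the desired output, the same idea can be driven directly from that file. The sketch below is only an illustration under stated assumptions: the column names original and output and the file name country_lookup.csv are hypothetical, and the CSV is inlined with read.csv(text = ...) so the example runs on its own.

```r
# Stand-in for the reference CSV. In practice this might be
#   lookup <- read.csv("country_lookup.csv", stringsAsFactors = FALSE)
# (file name and column names are assumptions, not from the question)
lookup <- read.csv(text = 'original,output
USA,United States
UK,United Kingdom
Bolivia,"Bolivia,Plurinational State of"', stringsAsFactors = FALSE)

# Sample data; "Peru" has no row in the lookup and should stay as it is
database1 <- data.frame(country = c("USA", "UK", "Bolivia", "Peru"),
                        stringsAsFactors = FALSE)

# Position of each country in the lookup, NA when it is not listed
idx <- match(database1$country, lookup$original)

# Replace matched names, keep unmatched ones unchanged
database1$country <- ifelse(is.na(idx), database1$country, lookup$output[idx])
database1
```

Unlike str_replace(), match() compares whole values, so a short code cannot accidentally rewrite part of a longer country name.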
