How to correct a list of misspellings at once in R

Question

I have a whole list of misspellings and I would like to change them all in one go. Is there an easy way to do so without writing a massive ifelse statement?

vegas <-  c("North Las Vegas","N Las Vegas", "LAS VEGAS", "Las vegas","N. Las Vegas", "las vegas", "Las  Vegas", "Las Vegas ", "South Las Vegas", "La Vegas", "Las Vegas, NV", "LasVegas",
"110 Las Vegas", "C Las Vegas", "Henderson and Las vegas",
"las Vegas", "Las Vegas & Henderson", "Las Vegas East", "Las Vegas Nevada", 
"Las Vegas NV", "Las Vegas Valley", "Las Vegas,", "Las Vegass", 
"Las Vergas", "Los Vegas", "N E Las Vegas", "N W Las Vegas", "NORTH LAS VEGAS", "North Las Vegas ", "Vegas")

data <- structure(list(city = c("Las Vegas", "Henderson", "North Las Vegas", 
"Boulder City", "N Las Vegas", "Paradise", "LAS VEGAS", "Nellis AFB", 
"Las vegas", "Blue Diamond", "N. Las Vegas", "Summerlin", "Spring Valley", 
"HENDERSON", "las vegas", "Enterprise", "Las  Vegas", "Clark", 
"Las Vegas ", "Nellis Air Force Base", "South Las Vegas", "henderson", 
"Nellis Afb", "La Vegas", "Las Vegas, NV", "LasVegas", "Summerlin South", 
"110 Las Vegas", "Black Rock City", "boulder city", "C Las Vegas", 
"Centennial Hills", "Central Henderson", "Citibank", "City Center", 
"Decatur", "Green Valley", "Henderson (Green Valley)", "Henderson and Las vegas", 
"Henderston", "Hendserson", "Hnederson", "Lake Las Vegas", "Lake Mead", 
"las Vegas", "Las Vegas & Henderson", "Las Vegas East", "Las Vegas Nevada", 
"Las Vegas NV", "Las Vegas Valley", "Las Vegas,", "Las Vegass", 
"Las Vergas", "Los Vegas", "N E Las Vegas", "N W Las Vegas", 
"Nellis", "NELLIS AFB", "Nevada", "NORTH LAS VEGAS", "North Las Vegas ", 
"Pahrump", "Seven Hills", "Sunrise", "Sunrise Manor", "Vegas", 
"W Henderson", "W Spring Valley", "Whitney"), count = c(29361L, 
4892L, 1547L, 269L, 26L, 24L, 19L, 16L, 14L, 12L, 12L, 11L, 9L, 
8L, 8L, 7L, 5L, 4L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, -69L), class = c("tbl_df", 
"tbl", "data.frame"))

So the goal is to correct the spelling in each misspelled row to "Las Vegas".

Recommended answer

Below is a solution, using only base R functions, that is very similar to the proposed mgsub approach. (You might also want to add "Lake Las Vegas" to your list.)

vegas <-  c("North Las Vegas","N Las Vegas", "LAS VEGAS", "Las vegas","N. Las Vegas", "las vegas", "Las  Vegas", "Las Vegas ", "South Las Vegas", "La Vegas", "Las Vegas, NV", "LasVegas",
    "110 Las Vegas", "C Las Vegas", "Henderson and Las vegas",
    "las Vegas", "Las Vegas & Henderson", "Las Vegas East", "Las Vegas Nevada", 
    "Las Vegas NV", "Las Vegas Valley", "Las Vegas,", "Las Vegass", 
    "Las Vergas", "Los Vegas", "N E Las Vegas", "N W Las Vegas", "NORTH LAS VEGAS", "North Las Vegas ", "Vegas")

data <- structure(list(city = c("Las Vegas", "Henderson", "North Las Vegas", 
    "Boulder City", "N Las Vegas", "Paradise", "LAS VEGAS", "Nellis AFB", 
    "Las vegas", "Blue Diamond", "N. Las Vegas", "Summerlin", "Spring Valley", 
    "HENDERSON", "las vegas", "Enterprise", "Las  Vegas", "Clark", 
    "Las Vegas ", "Nellis Air Force Base", "South Las Vegas", "henderson", 
    "Nellis Afb", "La Vegas", "Las Vegas, NV", "LasVegas", "Summerlin South", 
    "110 Las Vegas", "Black Rock City", "boulder city", "C Las Vegas", 
    "Centennial Hills", "Central Henderson", "Citibank", "City Center", 
    "Decatur", "Green Valley", "Henderson (Green Valley)", "Henderson and Las vegas", 
    "Henderston", "Hendserson", "Hnederson", "Lake Las Vegas", "Lake Mead", 
    "las Vegas", "Las Vegas & Henderson", "Las Vegas East", "Las Vegas Nevada", 
    "Las Vegas NV", "Las Vegas Valley", "Las Vegas,", "Las Vegass", 
    "Las Vergas", "Los Vegas", "N E Las Vegas", "N W Las Vegas", 
    "Nellis", "NELLIS AFB", "Nevada", "NORTH LAS VEGAS", "North Las Vegas ", 
    "Pahrump", "Seven Hills", "Sunrise", "Sunrise Manor", "Vegas", 
    "W Henderson", "W Spring Valley", "Whitney"), count = c(29361L, 
        4892L, 1547L, 269L, 26L, 24L, 19L, 16L, 14L, 12L, 12L, 11L, 9L, 
        8L, 8L, 7L, 5L, 4L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 
        1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
        1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
        1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, -69L), class = c("tbl_df", 
            "tbl", "data.frame"))

## function that takes a list of c(pattern, replacement) pairs and applies each substitution in turn
multisub <- function(replacement.list, string, ...) {
    mygsub <- function(l, x) gsub(pattern = l[1], replacement = l[2], x, ...)
    Reduce(mygsub, replacement.list, init = string, right = TRUE)
}

## make sure the matches correspond to entire string by adding delimiters
vegas <- paste0("^", vegas, "$")

## generate replacement list
mylist <- unlist(apply(cbind(vegas, rep("Las Vegas", length(vegas))), 1, list), recursive = FALSE)

## perform multiple replacement
data$city_replaced <- multisub(mylist, data$city)
data
#>                        city count            city_replaced
#> 1                 Las Vegas 29361                Las Vegas
#> 2                 Henderson  4892                Henderson
#> 3           North Las Vegas  1547                Las Vegas
#> 4              Boulder City   269             Boulder City
#> 5               N Las Vegas    26                Las Vegas
#> 6                  Paradise    24                 Paradise
#> 7                 LAS VEGAS    19                Las Vegas
#> 8                Nellis AFB    16               Nellis AFB
#> 9                 Las vegas    14                Las Vegas
#> 10             Blue Diamond    12             Blue Diamond
#> 11             N. Las Vegas    12                Las Vegas
#> 12                Summerlin    11                Summerlin
#> 13            Spring Valley     9            Spring Valley
#> 14                HENDERSON     8                HENDERSON
#> 15                las vegas     8                Las Vegas
#> 16               Enterprise     7               Enterprise
#> 17               Las  Vegas     5                Las Vegas
#> 18                    Clark     4                    Clark
#> 19               Las Vegas      4                Las Vegas
#> 20    Nellis Air Force Base     4    Nellis Air Force Base
#> 21          South Las Vegas     4                Las Vegas
#> 22                henderson     3                henderson
#> 23               Nellis Afb     3               Nellis Afb
#> 24                 La Vegas     2                Las Vegas
#> 25            Las Vegas, NV     2                Las Vegas
#> 26                 LasVegas     2                Las Vegas
#> 27          Summerlin South     2          Summerlin South
#> 28            110 Las Vegas     1                Las Vegas
#> 29          Black Rock City     1          Black Rock City
#> 30             boulder city     1             boulder city
#> 31              C Las Vegas     1                Las Vegas
#> 32         Centennial Hills     1         Centennial Hills
#> 33        Central Henderson     1        Central Henderson
#> 34                 Citibank     1                 Citibank
#> 35              City Center     1              City Center
#> 36                  Decatur     1                  Decatur
#> 37             Green Valley     1             Green Valley
#> 38 Henderson (Green Valley)     1 Henderson (Green Valley)
#> 39  Henderson and Las vegas     1                Las Vegas
#> 40               Henderston     1               Henderston
#> 41               Hendserson     1               Hendserson
#> 42                Hnederson     1                Hnederson
#> 43           Lake Las Vegas     1           Lake Las Vegas
#> 44                Lake Mead     1                Lake Mead
#> 45                las Vegas     1                Las Vegas
#> 46    Las Vegas & Henderson     1                Las Vegas
#> 47           Las Vegas East     1                Las Vegas
#> 48         Las Vegas Nevada     1                Las Vegas
#> 49             Las Vegas NV     1                Las Vegas
#> 50         Las Vegas Valley     1                Las Vegas
#> 51               Las Vegas,     1                Las Vegas
#> 52               Las Vegass     1                Las Vegas
#> 53               Las Vergas     1                Las Vegas
#> 54                Los Vegas     1                Las Vegas
#> 55            N E Las Vegas     1                Las Vegas
#> 56            N W Las Vegas     1                Las Vegas
#> 57                   Nellis     1                   Nellis
#> 58               NELLIS AFB     1               NELLIS AFB
#> 59                   Nevada     1                   Nevada
#> 60          NORTH LAS VEGAS     1                Las Vegas
#> 61         North Las Vegas      1                Las Vegas
#> 62                  Pahrump     1                  Pahrump
#> 63              Seven Hills     1              Seven Hills
#> 64                  Sunrise     1                  Sunrise
#> 65            Sunrise Manor     1            Sunrise Manor
#> 66                    Vegas     1                Las Vegas
#> 67              W Henderson     1              W Henderson
#> 68          W Spring Valley     1          W Spring Valley
#> 69                  Whitney     1                  Whitney

Created on 2020-03-10 by the reprex package (v0.3.0)

Edit: With the above approach you can combine several replacement lists and apply them all at once. It also supports partial matching, although we explicitly turned that off here by anchoring each pattern with vegas <- paste0("^", vegas, "$").
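
As a rough illustration of combining several replacement lists, the sketch below reuses multisub from the answer and adds a second city. The helper make_pairs and the shortened variant vectors are hypothetical names introduced only for this example; they are not part of the original answer.

```r
# Same helper as in the answer: applies each c(pattern, replacement) pair in turn
multisub <- function(replacement.list, string, ...) {
    mygsub <- function(l, x) gsub(pattern = l[1], replacement = l[2], x, ...)
    Reduce(mygsub, replacement.list, init = string, right = TRUE)
}

# Hypothetical helper: build anchored c(pattern, replacement) pairs for one city
make_pairs <- function(variants, correct) {
    lapply(variants, function(v) c(paste0("^", v, "$"), correct))
}

# Shortened, illustrative variant lists (assumed for this example)
vegas_variants <- c("LAS VEGAS", "Las Vergas", "Los Vegas")
henderson_variants <- c("HENDERSON", "Henderston", "Hnederson")

# Append the two replacement lists and apply them in a single call
all_pairs <- c(make_pairs(vegas_variants, "Las Vegas"),
               make_pairs(henderson_variants, "Henderson"))

cities <- c("Los Vegas", "Hnederson", "Lake Las Vegas", "Boulder City")
fixed <- multisub(all_pairs, cities)
print(fixed)

# Without the "^" and "$" anchors, a pattern also matches inside longer strings
partial <- multisub(list(c("Vergas", "Vegas")), "North Las Vergas")
print(partial)
```

Because the patterns are regular expressions, characters such as "." in a variant act as wildcards unless they are escaped. The anchors keep "Lake Las Vegas" untouched, which is usually what you want when only whole-string misspellings should change.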

If you have just one city and a list of alternative spellings, you could also simply match them and replace them directly (using your original data data.frame and the original vegas vector, before the "^" and "$" anchors were added):

data$city[data$city %in% vegas] <- "Las Vegas"
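
If more than one city needs correcting, the same exact-match idea can be extended with a lookup table and match(). This is only a sketch under assumed data: the lookup data.frame and the short city vector below are hypothetical and stand in for data$city.

```r
# Hypothetical lookup table: one row per known misspelling
lookup <- data.frame(
    wrong = c("LAS VEGAS", "Los Vegas", "HENDERSON", "Hnederson"),
    right = c("Las Vegas", "Las Vegas", "Henderson", "Henderson"),
    stringsAsFactors = FALSE
)

# Illustrative input; in practice this would be data$city
city <- c("Los Vegas", "Henderson", "HENDERSON", "Paradise")

# match() gives the row of `lookup` for each city, or NA when it is not listed
idx <- match(city, lookup$wrong)

# Replace only the entries that were found; everything else stays as it was
hit <- !is.na(idx)
city_fixed <- city
city_fixed[hit] <- lookup$right[idx[hit]]
print(city_fixed)
```

Unlike the gsub-based approach, this compares whole strings literally, so no regular-expression escaping is needed, but it also cannot do partial matching.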