Convert long state names embedded with other text to two-letter state abbreviations

Problem description

My objective is to identify US states written out in a character vector that has other text and convert the states to abbreviated form. For example, "North Carolina" to "NC". It is simple if the vector only has long-form state names. However, my vector has other text in random places, as in the example "states".

states <- c("Plano New Jersey", "NC", "xyz", "Alabama 02138", "Texas", "Town Iowa 99999")

From another post I found this:

state.abb[match(states, state.name)]

but it converts only the standalone Texas

> state.abb[match(states, state.name)]
[1] NA   NA   NA   NA   "TX" NA

and not the New Jersey, Alabama and Iowa strings.

From the post "Fast grep with a vectored pattern or match, to return list of all matches" I tried:

sapply(states, grep(pattern = state.name, x = states, value = TRUE))

but

Error in get(as.character(FUN), mode = "function", envir = envir) : 
  object 'Alabama 02138' of mode 'function' was not found
In addition: Warning message:
In grep(pattern = state.name, x = states, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used

Nor does this work:

sapply(states, function(x) state.abb[grep(state.name, states)])
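For what it's worth, the two attempts above seem to trip over the same points: the first passes the result of `grep()` to `sapply()` where a function is expected, and both hand the whole `state.name` vector to `grep()` as `pattern`, of which only the first element is used (as the warning says). A minimal sketch of looping over the patterns instead, one `grepl()` call per state name, could look like this. It only detects which names occur and does not do the replacement yet; `found` and `hits` are illustrative names, not from any package.

```r
states <- c("Plano New Jersey", "NC", "xyz", "Alabama 02138", "Texas", "Town Iowa 99999")

# One grepl() call per state name, so each call receives a single pattern.
# The result is a logical matrix: one row per element of `states`,
# one column per entry of state.name.
found <- sapply(state.name, function(nm) grepl(nm, states, fixed = TRUE))

# For each input string, collect the abbreviations of the state names it contains.
hits <- lapply(seq_along(states), function(i) state.abb[found[i, ]])
```

Strings without a full state name, such as "NC" or "xyz", come back with an empty character vector.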

This question did not help: "regular expression to convert state names to abbreviations"

How do I convert the embedded long names to the state abbreviation?

EDIT: I want to return the vector with the only change being that the long names of the states have been abbreviated, e.g., "Plano New Jersey" becomes "Plano NJ".

Thank you for correcting and/or educating me.

Solution

Here's another approach:

library(qdap)
mgsub(state.name, state.abb, states)

## [1] "Plano NJ"      "NC"            "xyz"           "AL 02138"     
## [5] "TX"            "Town IA 99999"

If you are uncertain that the states will be capitalized you may want to use:

mgsub(state.name, state.abb, states, ignore.case=TRUE, fixed=FALSE)
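If installing qdap is not an option, roughly the same result can be had in base R. The following is a minimal sketch, not part of the original answer: `abbreviate_states` is a hypothetical helper name, and it assumes state names should only match as whole words (hence the `\\b` boundaries) and that longer names should be tried first so that "West Virginia" is not turned into "West VA".

```r
# Hypothetical helper: replace full state names with their abbreviations
# using only base R and the built-in state.name / state.abb vectors.
abbreviate_states <- function(x, ignore.case = FALSE) {
  # Try longer names first so "West Virginia" is handled before "Virginia".
  ord <- order(nchar(state.name), decreasing = TRUE)
  for (i in ord) {
    # \\b restricts the match to whole words, e.g. "Kansas" will not
    # match inside "Arkansas".
    pattern <- paste0("\\b", state.name[i], "\\b")
    x <- gsub(pattern, state.abb[i], x, ignore.case = ignore.case, perl = TRUE)
  }
  x
}

states <- c("Plano New Jersey", "NC", "xyz", "Alabama 02138", "Texas", "Town Iowa 99999")
result <- abbreviate_states(states)
```

Calling `abbreviate_states(states, ignore.case = TRUE)` covers the case where the state names may not be capitalized.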
