Replace specific characters in a variable in a data frame in R


Question

I want to replace every comma (,), hyphen (-), parenthesis ( ( and ) ) and space in the variable DMA.NAME of the example data frame with a period (.). I referred to three posts and tried their approaches, but all of them failed:

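Replacing column values in a data frame that are not included in a list

R: replace all particular values in a data frame

Replace characters from a column of a data frame in R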

Approach 1

> shouldbecomeperiod <- c$DMA.NAME %in% c("-", ",", " ", "(", ")")
c$DMA.NAME[shouldbecomeperiod] <- "."

Approach 2

> removetext <- c("-", ",", " ", "(", ")")
c$DMA.NAME <- gsub(removetext, ".", c$DMA.NAME)
c$DMA.NAME <- gsub(removetext, ".", c$DMA.NAME, fixed = TRUE)

Warning message:
In gsub(removetext, ".", c$DMA.NAME) :
  argument 'pattern' has length > 1 and only the first element will be used

Approach 3

> c[c == c(" ", ",", "(", ")", "-")] <- "."

Sample data frame

> df
      DMA.CODE                  DATE                   DMA.NAME count
111         22 8/14/2014 12:00:00 AM               Columbus, OH     1
112         23 7/15/2014 12:00:00 AM Orlando-Daytona Bch-Melbrn     1
79          18 7/30/2014 12:00:00 AM        Boston (Manchester)     1
99          22 8/20/2014 12:00:00 AM               Columbus, OH     1
112.1       23 7/15/2014 12:00:00 AM Orlando-Daytona Bch-Melbrn     1
208         27 7/31/2014 12:00:00 AM       Minneapolis-St. Paul     1

I understand why these fail: gsub accepts a single pattern, so only the first element of the vector is used. The other two approaches compare each entire value against the listed characters instead of searching within each value for them.
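The two failure modes described above can be reproduced on a small vector. This is only an illustrative sketch: the names x and targets are made up for the example, and the values are borrowed from the sample data frame.

```r
# Toy vector standing in for the DMA.NAME column (hypothetical name x)
x <- c("Columbus, OH", "Orlando-Daytona Bch-Melbrn")
targets <- c("-", ",", " ", "(", ")")

# %in% compares each whole element against the set, so no element matches
whole_match <- x %in% targets
print(whole_match)

# gsub() works with one pattern at a time; using only the first target ("-")
# leaves commas and spaces untouched
first_only <- gsub(targets[1], ".", x, fixed = TRUE)
print(first_only)
```

Every element of whole_match is FALSE because no value is exactly equal to one of the characters, and first_only changes only the hyphens.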

Recommended answer

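You can use the POSIX character classes [:punct:] and [:space:] inside a bracket expression ([...]), as shown in the code that follows. Note that [:punct:] matches every punctuation character, not only the ones listed in the question, and that the trailing + collapses each run of matches into a single period: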

df <- data.frame(
  DMA.NAME = c(
    "Columbus, OH",
    "Orlando-Daytona Bch-Melbrn",
    "Boston (Manchester)",
    "Columbus, OH",
    "Orlando-Daytona Bch-Melbrn",
    "Minneapolis-St. Paul"),
  stringsAsFactors = FALSE)
##
> gsub("[[:punct:][:space:]]+","\\.",df$DMA.NAME)
[1] "Columbus.OH"                "Orlando.Daytona.Bch.Melbrn" "Boston.Manchester."         "Columbus.OH"               
[5] "Orlando.Daytona.Bch.Melbrn" "Minneapolis.St.Paul"
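If only the five characters named in the question should be replaced, one possible variation is to list them explicitly in the bracket expression and assign the result back to the column. This is a sketch rather than part of the original answer; the names only_listed and collapsed are invented for illustration.

```r
df <- data.frame(
  DMA.NAME = c("Columbus, OH", "Orlando-Daytona Bch-Melbrn",
               "Boston (Manchester)", "Minneapolis-St. Paul"),
  stringsAsFactors = FALSE
)

# Bracket expression with only the five characters from the question.
# The hyphen comes first so it is read literally rather than as a range.
only_listed <- gsub("[-, ()]", ".", df$DMA.NAME)

# Adding + turns each run of listed characters into a single period
collapsed <- gsub("[-, ()]+", ".", df$DMA.NAME)

# Write the chosen result back into the data frame
df$DMA.NAME <- collapsed
print(df)
```

Without the +, each matched character becomes its own period, so "Columbus, OH" turns into "Columbus..OH". In both versions the period already present in "St." is kept, because "." is not part of the bracket expression.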
