Replace multiple words in R easily; str_replace_all gives error that two objects are not equal lengths


Question

I'm trying to use str_replace_all to replace many different values (e.g. "Mod", "M2", "M3", "Interviewer") with one consistent string (e.g. "Moderator:"). I'm doing this for several different categories, and I want to avoid writing out each unique value because there are a lot of them.

So I made a tibble containing all the unique values that I want to standardize, read it in, and then pulled out each column to turn it into a vector (there are 5 columns, but only 2 are shown for simplicity):

speak_names <- read_csv("speak_names.csv")
speak_namesMisc <- dplyr::pull(speak_names, Misc)
speak_namesMod <- dplyr::pull(speak_names, Moderator)

For the replacement values, I made a character vector of the same length as each of the vectors above, because I understand that the replacement and the pattern must have equal lengths:

Misc <- rep("Misc:", 2)
Mod <- rep("Moderator:", 28)

When I run Misc through this code, it works just fine:

atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMisc, Misc)

But when I try the identical Moderator version (even if I run it before Misc), I get a warning message:

atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMod, Mod)

Warning message:
In stri_replace_all_regex(string, pattern, fix_replacement(replacement),  :
longer object length is not a multiple of shorter object length

I don't understand why I'm getting this message, because the following check returns TRUE:

identical(length(speak_namesMod), length(Mod))

The data frame I'm working with is 16,244 rows long, in case that makes any difference to the pattern or replacement. I'm stuck and would like to know why this isn't working, and/or whether there is another solution that does not involve typing out each character element in the vectors.

Thanks!

Recommended answer

library('dplyr') # load the dplyr package
library('stringr') # load the stringr package



Here is a sample of my own dataset, which I will use to answer your question. Calling dput() on my data gives:

abc<-as.data.frame(
structure(list(Name = c("ME-9_ 005", "ME-9_ 004", "ME-9_ 003", 
                        "ME-9_ 002", "ME-9_ 001", "ME-9_ 000", "ME-8_ 005", "ME-8_ 004", 
                        "ME-8_ 003", "ME-8_ 002", "ME-8_ 001", "ME-8_ 000", "ME-7_ 005", 
                        "ME-7_ 004", "ME-7_ 003", "ME-7_ 002", "ME-7_ 001", "ME-7_ 000"
), Mg = c(0.411058647473409, 0.361611969040526, 0.435757145931429, 
          0.36656632349025, 0.312782034685408, 0.357913661160629, 0.414639893651842, 
          0.460992875568015, 0.554803107534663, 0.418743792959099, 0.499114614445091, 
          0.475374442706501, 0.564660334010035, 0.502678818989733, 0.417617035801997, 
          0.488463005872639, 0.484776757286094, 0.424850010858818),
Al = c(0.575667101719941,  0.586351493923602, 0.574053324307634, 0.628497798862674, 0.552234153060378, 
       0.580547408629286, 1.05746950789483, 1.07094531357244, 1.11340157804305, 
       1.03043684466386, 1.02899468191215, 1.07222457991059, 1.5276908007952, 
       1.66549994904359, 1.43287302441973, 1.37434198093964, 1.55835986529032, 
       1.66902429579112), 
Si = c(0.495188340689301, 0.513374456164654, 
       0.51809643007659, 0.569128515813393, 0.542590350648068, 0.516673370168739, 
       1.72437228079744, 1.59076392020817, 1.77327433861292, 1.76671780355934, 
       1.60625706442694, 1.92449284567535, 3.27248599245035, 3.23739024834759, 
       2.84115179036218, 2.51112086010829, 2.98829002803169, 2.93347114563903
), 
P = c(0.222881184902066, 0.258237982165306, 0.230235867213535, 
      0.262379290809071, 0.230438623604524, 0.238615393939999, 0.260241811918024, 
      0.238785817517132, 0.248589968755681, 0.248270048794532, 0.272489046130942, 
      0.266707140244041, 0.25935282543278, 0.258801008935983, 0.250692297246152, 
      0.246890941447243, 0.277698144829677, 0.274197618349091)), 
row.names = c(NA, 
              -18L), class = c("tbl_df", "tbl", "data.frame")))



Here is how my data looked before cleaning:

head(abc,10)



For your specific question, pass str_replace_all a single named vector in which each name is a pattern and each value is its replacement:

abc$Name <- str_replace_all(
  abc$Name, # column we want to search
  c("001" = "", "002" = "", "003" = "", "004" = "", "005" = "", "000" = "",
    "-" = " ", "_" = "") # each pattern (name) is paired with its replacement (value)
)
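The question asks for a way to avoid typing every pair by hand. A minimal sketch of building the same kind of named vector programmatically is shown below. The contents of speak_namesMod and speak_namesMisc, the speaker values, and the helper anchor() are hypothetical stand-ins, since the real CSV is not shown. The labels are treated as regular expressions, so this assumes they contain no regex metacharacters.

```r
library(stringr)

# Hypothetical stand-ins for the columns pulled from speak_names.csv
speak_namesMod  <- c("Interviewer", "Mod", "M2", "M3")
speak_namesMisc <- c("Unknown", "Other")

# Hypothetical helper: anchor each label so only whole values match
anchor <- function(x) paste0("^", x, "$")

# Names are the patterns, values are the replacements
lookup <- c(
  setNames(rep("Moderator:", length(speak_namesMod)), anchor(speak_namesMod)),
  setNames(rep("Misc:", length(speak_namesMisc)), anchor(speak_namesMisc))
)

# Hypothetical speaker column
speaker <- c("Mod", "M2", "Interviewer", "Other", "P1")

# Every pattern in lookup is tried against every element of speaker
cleaned <- str_replace_all(speaker, lookup)
cleaned
```

Anchoring with ^ and $ is a design choice for this sketch: the replacements are applied one after another, and without anchors a short label such as "Mod" could also match inside an already replaced "Moderator:".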



Here is how my data looked after cleaning:

head(abc,10)
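As for why the warning in the question appears: with an unnamed pattern vector, the call is vectorised, so the patterns are paired with the rows by position and recycled, rather than every pattern being tried on every row. The sketch below assumes, as the warning text suggests, that the stringr version in the question hands the vectors straight to stringi::stri_replace_all_regex, and that the speaker column has the 16,244 rows mentioned in the question. Newer stringr releases may report the length mismatch differently.

```r
library(stringi)

n_rows <- 16244   # row count reported in the question
n_rows %% 2       # 0: two patterns recycle evenly, so no warning for Misc
n_rows %% 28      # 4: 28 patterns do not recycle evenly, hence the warning

# Positional pairing: element 1 is checked against "a", element 2 against "b"
paired <- stri_replace_all_regex(c("b", "a"), c("a", "b"), "X")
paired  # neither value changes, although both appear among the patterns
```

This also suggests that the Misc call only appeared to work: it raised no warning, but each row was compared with just one of the two patterns. Making the pattern and replacement vectors the same length does not change this, because the mismatch is between the patterns and the rows.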

I hope this helps.
