Replace multiple words in R easily; str_replace_all gives error that two objects are not equal lengths
Problem description
I'm trying to use str_replace_all to replace many different values (e.g. "Mod", "M2", "M3", "Interviewer") with one consistent string (e.g. "Moderator:"). I'm doing this for several different categories, and I want to avoid writing out each unique value because there are a lot of them.
So I made a tibble consisting of all the unique values that I want to standardize, read it in, and then pulled out each column (there are 5, but only 2 are shown for simplicity) to turn them into vectors:
speak_names <- read_csv("speak_names.csv")
speak_namesMisc <- dplyr::pull(speak_names, Misc)
speak_namesMod <- dplyr::pull(speak_names, Moderator)
For the replacement value, I made a character vector of equal length to each of the vectors above, because I understand that the replacement and pattern must be of equal length:
Misc <- rep("Misc:", 2)
Mod <- rep("Moderator:", 28)
When I run Misc through this code, it works just fine:
atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMisc, Misc)
But when I try the identical Moderator version (even if I run it before Misc), I get an error message:
atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMod,
Mod)
Warning message:
In stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
longer object length is not a multiple of shorter object length
I don't know why I'm getting this error, because this check returns TRUE:
identical(length(speak_namesMod), length(Mod))
The data frame I'm working with is 16,244 rows long, if that makes any difference to the pattern or replacement. I'm stuck and trying to find out why this isn't working and/or find another solution that does not involve typing out each character element in the vectors.
Thanks!
Recommended answer
library('dplyr') # load the dplyr package
library('stringr') # load the stringr package
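A note on where the warning in the question most likely comes from: with an unnamed pattern vector, str_replace_all is vectorised over both string and pattern, so element i of the string is matched only against element i of the pattern, and the shorter vector is recycled. The sketch below uses small made-up vectors (speaker and patterns are illustrative, not the asker's data) to show the pairing, and then checks the recycling arithmetic for the lengths given in the question. Depending on the stringr version, a length mismatch may raise an error instead of a warning, so the mismatched call itself is not run here.

```r
library(stringr)

# Illustrative vectors, not the asker's data
speaker  <- c("Mod", "M2")
patterns <- c("M2", "Mod")   # same values, different order

# Unnamed pattern vector: element i of `speaker` is matched only against
# element i of `patterns`, so nothing is replaced here.
paired <- str_replace_all(speaker, patterns, "Moderator:")
print(paired)

# Recycling arithmetic for the lengths given in the question
misc_remainder <- 16244 %% 2    # 0: 2 patterns recycle evenly, no warning
mod_remainder  <- 16244 %% 28   # 4: 28 patterns do not, hence the warning
print(c(misc_remainder, mod_remainder))
```

If this reading is right, the Misc call ran without a warning only because 16,244 is a multiple of 2, not because every row was checked against both Misc patterns.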
Here is a sample of my own dataset, used to answer your question. Calling dput() on my data gives:
abc<-as.data.frame(
structure(list(Name = c("ME-9_ 005", "ME-9_ 004", "ME-9_ 003",
"ME-9_ 002", "ME-9_ 001", "ME-9_ 000", "ME-8_ 005", "ME-8_ 004",
"ME-8_ 003", "ME-8_ 002", "ME-8_ 001", "ME-8_ 000", "ME-7_ 005",
"ME-7_ 004", "ME-7_ 003", "ME-7_ 002", "ME-7_ 001", "ME-7_ 000"
), Mg = c(0.411058647473409, 0.361611969040526, 0.435757145931429,
0.36656632349025, 0.312782034685408, 0.357913661160629, 0.414639893651842,
0.460992875568015, 0.554803107534663, 0.418743792959099, 0.499114614445091,
0.475374442706501, 0.564660334010035, 0.502678818989733, 0.417617035801997,
0.488463005872639, 0.484776757286094, 0.424850010858818),
Al = c(0.575667101719941, 0.586351493923602, 0.574053324307634, 0.628497798862674, 0.552234153060378,
0.580547408629286, 1.05746950789483, 1.07094531357244, 1.11340157804305,
1.03043684466386, 1.02899468191215, 1.07222457991059, 1.5276908007952,
1.66549994904359, 1.43287302441973, 1.37434198093964, 1.55835986529032,
1.66902429579112),
Si = c(0.495188340689301, 0.513374456164654,
0.51809643007659, 0.569128515813393, 0.542590350648068, 0.516673370168739,
1.72437228079744, 1.59076392020817, 1.77327433861292, 1.76671780355934,
1.60625706442694, 1.92449284567535, 3.27248599245035, 3.23739024834759,
2.84115179036218, 2.51112086010829, 2.98829002803169, 2.93347114563903
),
P = c(0.222881184902066, 0.258237982165306, 0.230235867213535,
0.262379290809071, 0.230438623604524, 0.238615393939999, 0.260241811918024,
0.238785817517132, 0.248589968755681, 0.248270048794532, 0.272489046130942,
0.266707140244041, 0.25935282543278, 0.258801008935983, 0.250692297246152,
0.246890941447243, 0.277698144829677, 0.274197618349091)),
row.names = c(NA,
-18L), class = c("tbl_df", "tbl", "data.frame")))
Here is how my data looked before cleaning:
head(abc,10)
But for your specific question, you should do:
abc$Name <- str_replace_all(
  abc$Name, # column we want to search
  c("001" = "", "002" = "", "003" = "", "004" = "", "005" = "", "000" = "",
    "-" = " ", "_" = "") # named vector: each pattern (name) is paired with its replacement (value)
)
Here is how my data looked after cleaning:
head(abc,10)
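For the situation in the question, where the labels already sit in a vector, one way to avoid typing each pair out is to collapse the vector into a single anchored regular expression. This is a minimal sketch under some assumptions: speak_namesMod and speaker below are small hypothetical stand-ins for the asker's vectors, the speaker column is assumed to hold the bare label and nothing else, and the labels are assumed to contain no regex special characters.

```r
library(stringr)

# Hypothetical stand-ins for the asker's data
speak_namesMod <- c("Mod", "M2", "M3", "Interviewer")
speaker <- c("Mod", "P1", "Interviewer", "M3", "P2")

# Build one anchored alternation: ^(Mod|M2|M3|Interviewer)$
mod_pattern <- paste0("^(", paste(speak_namesMod, collapse = "|"), ")$")

# A single pattern is applied to every element, so no recycling is involved
cleaned <- str_replace_all(speaker, mod_pattern, "Moderator:")
print(cleaned)
```

The ^ and $ anchors keep a short label such as "Mod" from matching inside a longer value, including inside "Moderator:" itself if the replacement is run more than once.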
I hope this helps.