How to remove a word or two at the end of a string in a data frame in R?
Problem description
I have a data frame with a column called "Country". When the country of origin is the United States, the entries are listed as "Louisiana - USA", for example. I am trying to get rid of the " - USA" at the end, so that each entry only says which state it came from.
So, I currently have something like this (though mine has thousands of entries):
df <- data.frame(ID = 1:4, Country = c("Louisiana - USA", "Canada","France", "Maine - USA"))
What I tried is the following:
for (i in 1:nrow(df)) {
df$USA[i] <- ifelse(grepl(" USA| États-Unis", df$Country[i]), 1, 0)
}
index_USA <- which(df$USA == 1)
for (int in index_USA) {
gsub(" - USA", "", df$Country[int])
}
However, this code is not working. I also tried using the stringr package instead of gsub, replacing the last for loop with:
for (int in index_USA) {
str_replace_all(df$Country[int], " - USA", "")
}
But this did not work either. I feel like I'm making an obvious mistake, but I cannot figure it out (perhaps I need to use regex?).
Recommended answer
You want to remove " USA" and " États-Unis" at the end of the string. So, you need:
df$Country <- sub("\\s+(?:USA|États-Unis)$", "", df$Country)
Details
\\s+ - 1 or more whitespace chars
(?: - start of a (non-capturing) grouping construct, matching either of the two alternatives:
    USA - USA substring
    | - or
    États-Unis - États-Unis substring
) - end of the grouping construct
$ - end of string
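As a minimal sketch, the pattern above can be tried on the sample data from the question. With these rows it strips only the trailing country name, so the " -" separator stays behind. The name variant_pattern below is a hypothetical adjustment, not part of the original answer, that also consumes the separator.

```r
# Sample data from the question
df <- data.frame(
  ID = 1:4,
  Country = c("Louisiana - USA", "Canada", "France", "Maine - USA"),
  stringsAsFactors = FALSE
)

# Pattern from the answer: removes the trailing country name only
answer_result <- sub("\\s+(?:USA|États-Unis)$", "", df$Country)
print(answer_result)
# "Louisiana -" "Canada" "France" "Maine -"

# Hypothetical variant (an assumption, not from the answer):
# optional whitespace, a hyphen, optional whitespace, then the country name
variant_pattern <- "\\s*-\\s*(?:USA|États-Unis)$"

# sub() returns a new vector, so the result must be assigned back to the column
df$Country <- sub(variant_pattern, "", df$Country)
print(df$Country)
# "Louisiana" "Canada" "France" "Maine"
```

The loops in the question had no visible effect because gsub() and str_replace_all() return a modified copy instead of changing df$Country in place, and the returned value was never assigned. Both functions are vectorized over the whole column, so no loop or index vector is needed.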