How to remove a word or two at the end of a string in a data frame in R?

Problem description

I have a data frame with a column called "Country". When the country of origin is the United States, the entries are listed as "Louisiana - USA", for example. I am trying to get rid of the " - USA" at the end, so that each entry only says which state it came from.

So, I currently have something like this (though mine has thousands of entries):

df <- data.frame(ID = 1:4, Country = c("Louisiana - USA", "Canada","France", "Maine - USA"))

What I tried is the following:

for (i in 1:nrow(df)) {
    df$USA[i] <- ifelse(grepl(" USA| États-Unis", df$Country[i]), 1, 0) 
}

index_USA <- which(df$USA == 1)

for (int in index_USA) {
    gsub(" - USA", "", df$Country[int])
}

However, this code is not working. I also tried using the stringr package instead of gsub, replacing the last for loop with:

for (int in index_USA) {
    str_replace_all(df$Country[int], " - USA", "")
}

But this did not work either. I feel like I'm making an obvious mistake, but I cannot figure it out (perhaps I need to use regex?).

Recommended answer

You want to remove " USA" and " États-Unis" at the end of the string. So, you need

df$Country <- sub("\\s+(?:USA|États-Unis)$", "", df$Country)
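Note that the result is assigned back to df$Country. The loops in the question call gsub() and str_replace_all() but never store what they return, so the data frame is left untouched. Both functions are also vectorized, so no loop or index is needed. A minimal sketch of the difference, using the question's sample data and its own " - USA" literal (the variable name unchanged is only illustrative):

```r
# Sample data from the question
df <- data.frame(ID = 1:4,
                 Country = c("Louisiana - USA", "Canada", "France", "Maine - USA"),
                 stringsAsFactors = FALSE)

# gsub() returns a new character vector; it does not modify df in place.
# Calling it without assignment, as in the question's loop, changes nothing.
gsub(" - USA", "", df$Country, fixed = TRUE)
unchanged <- df$Country   # still holds "Louisiana - USA" and "Maine - USA"

# Assigning the result back updates the whole column in one vectorized call.
df$Country <- gsub(" - USA", "", df$Country, fixed = TRUE)
```

Here fixed = TRUE treats the pattern as a literal string, which is enough when the suffix is always exactly " - USA".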

Details

  • \\s+ - 1 or more whitespace chars
  • (?: - start of a (non-capturing) grouping construct, matching either of the two alternatives:
    • USA - USA substring
    • | - or
    • États-Unis - États-Unis substring
  • ) - end of the grouping construct
  • $ - end of the string
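
As written, \\s+ only consumes the whitespace directly before the country name, so for an entry like "Louisiana - USA" the " -" separator stays behind. If the goal is to keep only the state name, one possible variant also matches the hyphen and the whitespace around it. This is a sketch, not part of the original answer: the "Texas - États-Unis" entry is a hypothetical example added to exercise the French alternative, and É is written as \u00c9 so the script does not depend on the file's encoding.

```r
# Question's sample values plus one hypothetical French-labelled entry
countries <- c("Louisiana - USA", "Canada", "France", "Maine - USA",
               "Texas - \u00c9tats-Unis")

# \\s*-\\s*  optional whitespace, the hyphen separator, optional whitespace
# (USA|...)  either country label
# $          anchors the match at the end of the string
pattern <- "\\s*-\\s*(USA|\u00c9tats-Unis)$"

# sub() replaces the first (and here only possible) match in each element
cleaned <- sub(pattern, "", countries)
print(cleaned)
# [1] "Louisiana" "Canada"    "France"    "Maine"     "Texas"
```

Entries without the suffix, such as "Canada" and "France", pass through unchanged because the pattern does not match them.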
