Removing duplicate words in a string in R

Question

This is to help someone who has just voluntarily removed their question after being asked to show the code they tried, along with other comments. Let's assume they tried something like this:

str <- "How do I best try and try and try and find a way to to improve this code?"
d <- unlist(strsplit(str, split=" "))
paste(d[-which(duplicated(d))], collapse = ' ')

and wanted to learn a better way. So what is the best way to remove duplicate words from a string?

Recommended answer

If you are still interested in alternative solutions, you can use unique, which slightly simplifies your code.

paste(unique(d), collapse = ' ')
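
As a minimal sketch of the approach above, the split and the unique step can be wrapped in a small function. The name remove_duplicate_words is hypothetical and used only for illustration. The sketch also shows one reason to prefer unique (or !duplicated) over -which(duplicated(...)): when there are no duplicates, which() returns an empty index and the negative subscript then selects nothing.

```r
# Hypothetical helper (not from the original answer): split on single spaces
# and keep only the first occurrence of each word.
remove_duplicate_words <- function(s) {
  words <- unlist(strsplit(s, split = " "))
  paste(unique(words), collapse = " ")
}

sentence <- "How do I best try and try and try and find a way to to improve this code?"
result <- remove_duplicate_words(sentence)
result
# "How do I best try and find a way to improve this code?"

# Pitfall of the original idiom when the input has no duplicates:
no_dups <- c("a", "b", "c")
dropped <- no_dups[-which(duplicated(no_dups))]  # character(0): everything is lost
kept    <- no_dups[!duplicated(no_dups)]         # "a" "b" "c"
```

Logical indexing with !duplicated(d) gives the same result as unique(d) here, so either is a safe replacement for the negative which() subscript.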

As Thomas notes in his comment, you probably do want to remove punctuation. R's gsub accepts POSIX character classes such as [[:punct:]], which you can use instead of spelling out every character by hand. You can of course write a more specific pattern if you need finer control.

d <- gsub("[[:punct:]]", "", d)
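
The order of the steps matters: punctuation has to be stripped before duplicates are dropped, otherwise "be," and "be" count as different words. The following is a sketch under that assumption, using a made-up example sentence rather than the one from the question.

```r
# Example input chosen for illustration only.
sentence <- "to be, or not to be"
d <- unlist(strsplit(sentence, split = " "))

# Without stripping punctuation, "be," and "be" are treated as distinct.
with_punct <- paste(unique(d), collapse = " ")
with_punct
# "to be, or not be"

# Strip punctuation from each word first, then drop duplicates.
d_clean <- gsub("[[:punct:]]", "", d)
without_punct <- paste(unique(d_clean), collapse = " ")
without_punct
# "to be or not"
```

Note that this removes all punctuation from the result, including any you might have wanted to keep at the end of the sentence.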
