How to keep only unique words within each string in a vector

Question

My data looks like this:

vector = c("hello I like to code hello","Coding is fun", "fun fun fun")

I want to remove duplicate words (space-delimited) within each string, i.e. the output should look like this:

vector_cleaned

[1] "hello I like to code"
[2] "Coding is fun"
[3] "fun"

Recommended answer

Split each string on spaces with strsplit, apply unique to each resulting word vector with lapply, and paste the words back together:

vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
# [1] "hello I like to code" "Coding is fun"        "fun"

## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))
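To make the one-liner easier to follow, here is a sketch that runs the same three steps one at a time. The intermediate names words and unique_words are introduced here for illustration only and are not part of the original answer.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

# Step 1: split each string on single spaces.
# The result is a list with one character vector of words per input string.
words <- strsplit(vector, " ")

# Step 2: drop repeated words within each list element.
# unique() keeps the first occurrence of each word and preserves order.
unique_words <- lapply(words, unique)

# Step 3: glue each word vector back into a single string.
# character(1L) tells vapply that every result must be one string.
vector_cleaned <- vapply(unique_words, paste, character(1L), collapse = " ")
vector_cleaned
```

Note that unique compares words exactly, so "Coding" and "coding" would count as different words.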

Update based on comments

You can always write a custom function to use with vapply. For instance, here's a function that takes the words of a split string, keeps only the words longer than minLen characters, and makes the "unique" step a user choice.

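myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  # Optionally drop repeated words, keeping the first occurrence of each
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  # Keep only words strictly longer than minLen characters
  paste(a[nchar(a) > minLen], collapse = " ")
}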

Compare the output of the following calls to see how it works.

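vapply(strsplit(vector, " "), myFun, character(1L))
# [1] "hello like code" "Coding" ""
vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
# [1] "hello like code hello" "Coding" ""
vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
# [1] "hello I like to code" "Coding is fun" "fun"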
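The same pattern works for other per-string rules. As a further illustration, the sketch below treats words that differ only in case as duplicates. The helper name uniqueIgnoreCase and the sample data are assumptions made for this example, not part of the original answer.

```r
# Sample data chosen for this sketch; note the mixed-case repeats
mixed <- c("hello I like to code Hello", "Coding is fun", "fun Fun FUN")

# Hypothetical helper: keep the first occurrence of each word,
# comparing words case-insensitively via tolower()
uniqueIgnoreCase <- function(x) {
  paste(x[!duplicated(tolower(x))], collapse = " ")
}

mixed_cleaned <- vapply(strsplit(mixed, " "), uniqueIgnoreCase, character(1L))
mixed_cleaned
```

The first spelling encountered is the one that is kept, so "fun Fun FUN" collapses to "fun".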
