How do I keep only unique words within each string in a vector?
I have data that looks like this:
vector = c("hello I like to code hello","Coding is fun", "fun fun fun")
I want to remove duplicate words (space delimited) within each string, i.e. the output should look like:
vector_cleaned
[1] "hello I like to code"
[2] "Coding is fun"
[3] "fun"
Split it up (strsplit on spaces), use unique (in lapply), and paste it back together:
vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
# [1] "hello I like to code" "Coding is fun" "fun"
## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))
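To make the one-liners easier to follow, here is a minimal sketch that runs the same three steps one at a time on the sample data from the question. The intermediate names words and unique_words are only illustrative and do not appear in the original answer.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

# Step 1: split each string on single spaces.
# The result is a list with one character vector per input string.
words <- strsplit(vector, " ")

# Step 2: drop repeated words within each element.
# unique() keeps the first occurrence and preserves order.
unique_words <- lapply(words, unique)

# Step 3: collapse each element back into a single string.
# vapply() guarantees a character vector with one value per element.
vector_cleaned <- vapply(unique_words, paste, character(1L), collapse = " ")

vector_cleaned
# values: "hello I like to code", "Coding is fun", "fun"
```

Note that unique() compares words exactly, so "Coding" and "coding" would be treated as different words, and consecutive spaces would produce empty strings in the split result.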
Update based on comments
You can always write a custom function to use with vapply. For instance, here's a function that takes a split string, keeps only the words that are longer than a given number of characters (minLen), and has the "unique" setting as a user choice.
myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}
Compare the output of the following to see how it would work.
vapply(strsplit(vector, " "), myFun, character(1L))
vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
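For reference, here is a self-contained sketch of those three calls on the sample data, with the values worked out by hand from the function definition. The result names (res_default, res_all, res_min0) are hypothetical and used only for this illustration.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

# Same function as above: optionally de-duplicate, then keep only
# the words whose length is strictly greater than minLen.
myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}

words <- strsplit(vector, " ")

# Defaults: unique words longer than 3 characters.
res_default <- vapply(words, myFun, character(1L))
# values: "hello like code", "Coding", ""

# Keep duplicates, still drop words of 3 characters or fewer.
res_all <- vapply(words, myFun, character(1L), onlyUnique = FALSE)
# values: "hello like code hello", "Coding", ""

# No length filter: only duplicates are removed.
res_min0 <- vapply(words, myFun, character(1L), minLen = 0)
# values: "hello I like to code", "Coding is fun", "fun"
```

Because the comparison is nchar(a) > minLen, a word of exactly minLen characters is dropped. That is why "fun" disappears with the default minLen = 3 and the third element becomes an empty string.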