How to keep only unique words within each string in a vector


Problem description

I have data that looks like this:

vector = c("hello I like to code hello","Coding is fun", "fun fun fun")

I want to remove duplicate words (space-delimited) within each string, i.e. the output should look like:

vector_cleaned

[1] "hello I like to code"
[2] "Coding is fun"
[3] "fun"

Solution

Split it up (strsplit on spaces), use unique (in lapply), and paste it back together:

vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
# [1] "hello I like to code" "Coding is fun"        "fun"

## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))
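
To see what each step contributes, the same pipeline can be unrolled on the sample data. This is only an illustrative sketch; the intermediate names `words`, `deduped`, and `cleaned` are made up here and are not part of the original answer.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

# Step 1: split each string on single spaces; the result is a list of character vectors
words <- strsplit(vector, " ")

# Step 2: drop repeated words within each list element, keeping first occurrences in order
deduped <- lapply(words, unique)

# Step 3: collapse each element back into one string; vapply checks that every
# result is a single character value
cleaned <- vapply(deduped, paste, character(1L), collapse = " ")

print(cleaned)
```

Note that `unique` compares words exactly, so "Fun" and "fun" would count as different words. If they should be treated as the same word, lowercasing with `tolower` before splitting would be one way to do it.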


Update based on comments

You can always write a custom function to use with vapply. For instance, here's a function that takes a split string, keeps only the words longer than a given number of characters (minLen), and lets the caller choose whether to apply unique (onlyUnique).

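myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  # Optionally drop repeated words, keeping the first occurrence of each
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  # Keep only words with more than minLen characters, then rejoin with spaces
  paste(a[nchar(a) > minLen], collapse = " ")
}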

Compare the output of the following to see how it would work.

vapply(strsplit(vector, " "), myFun, character(1L))
vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
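
Working through the definition of `myFun` by hand on the sample vector gives the results noted in the comments below. The sketch repeats the definitions from above so that it runs on its own; the names `words`, `res_default`, `res_dupes`, and `res_all` are invented for illustration.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}

words <- strsplit(vector, " ")

# Defaults: unique words with more than 3 characters
res_default <- vapply(words, myFun, character(1L))
# "hello like code" "Coding" ""

# Keep duplicates, still dropping words of 3 characters or fewer
res_dupes <- vapply(words, myFun, character(1L), onlyUnique = FALSE)
# "hello like code hello" "Coding" ""

# minLen = 0 keeps every non-empty word, so only the deduplication applies
res_all <- vapply(words, myFun, character(1L), minLen = 0)
# "hello I like to code" "Coding is fun" "fun"
```

Because the comparison is `nchar(a) > minLen`, a word of exactly `minLen` characters is dropped, which is why "fun fun fun" becomes an empty string under the defaults.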
