Remove multiple patterns from a text vector in R

Question

I want to remove multiple patterns from multiple character vectors. Currently I am doing:

a.vector <- gsub("@\\w+", "", a.vector)
a.vector <- gsub("http\\w+", "", a.vector)
a.vector <- gsub("[[:punct:]]", "", a.vector)

This is painful. I looked at this question and its answers: "R: gsub, pattern = vector and replacement = vector", but they do not solve the problem.

Neither mapply nor mgsub works. I made these vectors:

remove <- c("@\\w+", "http\\w+", "[[:punct:]]")
substitute <- c("")

Neither mapply(gsub, remove, substitute, a.vector) nor mgsub(remove, substitute, a.vector) worked.

a.vector looks like this:

[4951] "@karakamen: Suicide amongst successful men is becoming rampant. Kudos for staing the conversation. #mental"
[4952] "@stiphan: you are phenomenal.. #mental #Writing. httptxjwufmfg"

I want:

[4951] "Suicide amongst successful men is becoming rampant Kudos for staing the conversation #mental"
[4952] "you are phenomenal #mental #Writing"
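
For reference, mapply(gsub, remove, substitute, a.vector) pairs its arguments position by position, so pattern i is only applied to element i of a.vector instead of every pattern being applied to every element. A minimal base R sketch of chaining the three gsub calls over a vector of patterns, assuming the two sample strings above stand in for a.vector (the name cleaned is illustrative):

```r
# Sample input copied from the question (assumption: stands in for a.vector)
a.vector <- c(
  "@karakamen: Suicide amongst successful men is becoming rampant. Kudos for staing the conversation. #mental",
  "@stiphan: you are phenomenal.. #mental #Writing. httptxjwufmfg"
)

remove <- c("@\\w+", "http\\w+", "[[:punct:]]")

# Reduce() starts from a.vector and feeds the result of each gsub()
# into the next one, so every pattern is applied to every element.
cleaned <- Reduce(function(x, pattern) gsub(pattern, "", x), remove, a.vector)

print(cleaned)
```

This behaves like the three separate gsub lines, but the patterns stay in one vector that can be extended without touching the call.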

Recommended answer

I know this answer is late to the scene, but it stems from my dislike of having to list the removal patterns manually inside the grep functions (see the other solutions here). My idea is to set the patterns beforehand, keep them as a character vector, and then paste them together when needed using the regex separator "|":

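library(stringr)

# Patterns to strip: @mentions, http-prefixed tokens, punctuation
remove <- c("@\\w+", "http\\w+", "[[:punct:]]")

# Collapse the patterns into one regex ("a|b|c") and remove every match
a.vector <- str_remove_all(a.vector, paste(remove, collapse = "|"))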

Yes, this effectively does the same as some of the other answers here, but I think my solution lets you keep the original "character removal vector" remove.
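
A minimal sketch of the same idea in base R, for cases where stringr is not available. It assumes the two sample strings from the question stand in for a.vector; the names combined and cleaned are illustrative and not part of the original answer:

```r
# Sample input copied from the question (assumption: stands in for a.vector)
a.vector <- c(
  "@karakamen: Suicide amongst successful men is becoming rampant. Kudos for staing the conversation. #mental",
  "@stiphan: you are phenomenal.. #mental #Writing. httptxjwufmfg"
)

remove <- c("@\\w+", "http\\w+", "[[:punct:]]")

# Join the patterns with "|" so a single regex matches any of them
combined <- paste(remove, collapse = "|")

# gsub() with an empty replacement deletes every match
cleaned <- gsub(combined, "", a.vector)

# Optional tidy-up: collapse runs of whitespace and trim the ends
cleaned <- trimws(gsub("\\s+", " ", cleaned))

print(cleaned)
```

Note that "[[:punct:]]" also matches "#", so the hashtags come out as plain words rather than keeping the "#" shown in the desired output of the question.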
