Remove multiple patterns from a text vector in R
Question
I want to remove multiple patterns from multiple character vectors. Currently I am doing:
a.vector <- gsub("@\\w+", "", a.vector)
a.vector <- gsub("http\\w+", "", a.vector)
a.vector <- gsub("[[:punct:]]", "", a.vector)
etc.
This is painful. I looked at this question and answer: "R: gsub, pattern = vector and replacement = vector", but it does not solve the problem.
Neither mapply nor mgsub is working. I made these vectors:
remove <- c("@\\w+", "http\\w+", "[[:punct:]]")
substitute <- c("")
but neither mapply(gsub, remove, substitute, a.vector) nor mgsub(remove, substitute, a.vector) worked.
a.vector looks like this:
[4951] "@karakamen: Suicide amongst successful men is becoming rampant. Kudos for staing the conversation. #mental"
[4952] "@stiphan: you are phenomenal.. #mental #Writing. httptxjwufmfg"
I want:
[4951] "Suicide amongst successful men is becoming rampant Kudos for staing the conversation #mental"
[4952] "you are phenomenal #mental #Writing"
Recommended answer
I know this answer is late on the scene, but it stems from my dislike of having to manually list the removal patterns inside the grep functions (see the other solutions here). My idea is to set the patterns beforehand, retain them as a character vector, then paste them together (i.e. when "needed") using the regex separator "|":
library(stringr)
remove <- c("@\\w+", "http\\w+", "[[:punct:]]")
a.vector <- str_remove_all(a.vector, paste(remove, collapse = "|"))
Yes, this does effectively do the same as some of the other answers here, but I think my solution allows you to retain the original "character removal vector" remove.
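For readers who would rather not depend on stringr, the same idea can be sketched in base R. This is an illustrative sketch, not part of the original answer: the sample a.vector is copied from the question, the names pattern, cleaned and cleaned_seq are hypothetical, and the trimws() call is an extra tidy-up step the answer does not mention. The Reduce() variant applies the patterns one after another, which is what the repeated gsub() calls in the question do by hand. By contrast, mapply() pairs each pattern with a single element of a.vector instead of applying every pattern to every element, which is presumably why that attempt did not behave as hoped.

```r
# Patterns to strip, kept as a reusable character vector (as in the answer)
remove <- c("@\\w+", "http\\w+", "[[:punct:]]")

# Sample input, copied from the question
a.vector <- c(
  "@karakamen: Suicide amongst successful men is becoming rampant. Kudos for staing the conversation. #mental",
  "@stiphan: you are phenomenal.. #mental #Writing. httptxjwufmfg"
)

# Option 1: join the patterns into a single alternation and remove every match
pattern <- paste(remove, collapse = "|")
cleaned <- gsub(pattern, "", a.vector)

# Option 2: apply the patterns sequentially, starting from a.vector
cleaned_seq <- Reduce(function(x, p) gsub(p, "", x), remove, a.vector)

# Optional tidy-up (an addition, not in the answer): trim leftover whitespace
cleaned <- trimws(cleaned)
cleaned_seq <- trimws(cleaned_seq)

print(cleaned)
```

Note that "[[:punct:]]" also matches "#", so with these patterns the hashtags lose their marker ("#mental" becomes "mental"). If the "#" should survive, as in the desired output shown in the question, a narrower character class would be needed.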