Removing punctuation except for apostrophes AND intra-word dashes in R
Question
I know how to remove punctuation while keeping apostrophes:
gsub( "[^[:alnum:]']", " ", db$text )
or how to keep intra-word dashes with the tm package:
removePunctuation(db$text, preserve_intra_word_dashes = TRUE)
But I cannot find a way to do both at the same time. For example, if my original sentence is:
"Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"
I would like it to become:
"Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"
Of course, there will be extra whitespace, but I can remove it later.
Thanks for your help.
Recommended answer
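Use a negated character class that keeps alphanumerics, apostrophes, and hyphens, and replaces everything else with a space: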
gsub("[^[:alnum:]'-]", " ", db$text)  # the hyphen goes last in the class so it is read literally
## "Interested in energy the environment etc   Congrats to our new e-board  Ben  Nathan  Jenny  and Adam  y'all are sure to lead the club in a great direction next year   obama  swag"
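The call above keeps every hyphen, not only those inside words, and it leaves a space wherever a punctuation character was removed. A minimal sketch of one way to finish the job is shown below. The helper name clean_text is hypothetical and not part of the original answer, and the sketch assumes ASCII text. It drops hyphens that are not flanked by alphanumerics on both sides, using PCRE lookarounds via perl = TRUE, then collapses the leftover whitespace.

```r
# Hypothetical helper (not from the original answer): keep letters, digits,
# apostrophes, and only those hyphens that sit between two alphanumerics.
clean_text <- function(x) {
  # 1. Replace everything except alphanumerics, apostrophes, and hyphens.
  x <- gsub("[^[:alnum:]'-]", " ", x)
  # 2. Replace hyphens that lack an alphanumeric on either side
  #    (lookbehind/lookahead need perl = TRUE).
  x <- gsub("(?<![[:alnum:]])-|-(?![[:alnum:]])", " ", x, perl = TRUE)
  # 3. Collapse runs of whitespace and trim both ends.
  trimws(gsub("[[:space:]]+", " ", x))
}

clean_text("a - b well-known -x y-")
## "a b well-known x y"
```

Applied to the sentence from the question, clean_text should return the single-spaced string the asker wanted, with e-board and y'all intact.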