Removing punctuation except for apostrophes and intra-word dashes in R


Problem description

I know how to remove punctuation while keeping apostrophes:

gsub("[^[:alnum:]']", " ", db$text)

and how to keep intra-word dashes with the tm package:

removePunctuation(db$text, preserve_intra_word_dashes = TRUE)

But I cannot find a way to do both at the same time. For example, if my original sentence is:

"Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"

I would like it to become:

"Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"

Of course, there will be extra whitespace, but I can remove it later.

Thanks for your help.

Recommended answer

Use a character class that excludes alphanumerics, apostrophes, and hyphens from the replacement:

gsub("[^[:alnum:]'-]", " ", db$text)

## Output (shown here with repeated spaces collapsed):
## "Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"
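Note that a character class like this keeps every hyphen, including stand-alone ones such as in "a - b", and it leaves runs of spaces behind. A minimal sketch of a stricter variant follows. The function name clean_text and the three-step split are assumptions made for illustration, not part of the original answer. It keeps a hyphen only when it sits between two alphanumeric characters (using PCRE lookarounds via perl = TRUE) and then collapses the leftover whitespace:

```r
# Hypothetical helper (name chosen for illustration only).
clean_text <- function(text) {
  # Step 1: replace everything except letters, digits, apostrophes
  # and hyphens with a space.
  out <- gsub("[^[:alnum:]'-]", " ", text)

  # Step 2: replace hyphens that are NOT between two alphanumerics.
  # Lookarounds need the PCRE engine, hence perl = TRUE.
  out <- gsub("(?<![[:alnum:]])-|-(?![[:alnum:]])", " ", out, perl = TRUE)

  # Step 3: collapse runs of whitespace and trim both ends.
  trimws(gsub("[[:space:]]+", " ", out))
}

x <- "Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"

clean_text(x)
clean_text("well - known e-board -- y'all")
```

On the example sentence this should give the desired string with single spaces. In the second call the stand-alone dashes are dropped while "e-board" and "y'all" are kept. If only base R is wanted and stray hyphens are not a concern, steps 1 and 3 alone are enough.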
