r: Need content_transformer() called by tm_map() to change non-letters to spaces
Problem description
In the following code, any text matching the pattern "/|@| \\|" is changed to a space.
> library(tm)
> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
> docs <- tm_map(docs, toSpace, "/|@| \\|")
What code would transform all non-letters to a space? (What goes where the xxxxx's are below?)
It is very difficult to put all non-letters in one string: the list is very long, some characters are non-printable, and many need escaping. So I'm trying the opposite of the above.
> toSpace_2 <- content_transformer(function xxxxxxxxxxxxxxxxxxxxxxx))
> docs <- tm_map(docs, toSpace_2,
"a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z")
This needs to be done by a content_transformer() function to maintain the integrity of docs. This has to be really simple...
Thanks
Recommended answer
It is just a regular expression. \W matches any non-word character, where word characters are letters, digits, and the underscore, so digits and underscores are left unchanged.
docs <- tm_map(docs, toSpace, "\\W")
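Because the question asks for non-letters rather than non-word characters, a negated character class may be closer to what is wanted. The following is a minimal base-R sketch, not part of the original answer; the sample string and the variable names non_word and non_letter are made up for illustration.

```r
# Hypothetical sample text, chosen to contain digits, an underscore and punctuation.
x <- "user_1@example.com: 42 items/boxes"

# "\\W" replaces non-word characters only; digits and "_" survive.
non_word <- gsub("\\W", " ", x)

# "[^[:alpha:]]" replaces every character that is not a letter,
# so digits and "_" become spaces as well.
non_letter <- gsub("[^[:alpha:]]", " ", x)

print(non_word)
print(non_letter)

# With tm, assuming docs and toSpace are defined as in the question:
# docs <- tm_map(docs, toSpace, "[^[:alpha:]]")
```

[[:alpha:]] follows the current locale, so accented letters may be kept; "[^A-Za-z]" would restrict the match to ASCII letters.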