R tm removeWords stopwords is not removing stopwords

Problem description

I'm using the R tm package, and find that almost none of the tm_map functions that remove elements of text are working for me.

By 'working' I mean that, for example, I'll run:

d <- tm_map(d, removeWords, stopwords('english'))

But then when I run:

ddtm <- DocumentTermMatrix(d, control = list(
    weighting = weightTfIdf,
    minWordLength = 2))
findFreqTerms(ddtm, 10)

I still get:

[1] the     this

...etc., and a bunch of other stopwords.

I see no error indicating something has gone wrong. Does anyone know what this is, and how to make stopword-removal function correctly, or diagnose what's going wrong for me?

Update

There is an error earlier up that I didn't catch:

Refreshing GOE props...
---Registering Weka Editors---
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Warning, not in CLASSPATH?
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] Initializing KF...

It is Weka that is removing stopwords in tm, right? So this could be my problem?

Update 2

From this, this error appears to be unrelated. It's about the db, not about stopwords.

Recommended answer

Never mind, it is working. I made the following minimal example:

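library(tm)      # provides the crude data set, tm_map, removeWords, stopwords
data("crude")
crude[[1]]       # first document, before stopword removal
j <- Corpus(VectorSource(crude[[1]]))
jj <- tm_map(j, removeWords, stopwords('english'))
jj[[1]]          # same document, after stopword removal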

I had used several tm_map expressions in series. It turned out that the order in which I had removed spaces, punctuation, etc. had concatenated new stopwords back in.
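To illustrate how the order of tm_map steps can matter, here is a minimal sketch on two toy sentences. The answer does not say which order was actually used, so this shows only one plausible failure mode, and it is an assumption on my part: removeWords matching is case-sensitive, so running it before lowercasing leaves capitalized stopwords in place. The documents and the helper name leftover_stopwords are hypothetical and not from the original post.

```r
library(tm)

# Toy documents (hypothetical; not from the original question).
docs <- c("The cat sat on the mat.", "This is the end. The start!")

# Hypothetical helper: which English stopwords still appear as terms?
leftover_stopwords <- function(corpus) {
  dtm <- DocumentTermMatrix(corpus)
  intersect(Terms(dtm), stopwords("english"))
}

# Order A: remove stopwords first, normalize afterwards.
# "The" and "This" do not match the lowercase stopword list, so they
# survive removeWords and are only lowercased in the next step.
a <- VCorpus(VectorSource(docs))
a <- tm_map(a, removeWords, stopwords("english"))
a <- tm_map(a, content_transformer(tolower))
a <- tm_map(a, removePunctuation)
a <- tm_map(a, stripWhitespace)
left_a <- leftover_stopwords(a)

# Order B: normalize case and punctuation first, then remove stopwords,
# and collapse the leftover whitespace last.
b <- VCorpus(VectorSource(docs))
b <- tm_map(b, content_transformer(tolower))
b <- tm_map(b, removePunctuation)
b <- tm_map(b, removeWords, stopwords("english"))
b <- tm_map(b, stripWhitespace)
left_b <- leftover_stopwords(b)

print(left_a)  # stopwords that slipped through in order A
print(left_b)  # stopwords that slipped through in order B
```

Running the same intersect check on your own document-term matrix after each tm_map step is a quick way to find which step reintroduces stopwords.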
