randomForest in R object not found error

Views: 536 | Posted: 2020/5/4 10:27:28 | Tags: r, machine-learning, classification, text-mining, random-forest


Problem description

# init
libs <- c("tm", "plyr", "class", "RTextTools", "randomForest")
lapply(libs, require, character.only = TRUE)

# set options
options(stringsAsFactors = FALSE)

# set parameters
labels <- read.table('labels.txt')
path <- paste(getwd(), "/data", sep="")

# clean text
cleanCorpus <- function(corpus) {
  corpus.tmp <- tm_map(corpus, removePunctuation)
  corpus.tmp <- tm_map(corpus.tmp, removeNumbers)
  corpus.tmp <- tm_map(corpus.tmp, stripWhitespace)
  corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))
  corpus.tmp <- tm_map(corpus.tmp, stemDocument, language = "english")
  corpus.tmp <- tm_map(corpus.tmp, removeWords, stopwords("english"))
  return(corpus.tmp)
}

# build TDM
generateTDM <- function(label, path) {
  s.dir <- sprintf("%s/%s", path, label)
  s.cor <- Corpus(DirSource(directory = s.dir), readerControl = list(language = "en"))
  s.cor.cl <- cleanCorpus(s.cor)
  s.tdm <- TermDocumentMatrix(s.cor.cl)
  s.tdm <- removeSparseTerms(s.tdm, 0.7)
  return(list(name = label, tdm = s.tdm))
}

tdm <- lapply(labels, generateTDM, path = path)

# attach name
bindLabelToTDM <- function(tdm) {
  s.mat <- t(data.matrix(tdm[["tdm"]]))
  s.df <- as.data.frame(s.mat, stringsAsFactors = FALSE)
  s.df <- cbind(s.df, rep(tdm[["name"]], nrow(s.df)), row.names = NULL)
  colnames(s.df)[ncol(s.df)] <- "targetlabel"
  return(s.df) 
}

labelTDM <- lapply(tdm, bindLabelToTDM)

# stack
tdm.stack <- do.call(rbind.fill, labelTDM)
tdm.stack[is.na(tdm.stack)] <- 0

# hold-out
train.idx <- sample(nrow(tdm.stack), ceiling(nrow(tdm.stack) * 0.7))
test.idx <- (1:nrow(tdm.stack)) [- train.idx]

tdm.lab <- tdm.stack[, "targetlabel"]
tdm.stack.nl <- tdm.stack[, !colnames(tdm.stack) %in% "targetlabel"]

train <- tdm.stack[train.idx, ]
test <- tdm.stack[test.idx, ]

train$targetlabel <- as.factor(train$targetlabel)
label.rf <- randomForest(targetlabel ~ ., data = train, ntree = 5000, mtry = 15, importance = TRUE)

I am trying to do multi-class classification of text files using the randomForest algorithm. The error I get is probably caused by the last or second-to-last line.

Error in eval(expr, envir, enclos) : object 'âˆ—' not found

tdm.stack has one column per word found in the documents: the column name is the word and the cell values are its frequency in each document. The last column contains the class label.
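Because every term becomes a column name, some column names may not be valid R identifiers, and the quoted name in the error message looks like one of them. A minimal sketch for listing such names, assuming a toy data frame in place of tdm.stack (the terms "apple", "e-mail" and "next" are invented for illustration and are not taken from the real data):

```r
# Toy stand-in for tdm.stack: hypothetical term columns plus the class column.
toy <- data.frame(c(1, 0, 2), c(0, 3, 0), c(1, 1, 0), c("a", "b", "a"),
                  stringsAsFactors = FALSE)
names(toy) <- c("apple", "e-mail", "next", "targetlabel")

# Every column except the class column is a term.
term.names <- setdiff(names(toy), "targetlabel")

# make.names() leaves a name unchanged only if it is already a syntactic
# R name, so any name it alters is a candidate for causing trouble.
non.syntactic <- term.names[make.names(term.names) != term.names]
print(non.syntactic)
```

Running the same check on colnames(tdm.stack) would show whether the real matrix contains such terms.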

I have tried everything I can think of and cannot figure out the problem. Please help.
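One plausible cause, which the post itself does not confirm, is that the formula interface of randomForest cannot resolve term columns whose names are not syntactic R names. The sketch below shows two common workarounds on toy data: renaming the term columns with make.names(), or skipping the formula and passing predictors and response through the x and y arguments. The toy data frame and its term names are assumptions made for illustration only.

```r
# Toy stand-in for the training data (hypothetical terms, not the real corpus).
toy <- data.frame(rep(c(1, 0, 2, 0), 5), rep(c(0, 3, 0, 1), 5),
                  rep(c(1, 1, 0, 0), 5), rep(c("a", "b"), 10),
                  stringsAsFactors = FALSE)
names(toy) <- c("apple", "e-mail", "next", "targetlabel")
is.term <- names(toy) != "targetlabel"

# Workaround 1: make the term names syntactic, then keep using the formula.
clean <- toy
names(clean)[is.term] <- make.names(names(clean)[is.term], unique = TRUE)
clean$targetlabel <- as.factor(clean$targetlabel)

# Workaround 2: keep the original names and avoid the formula entirely.
x <- toy[, is.term, drop = FALSE]
y <- as.factor(toy$targetlabel)

# Fit only if the package is installed; ntree is kept small for the toy data.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  fit.formula <- randomForest::randomForest(targetlabel ~ ., data = clean,
                                            ntree = 50)
  fit.xy <- randomForest::randomForest(x = x, y = y, ntree = 50)
}
```

The x/y form leaves the term names as they appear in the corpus, which can be convenient when reading variable importance afterwards; renaming keeps the rest of the script unchanged.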
