Caret objecting to outcomes labels: Error: At least one of the class levels is not a valid R variable name


Question

caret gives me the error below. I'm training an SVM for prediction from a bag of words and wanted to use caret to tune the C parameter. However:

bow.model.svm.tune <- train(Training.match ~ ., data = data.frame(
    Training.match = factor(Training.Data.old$Training.match, labels = c('no match', 'match')),
    Text.features.dtm.df) %>%
        filter(Training.Data.old$Data.tipe == 'train'),
    method = 'svmRadial',
    tuneLength = 9,
    preProc = c("center","scale"),
    metric="ROC",
    trControl = trainControl(
        method="repeatedcv",
        repeats = 5,
        summaryFunction = twoClassSummary,
        classProbs = T))    

Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to no.match, match . Please use factor levels that can be used as valid R variable names (see ?make.names for help).

The original e1071::svm() function doesn't cause any problems, so I suppose the error arises in the tuning phase:

bow.model.svm.tune <- svm(Training.match ~ ., data = data.frame(
             Training.match = factor(Training.Data.old$Training.match, labels = c('no match', 'match')),
             Text.features.dtm.df) %>%
                 filter(Training.Data.old$Data.tipe == 'train'))

The data is simply an outcome factor variable plus a list of TF-IDF-transformed word vectors:

'data.frame':   1796 obs. of  1697 variables:
 $ Training.match          : Factor w/ 2 levels "no match","match": 2 1 1 1 1 1 1 1 2 1 ...
 $ azienda                 : num  0.12 0 0 0 0 ...
 $ bus                     : num  0.487 0 0 0 0 ...
 $ locale                  : num  0.275 0 0 0 0 ...
 $ martini                 : num  0.852 0.741 0.947 0.947 0.501 ...
 $ osp                     : num  0.339 0 0 0 0 ...
 $ ospedale                : num  0.0389 0.0676 0.0864 0.0864 0.0915 ...

Recommended answer

When predicting (internally within train, or when you call predict.train yourself), the functions create a new column for each class probability, named after the class level. If the code expects a column called "no match", it won't find it, because data.frame converts that name to "no.match", and an error is thrown. This is why caret refuses to start when classProbs is enabled and a level is not a valid R variable name.
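As a minimal sketch of what the error message suggests, the factor levels can be passed through make.names() before calling train. The `outcome` vector and its 0/1 coding below are hypothetical stand-ins for Training.Data.old$Training.match, whose actual values are not shown in the question:

```r
# Hypothetical outcome, standing in for Training.Data.old$Training.match
# (the 0/1 coding is an assumption made for illustration only)
outcome <- factor(c(1, 0, 0, 1), levels = c(0, 1),
                  labels = c("no match", "match"))

# data.frame() rewrites non-syntactic column names by default
# (check.names = TRUE), so a "no match" column becomes "no.match"
probs <- data.frame(`no match` = 0.3, match = 0.7)
names(probs)

# Identify levels that are not already valid R variable names
bad.levels <- levels(outcome)[levels(outcome) != make.names(levels(outcome))]
bad.levels

# Replace the levels with syntactically valid names before calling train()
levels(outcome) <- make.names(levels(outcome))
levels(outcome)
```

Alternatively, labels without spaces (for example 'no_match' and 'match') could be supplied directly in the factor() call, which should avoid the renaming step altogether.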
