Text2Vec classification with caret SVM warning message


Question

I am working on a text classification problem with the text2vec package and caret. I use text2vec to build a document-term matrix and then fit different models with caret. The goal is to identify string similarity between two strings, using labeled training data.

However, when training a linear SVM model, I get a number of warning messages. An excerpt:

Warning messages:
1: In svm.default(x = as.matrix(x), y = y, kernel = "linear", ... :
  Variable(s) ‘influenza’ and ‘perindoprilindapamide’ and ‘bisoprololhct.1’ and ‘creon.1’ and ‘kreon.1’ and ‘paratramadol.1’ constant. Cannot scale data.

Can you please help me understand these warnings and how to address "Cannot scale data"?

Excerpt of the original training data:

ID          MAKTX_Keyword       PH_Level_04_Keyword   Result 
266325638   AMLODIPINE          AMLODIPINE              0 
724712821   IRBESARTANHCTZ      IRBESARTANHCTZ          0 
567428641   RABEPRAZOLE         RABEPRAZOLE             0 
137472217   MIRTAZAPINE         MIRTAZAPINE             0 
175827784   FONDAPARINUX        ARIXTRA                 1 
456372747   VANCOMYCIN          VANCOMYCIN              0 
653832438   BRUFEN              IBUPROFEN               1 
917575539   POTASSIUM           POTASSIUM               0     
222949123   DIOSMINHESPERIDIN   DIOSMINHESPERIDIN       0 
892725684   IBUPROFEN           IBUPROFEN               0

Code to build the SVM model:

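library(caret)  # provides train() and trainControl()

# 10-fold cross-validation repeated 3 times; keep hold-out predictions and class probabilities
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                        savePredictions = TRUE, classProbs = TRUE)

# classProbs = TRUE needs class labels that are valid R names, so recode 1/0 as "X"/"Y"
Train_PRDHA_String.df$Result <- ifelse(Train_PRDHA_String.df$Result == 1, "X", "Y")

options(warn = 1)  # print each warning as it occurs

t1 <- Sys.time()
svm_Linear <- train(x = as.matrix(dtm_train),
                    y = as.factor(Train_PRDHA_String.df$Result),
                    method = "svmLinear2",
                    trControl = control,
                    tuneLength = 5,
                    metric = "Accuracy")
print(difftime(Sys.time(), t1, units = "sec"))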

Answer

It means that when these variables are resampled, they have only one unique value in the resampled training set, so they have zero variance there and cannot be scaled. You can use preProc = "zv" to drop such zero-variance predictors and get rid of the warning.
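A minimal sketch of that suggestion, on made-up data rather than the asker's document-term matrix. The toy matrix `x`, its column names, and the labels `y` are assumptions for illustration only; the `rare` column stands in for a term that occurs in a single document. The argument is spelled `preProcess` here; `preProc` reaches the same argument through R's partial argument matching. Running it requires the caret and e1071 packages.

```r
library(caret)  # train(), trainControl(); method "svmLinear2" also needs e1071

set.seed(42)
n <- 60

# Hypothetical stand-in for a document-term matrix:
# two common terms and one term that occurs in only the first document.
x <- cbind(
  common_a = rbinom(n, 1, 0.5),
  common_b = rbinom(n, 1, 0.5),
  rare     = c(1, rep(0, n - 1))
)
y <- factor(ifelse(x[, "common_a"] == 1, "X", "Y"))

# Any resample that leaves out document 1 sees "rare" as a constant column,
# which is the situation the warning reports.
unique_per_column <- apply(x[-1, ], 2, function(col) length(unique(col)))
print(unique_per_column)

control <- trainControl(method = "cv", number = 5)

# "zv" is applied inside each resample: predictors that are constant
# in that resample's training rows are removed before the SVM is fitted.
fit <- train(x = x, y = y,
             method = "svmLinear2",
             preProcess = "zv",
             trControl = control,
             tuneLength = 3)
print(fit)
```

Because the filter runs per resample, a term can be dropped in one fold and kept in another. If a fixed set of predictors is preferred, the rare terms could instead be removed from the document-term matrix once, before calling train().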
