Text2Vec classification with caret SVM warning message
Question
I am working on a text classification problem with the text2vec package and caret. I am using text2vec to build a document-term matrix before building different models with caret. The goal is to identify string similarity between two strings, using labeled training data.
However, when training a linear SVM model, I get a number of warning messages. An excerpt:
Warning messages: 1: In svm.default(x = as.matrix(x), y = y, kernel = "linear", ... :
Variable(s) ‘influenza’ and ‘perindoprilindapamide’ and ‘bisoprololhct.1’ and ‘creon.1’ and ‘kreon.1’ and ‘paratramadol.1’ constant. Cannot scale data.
Can you please help me understand these warnings and how to address "Cannot scale data"?
Excerpt of the original training data:
ID MAKTX_Keyword PH_Level_04_Keyword Result
266325638 AMLODIPINE AMLODIPINE 0
724712821 IRBESARTANHCTZ IRBESARTANHCTZ 0
567428641 RABEPRAZOLE RABEPRAZOLE 0
137472217 MIRTAZAPINE MIRTAZAPINE 0
175827784 FONDAPARINUX ARIXTRA 1
456372747 VANCOMYCIN VANCOMYCIN 0
653832438 BRUFEN IBUPROFEN 1
917575539 POTASSIUM POTASSIUM 0
222949123 DIOSMINHESPERIDIN DIOSMINHESPERIDIN 0
892725684 IBUPROFEN IBUPROFEN 0
Code to build the SVM model:
control <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions=TRUE, classProbs=TRUE)
Train_PRDHA_String.df$Result <- ifelse(Train_PRDHA_String.df$Result == 1, "X", "Y")
options(warn = 1)  # print each warning as it occurs instead of deferring it
t1 = Sys.time()
svm_Linear <- train(x = as.matrix(dtm_train),
                    y = as.factor(Train_PRDHA_String.df$Result),
                    method = "svmLinear2",
                    trControl = control,
                    tuneLength = 5,
                    metric = "Accuracy")
print(difftime(Sys.time(), t1, units = 'sec'))
Recommended answer
It means that when the data are resampled, these variables have only one unique value within a resample. A constant column has zero variance, so the SVM cannot scale it. You can pass preProc = "zv" to train() to drop such zero-variance predictors and get rid of the warning.
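To make the mechanism concrete, here is a minimal base R sketch. The toy matrix and its column names (common_term, other_term, rare_term) are hypothetical and not taken from the question's data. It mimics one cross-validation fold in which a rare term's only occurrence is held out, and then applies a simple constant-column filter, roughly the kind of check a zero-variance filter performs.

```r
# Toy document-term matrix: 6 documents, 3 terms (hypothetical data).
# matrix() fills column by column, so each row of values below is one column.
dtm <- matrix(
  c(1, 0, 1, 0, 1, 0,   # common_term
    0, 1, 1, 0, 0, 1,   # other_term
    0, 0, 0, 0, 0, 1),  # rare_term: present in document 6 only
  ncol = 3,
  dimnames = list(NULL, c("common_term", "other_term", "rare_term"))
)

# Simulate one resample whose training part leaves out document 6.
train_fold <- dtm[1:5, , drop = FALSE]

# A column is constant when it contains a single unique value.
is_constant <- apply(train_fold, 2, function(col) length(unique(col)) == 1)
print(is_constant)

# Drop the constant columns before fitting, so nothing unscalable remains.
train_fold_zv <- train_fold[, !is_constant, drop = FALSE]
print(colnames(train_fold_zv))
```

In the full data set rare_term is not constant, which is why the warning only shows up during resampling. With caret, preProc = "zv" is applied inside each resample, so in the question's call it would sit alongside method = "svmLinear2" rather than requiring the matrix to be filtered by hand.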