Error in testing SVM classifier for text classification


Question

I have gone through the sklearn documentation and written code for training an SVM classifier as well as testing it. However, at the last step I get an error that I can't comprehend. My code is below:

# imports assumed by this snippet
from xlrd import open_workbook
from xlutils.copy import copy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import svm
import numpy as np

rb = open_workbook('subjectcat.xlsx')#C:/Users/5460/Desktop/
wb = copy(rb) #making a copy
sheet = rb.sheet_by_index(0)

#only subjects extracted from excel file     
train_set = () #tuple of subjects
for row_index in range(1,500): #train using 500
    subject = 0
    for col_index in range(1,2):        
        if col_index==1:
            subject = sheet.cell(row_index,col_index).value
            subject = "'" + subject
            train_set = train_set + (subject,)

print 'only subjects'
train = list(train_set)
print len(train_set)
#for t in train_set:
#    print t

vectorizer = TfidfVectorizer(min_df=1) #Tf-idf and CountVector
#extracting features from training data
#corpus = set(train_set)  -- was reducing len to 468
corpus = (train_set)
print len(corpus)
x = vectorizer.fit_transform(corpus)
feature_names = vectorizer.get_feature_names() #use this for toarray() later -- this is to interpret for user
#print feature_names

x_array = x.toarray()
print x_array
print type(x_array)
print len(x_array)

#converting to numpy 2D array
data_array = np.array(x_array)
print type(data_array)
print len(data_array)
print data_array

#only categories extracted from excel file     
cat_set = () #tuple of categories
for row_index in range(1,500): #train using 500
    subject = 0
    for col_index in range(2,4):        
        if col_index==3:
            category = sheet.cell(row_index,col_index).value
            #in numerical form
            catgory = int(category)
            cat_set = cat_set + (category,)

#for c in cat_set:
#    print c
print 'only categories'
cat_set = list(cat_set)
print len(cat_set)
cat_array = np.array(cat_set)
print cat_array
print type(cat_array)

#################################################################

#data for testing
#only subjects extracted from excel file     
test_set = () #tuple of subjects
for row_index in range(500,575): #train using 500
    subject = 0
    for col_index in range(1,2):        
        if col_index==1:
            subject = sheet.cell(row_index,col_index).value
            subject = "'" + subject
            test_set = test_set + (subject,)

print 'only testing subjects'
test = list(test_set)
print len(test_set)

#extracting features from testing data
test_corpus = (test_set)
print len(test_corpus)
y = vectorizer.fit_transform(test_corpus)
#feature_names = vectorizer.get_feature_names() #use this for toarray() later -- this is to interpret for user

y_array = y.toarray()
#converting to numpy 2D array
test_array = np.array(y_array)
print type(y_array)
print len(y_array)
print y_array

################################################################

def svm_learning(x,y):
    clf = svm.SVC()
    clf.fit(x,y)
    print 'classifier trained'
    return clf #returning classifier

def test_classifier(classifier):
    for t in test_array:
        result = classifier.predict(t)
        print result


classifier = svm_learning(data_array, cat_array)
test_classifier(classifier)

It works until the very end, where I get the error below:

Traceback (most recent call last):
  File "C:\Users\5460\Desktop\Code\0506_01.py", line 130, in <module>
    test_classifier(classifier)
  File "C:\Users\5460\Desktop\Code\0506_01.py", line 125, in test_classifier
    result = classifier.predict(t)
  File "C:\Python27\lib\site-packages\sklearn\svm\base.py", line 466, in predict
    y = super(BaseSVC, self).predict(X)
  File "C:\Python27\lib\site-packages\sklearn\svm\base.py", line 282, in predict
    X = self._validate_for_predict(X)
  File "C:\Python27\lib\site-packages\sklearn\svm\base.py", line 404, in _validate_for_predict
    (n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 315 should be equal to 1094, the number of features at training time

I have attached the output for reference, as below:

only subjects
499
499
[[ 0.          0.          0.         ...,  0.          0.          0.        ]
 [ 0.          0.          0.         ...,  0.          0.42325613  0.        ]
 [ 0.          0.          0.         ...,  0.          0.42325613  0.        ]
 ..., 
 [ 0.          0.          0.         ...,  0.          0.          0.        ]
 [ 0.          0.          0.         ...,  0.          0.          0.        ]
 [ 0.          0.          0.         ...,  0.          0.          0.        ]]
<type 'numpy.ndarray'>
499
<type 'numpy.ndarray'>
499
[[ 0.          0.          0.         ...,  0.          0.          0.        ]
 [ 0.          0.          0.         ...,  0.          0.42325613  0.        ]
 [ 0.          0.          0.         ...,  0.          0.42325613  0.        ]
 ..., 
 [ 0.          0.          0.         ...,  0.          0.          0.        ]
 [ 0.          0.          0.         ...,  0.          0.          0.        ]
 [ 0.          0.          0.         ...,  0.          0.          0.        ]]
only categories
499
[ 1.  1.  1.  0.  1.  0.  1.  0.  2.  2.  3.  3.  0.  3.  0.  0.  4.  0.
  0.  2.  3.  0.  0.  3.  0.  0.  3.  0.  0.  0.  1.  4.  1.  3.  0.  3.
  0.  3.  2.  3.  0.  0.  3.  2.  4.  0.  3.  2.  3.  2.  3.  3.  0.  0.
  0.  3.  0.  0.  0.  3.  0.  0.  2.  0.  0.  0.  0.  0.  2.  0.  0.  0.
  0.  0.  0.  4.  0.  0.  0.  0.  0.  2.  1.  1.  1.  1.  0.  1.  0.  0.
  0.  3.  0.  0.  0.  3.  3.  2.  0.  3.  0.  3.  3.  4.  1.  3.  3.  0.
  3.  0.  0.  0.  0.  3.  3.  1.  0.  0.  3.  2.  0.  1.  0.  1.  1.  1.
  1.  1.  2.  2.  2.  2.  2.  2.  0.  0.  0.  0.  0.  3.  3.  3.  3.  3.
  0.  3.  3.  0.  3.  0.  3.  3.  0.  0.  0.  3.  3.  1.  3.  3.  3.  0.
  0.  0.  3.  3.  3.  3.  0.  3.  3.  3.  3.  3.  3.  0.  0.  3.  3.  3.
  3.  0.  0.  3.  3.  0.  3.  3.  3.  2.  3.  3.  3.  3.  3.  0.  0.  3.
  3.  3.  3.  0.  3.  3.  3.  0.  3.  3.  4.  0.  3.  0.  0.  2.  3.  0.
  0.  0.  4.  4.  0.  0.  0.  0.  2.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  2.  0.  2.  2.
  4.  2.  2.  0.  0.  0.  2.  2.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  2.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  2.  0.  0.  0.  0.  0.  0.  0.  2.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  1.  0.  0.  0.  2.  2.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.
  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
<type 'numpy.ndarray'>
only testing subjects
75
75
<type 'numpy.ndarray'>
75
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
classifier trained

Any help regarding the error would be really appreciated. I am not sure what is missing or going wrong. Thanks a lot in advance!

Answer

y = vectorizer.fit_transform(test_corpus)

This call refits the vectorizer, so it learns the vocabulary of the test corpus, which is different from that of the training corpus; you therefore end up with a different set of features (315 at test time versus 1094 at training time, exactly as the traceback reports). Use transform instead of fit_transform on the test set, so that the test documents are mapped into the feature space learned from the training data.
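As a minimal sketch of the fix (using a tiny made-up corpus, not the asker's spreadsheet data): fit the vectorizer once on the training corpus, then only transform the test corpus, so both matrices share one feature space:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical two-document training corpus and one test document
train_corpus = ["the cat sat on the mat", "the dog barked at the cat"]
test_corpus = ["a parrot spoke loudly"]

vectorizer = TfidfVectorizer(min_df=1)
X_train = vectorizer.fit_transform(train_corpus)  # learns the vocabulary, once
X_test = vectorizer.transform(test_corpus)        # reuses that same vocabulary

# Both matrices now have the same number of columns,
# so the trained classifier will accept X_test
print(X_train.shape[1] == X_test.shape[1])  # True
```

Test-set words that were never seen at training time are simply ignored, which is the expected behaviour. Note also that newer scikit-learn versions expect predict to receive a 2D array, so predicting on the whole matrix at once (classifier.predict(test_array)) is safer than looping over 1D rows.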
