Unable to evaluate score using decision_function() in Logistic Regression

Problem description

I'm doing a University of Washington assignment in which I have to predict the scores for sample_test_matrix (the last few lines of the code) using decision_function() in LogisticRegression. But I get this error:

    ValueError: X has 145 features per sample; expecting 113092

Here is the code:

   import pandas as pd 
   import numpy as np 
   from sklearn.linear_model import LogisticRegression

   products = pd.read_csv('amazon_baby.csv')

   import string

   def remove_punct(text):
       # Strip every ASCII punctuation character from the review text.
       text = str(text)
       for ch in string.punctuation:
           text = text.replace(ch, "")
       return text

   products['review_clean'] = products['review'].apply(remove_punct)
   products = products[products.rating != 3]
   products['sentiment'] = products['rating'].apply(lambda x : +1 if x > 3 else  -1 )

   train_data_index = pd.read_json('module-2-assignment-train-idx.json')
   test_data_index = pd.read_json('module-2-assignment-test-idx.json')

   train_data = products.loc[train_data_index[0], :]
   test_data = products.loc[test_data_index[0], :]
   train_data = train_data.dropna()
   test_data = test_data.dropna()

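   from sklearn.feature_extraction.text import CountVectorizer

   # The vectorizer instantiation is missing from the post; default arguments are assumed here.
   vectorizer = CountVectorizer()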

   train_matrix = vectorizer.fit_transform(train_data['review_clean'])
   test_matrix = vectorizer.fit_transform(test_data['review_clean'])

   sentiment_model = LogisticRegression()
   sentiment_model.fit(train_matrix, train_data['sentiment'])
   print (sentiment_model.coef_)

   sample_data = test_data[10:13]
   print (sample_data)

   sample_test_matrix = vectorizer.transform(sample_data['review_clean'])
   scores = sentiment_model.decision_function(sample_test_matrix)
   print (scores)

Here is the products data:

       Name                                                Review                                              Rating
   0   Planetwise Flannel Wipes                            These flannel wipes are OK, but in my opinion ...   3
   1   Planetwise Wipe Pouch                               it came early and was not disappointed. i love...   5
   2   Annas Dream Full Quilt with 2 Shams                 Very soft and comfortable and warmer than it l...   5
   3   Stop Pacifier Sucking without tears with Thumb...   This is a product well worth the purchase.  I ...   5
   4   Stop Pacifier Sucking without tears with Thumb...   All of my kids have cried non-stop when I trie...   5

Recommended answer

This line causes the error in the subsequent lines:

test_matrix = vectorizer.fit_transform(test_data['review_clean'])

Change it to this:

test_matrix = vectorizer.transform(test_data['review_clean'])

Explanation: Calling fit_transform() refits the CountVectorizer on the test data. Everything it learned from the training data is lost, and the vocabulary is computed from the test data only.

You then use that refitted vectorizer object to transform sample_data['review_clean'], so the resulting features are only those learned from test_data.

But sentiment_model was trained on the vocabulary from train_data, so the features are different. The model expects 113092 columns, as the error message reports, and the matrix it receives has a different number.
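The mismatch can be reproduced on a tiny corpus. The following is a minimal sketch in which made-up review strings stand in for the Amazon data; the texts, labels, and variable names are assumptions for illustration only, not part of the assignment.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins for train_data and test_data.
train_texts = [
    "love this soft blanket",
    "great product works well",
    "terrible quality broke fast",
    "awful waste of money",
]
train_labels = [1, 1, -1, -1]
test_texts = ["love it", "awful product"]

vectorizer = CountVectorizer()
train_matrix = vectorizer.fit_transform(train_texts)
n_train_features = train_matrix.shape[1]

model = LogisticRegression()
model.fit(train_matrix, train_labels)

# Refitting on the test texts replaces the vocabulary learned from training.
bad_test_matrix = vectorizer.fit_transform(test_texts)
n_test_features = bad_test_matrix.shape[1]
print(n_train_features, n_test_features)

# The model still expects the training-time number of columns.
try:
    model.decision_function(bad_test_matrix)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(mismatch_error)
```

The exact wording of the ValueError depends on the scikit-learn version, but in each case it reports the two differing feature counts.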

Always use transform() on test data, never fit_transform().
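One way to make this rule harder to break is to wrap the vectorizer and the classifier in a scikit-learn pipeline, which fits the vectorizer only during fit() and applies transform() when scoring. This is a sketch of an alternative structure, not what the assignment asks for, and the texts and labels are again made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the training reviews and the sample reviews.
train_texts = [
    "love this soft blanket",
    "great product works well",
    "terrible quality broke fast",
    "awful waste of money",
]
train_labels = [1, 1, -1, -1]
sample_texts = ["love it", "awful product"]

# fit() learns the vocabulary and the model weights from the training texts only.
pipeline = make_pipeline(CountVectorizer(), LogisticRegression())
pipeline.fit(train_texts, train_labels)

# decision_function() transforms the raw texts with the training vocabulary, then scores them.
scores = pipeline.decision_function(sample_texts)
print(scores.shape)
```

Because the pipeline takes raw text, there is no separate test matrix to build and therefore no opportunity to refit the vectorizer by accident.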
