Label not x is present in all training examples


Question

Hello, I have run into an issue while trying to predict tags/labels in my project. I am following a similar tutorial (with my own data) to predict the tags of complaints in a complaint register, where one complaint can have many genres (e.g. Warranty, Refund, Air Conditioning).

DataFrame: Tag
Number of columns: 4 (original), plus 2 added during clean-up (genre_new and clean_plot)
Column names: ID, Plot, Title, Genre, genre_new, clean_plot

I used this tutorial: https://www.analyticsvidhya.com/blog/2019/04/predicting-movie-genres-nlp-multi-label-classification/ It predicts movie genres, where one movie can have multiple genres.

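I have also found the solution to "UserWarning: Label not :NUMBER: is present in all training examples":

Problem: The issue is probably that some tags occur in only a few documents. When you split the dataset into train and test sets to validate your model, it can happen that some tags are missing from the training data.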
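As far as I can tell, scikit-learn's `OneVsRestClassifier` raises this warning when a label's column is constant in the training targets, so "Label not 3 is present in all training examples" would mean that label 3 never occurs in the training split. One possible workaround is to count how many complaints carry each tag and drop the very rare ones before binarizing. The sketch below uses plain Python; the helper name `drop_rare_tags`, the `min_count` threshold, and the toy data are assumptions made for illustration, not part of the original post.

```python
from collections import Counter

def drop_rare_tags(tag_lists, min_count=2):
    """Remove tags that appear in fewer than min_count documents.

    Hypothetical helper: tag_lists is a list of tag lists, one per complaint.
    Returns the filtered tag lists and the per-tag document counts.
    """
    # Count each tag once per document, even if it is repeated in that document.
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    kept = [[t for t in tags if counts[t] >= min_count] for tags in tag_lists]
    return kept, counts

# Toy data for illustration only.
complaints = [
    ["Warranty", "Refund"],
    ["Warranty"],
    ["Refund", "Airconditioning"],
    ["Warranty", "Refund"],
]

filtered, counts = drop_rare_tags(complaints, min_count=2)
print(counts)    # how many complaints carry each tag
print(filtered)  # "Airconditioning" occurs once, so it is dropped
```

Rows that end up with no tags at all may also need to be removed before training. Whether to drop rare tags or to collect more examples for them depends on how important those tags are.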

Error: label warning and 0 predictions

But I am not sure how to write this workaround to fit my code, as I am not a coder. Please help.

Please refer to my Google Drive link: https://drive.google.com/drive/folders/10yLOVWZPgl1shVwwM5qDy7iyMCm7cS9A?usp=sharing

Recommended answer

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
import pandas as pd
from sklearn.model_selection import train_test_split

mlb = MultiLabelBinarizer()
vect = CountVectorizer()
tfidf = TfidfTransformer()

lr = LogisticRegression()
clf = OneVsRestClassifier(lr)

df = pd.read_excel("Building Compliants in 2018 for training(1).xls")
df['Genre'] = df['Genre'].apply(lambda x: x.split(','))

y = mlb.fit_transform(df['Genre'])

train_data_vect = vect.fit_transform(df['Plot'])
train_data_tfidf = tfidf.fit_transform(train_data_vect)

x_train, x_test, y_train, y_test=train_test_split(train_data_tfidf,y, test_size=0.25)

clf.fit(x_train,y_train) #train your model on train data
print(clf.score(x_test,y_test)) #check score on test data
# output: 0.3333333333333333

#now for predicting , taking first element of Plot column

text =  df['Plot'][0]
vect_transform = vect.transform([text])
tfidf_transform = tfidf.transform(vect_transform)

clf.predict(tfidf_transform)
#array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]])

mlb.inverse_transform(clf.predict(tfidf_transform))
# output: [(' Warranty', 'Airconditioning')]
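The leading space in `' Warranty'` in the output above suggests that the tags were split on commas without trimming whitespace, so `' Warranty'` and `'Warranty'` would be treated as two different labels. A minimal sketch of a splitter that trims each tag follows; the helper name `split_tags` is hypothetical.

```python
def split_tags(raw):
    """Split a comma-separated tag string and strip surrounding whitespace."""
    # Empty fragments (e.g. from a trailing comma) are skipped.
    return [tag.strip() for tag in raw.split(",") if tag.strip()]

print(split_tags("Airconditioning, Warranty"))
print(split_tags("Refund ,Warranty,"))
```

Assuming the `Genre` column holds strings, it could be applied as `df['Genre'].apply(split_tags)` in place of the `lambda` that calls `x.split(',')`.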

def infer_tags(q):
    q = clean_text(q)
    q = remove_stopwords(q)
    q_vec = tfidf.transform([q])
    q_pred = clf.predict(q_vec)
    #print(q)
    return MultiLabelBinarizer.inverse_transform(q_pred)


for i in range(100):
    k = x_test.sample(i).index[2]
    #print("Trader: ", Tag['Title'][k])
    print("Trader: ", Tag['Title'][k], "\nPredicted genre: ",infer_tags(x_test[k]))
    print("Actual genre: ",Tag['Genre'][k], "\n")

# output
Traceback (most recent call last):
  File "<ipython-input-70-28cc8e8a7204>", line 11, in <module>
    k = x_test.sample(i).index[2]
  File "C:\Users\LAUJ3\Documents\Python Project\env\lib\site-packages\scipy\sparse\base.py", line 688, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: sample not found
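The traceback points at `x_test.sample(i)`: here `x_test` is a SciPy sparse matrix produced from the TF-IDF features, and it has no pandas-style `sample` method. One possible way around this is to split the row positions themselves, so that each test row can be traced back to the original table. The sketch below uses only the standard library; `split_row_positions` and the toy lists are hypothetical stand-ins for the real DataFrame.

```python
import random

def split_row_positions(n_rows, test_size=0.25, seed=42):
    """Shuffle row positions 0..n_rows-1 and split them into train and test lists."""
    rng = random.Random(seed)
    positions = list(range(n_rows))
    rng.shuffle(positions)
    n_test = max(1, int(round(n_rows * test_size)))
    return positions[n_test:], positions[:n_test]

# Toy stand-ins for the DataFrame columns (made-up data, for illustration only).
titles = ["Trader A", "Trader B", "Trader C", "Trader D"]
genres = [["Warranty"], ["Refund"], ["Warranty", "Refund"], ["Airconditioning"]]

train_pos, test_pos = split_row_positions(len(titles))

# Pick random test rows by position instead of calling .sample on a sparse matrix.
picker = random.Random(0)
for k in picker.sample(test_pos, 1):
    print("Trader:", titles[k], "\nActual genre:", genres[k], "\n")
```

With scikit-learn, `train_test_split` accepts several arrays at once, so passing `df.index` alongside the features and labels should return matching index splits. Two further details in `infer_tags` probably need attention as well: `inverse_transform` is called on the `MultiLabelBinarizer` class rather than on the fitted `mlb` instance, and `tfidf.transform([q])` receives raw text that has not gone through `vect.transform` first. The loop also passes `x_test[k]`, an already vectorized row, to a function that expects text.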
