sci-kit learn: Reshape your data either using X.reshape(-1, 1)

Views: 61 Published: 2021/7/16 20:00:40 Tags: python scikit-learn

Problem description

I'm training a Python (2.7.11) classifier for text classification, and at run time I get a deprecation warning, but I can't tell which line of my code causes it. The warning is shown below. The code otherwise works fine and gives me results.

\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
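To make the two reshape calls named in the warning concrete, here is a minimal NumPy sketch. The array values are made up for illustration; only the `reshape` calls themselves come from the warning text.

```python
import numpy as np

# Hypothetical 1-D data: four numbers with no explicit sample/feature layout.
x = np.array([1.0, 2.0, 3.0, 4.0])

# Interpreted as four samples with one feature each -> shape (4, 1).
single_feature = x.reshape(-1, 1)

# Interpreted as one sample with four features -> shape (1, 4).
single_sample = x.reshape(1, -1)

print(single_feature.shape)
print(single_sample.shape)
```

Which of the two is right depends on what the 1-D array represents; for one bag-of-words vector describing one document, the single-sample form would be the relevant one.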

My code:

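import csv
import sys

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# NOTE: accuracy() is the asker's own helper; its definition is not included in the post.

def main():
    data = []
    folds = 10
    ex = [[] for x in range(0, 10)]
    results = []
    # Open every tab-separated input file named on the command line.
    for i, f in enumerate(sys.argv[1:]):
        data.append(csv.DictReader(open(f, 'r'), delimiter='\t'))
    # Distribute the rows round-robin over the folds.
    for f in data:
        for i, datum in enumerate(f):
            ex[i % folds].append(datum)
    #print ex
    for held_out in range(0, folds):
        l = []          # training labels
        cor = []        # training texts
        l_test = []     # held-out labels
        cor_test = []   # held-out texts
        vec = []        # training feature vectors
        vec_test = []   # held-out feature vectors

        # Split the rows into the held-out fold and the training folds.
        for i, fold in enumerate(ex):
            for line in fold:
                if i == held_out:
                    l_test.append(line['label'].rstrip("\n"))
                    cor_test.append(line['text'].rstrip("\n"))
                else:
                    l.append(line['label'].rstrip("\n"))
                    cor.append(line['text'].rstrip("\n"))

        # Build the vocabulary on the training texts only.
        vectorizer = CountVectorizer(ngram_range=(1, 1), min_df=1)
        X = vectorizer.fit_transform(cor)
        # Convert each document to a dense 1-D count vector.
        for c in cor:
            tmp = vectorizer.transform([c]).toarray()
            vec.append(tmp[0])
        for c in cor_test:
            tmp = vectorizer.transform([c]).toarray()
            vec_test.append(tmp[0])

        clf = MultinomialNB()
        clf.fit(vec, l)
        result = accuracy(l_test, vec_test, clf)
        print(result)

if __name__ == "__main__":
    main()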
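One way such a warning can arise, offered here only as an assumption because the `accuracy` helper is not shown, is calling `predict` on a single row such as `vec_test[0]`, which is a 1-D array. The sketch below uses made-up documents and labels to show 2-D calls instead: `CountVectorizer.transform` returns a 2-D sparse matrix that the classifier accepts directly, and a dense 1-D row can be reshaped with `reshape(1, -1)`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data for illustration only.
train_texts = ["good movie", "great film", "bad movie", "awful film"]
train_labels = ["pos", "pos", "neg", "neg"]

vectorizer = CountVectorizer(ngram_range=(1, 1), min_df=1)
# fit_transform returns a 2-D sparse matrix of shape (n_documents, n_features).
X_train = vectorizer.fit_transform(train_texts)

clf = MultinomialNB()
clf.fit(X_train, train_labels)

# transform() takes a list of documents, so even one document yields a 2-D matrix.
X_one = vectorizer.transform(["great movie"])
prediction = clf.predict(X_one)

# A dense 1-D row can be made 2-D with reshape(1, -1) before predicting.
row = X_one.toarray()[0]
prediction_from_row = clf.predict(row.reshape(1, -1))

print(prediction, prediction_from_row)
```

Passing the sparse matrices straight to `fit` and `predict` would also avoid building the dense `vec` and `vec_test` lists one document at a time.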

Any idea which line raises this warning? Another issue is that running this code with different data sets gives me exactly the same accuracy, and I can't figure out what causes this. Also, I want to use this model in another Python process. I looked at the documentation and found an example using the pickle library, but none for joblib. So I tried following the same code, but it gave me errors:

clf = joblib.load('model.pkl')
pred = clf.predict(vec)
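As a hedged sketch of the joblib route: `joblib.dump` and `joblib.load` play the same roles as the pickle calls in the documentation example. The file location, the toy data, and the choice to store the vectorizer alongside the classifier are assumptions for illustration; the vectorizer is saved because new text has to be transformed with the same vocabulary the model was trained on. Older scikit-learn releases exposed joblib as `sklearn.externals.joblib`; the standalone `joblib` package is used here.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data for illustration only.
texts = ["good movie", "great film", "bad movie", "awful film"]
labels = ["pos", "pos", "neg", "neg"]

vectorizer = CountVectorizer()
clf = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

# Save the fitted vectorizer and classifier together (hypothetical file location).
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump({"vectorizer": vectorizer, "clf": clf}, model_path)

# In another process: load both, then transform new text before predicting.
saved = joblib.load(model_path)
X_new = saved["vectorizer"].transform(["great movie"])  # 2-D, shape (1, n_features)
pred = saved["clf"].predict(X_new)
print(pred)
```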

Also, if my data is a CSV file in the format "label \t text \n", what should go in the label column of the test data?

Thanks in advance.
