scikit-learn, page 119 - IT屋 (a technology-sharing community for programmers and software developers)

sklearn ValueError: bad input shape

I am new to ML and sklearn. I tried to use GaussianNB on a dataset with X_train[2500,800] and Y_train[2500,8]. from sklearn.naive_bayes import GaussianNB clf = GaussianNB() clf.fit(X, Y) When I run the program, it shows ValueError: bad input shape (2500, 8). ..

Posted: 2020-05-04 09:19:17 python machine-learning scikit-learn naivebayes AI
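The shape error in the question above can be reproduced on a small scale. The sketch below is only an illustration: the data is random, the sizes are much smaller than in the question, and it assumes (the excerpt does not say so) that the 8 columns of Y are a one-hot encoding of a single class label. Under that assumption, collapsing Y to a 1-D label vector with `argmax` lets `GaussianNB` fit.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in data: 60 samples, 5 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
labels = rng.integers(0, 3, size=60)
Y = np.eye(3)[labels]          # one-hot targets, shape (60, 3)

clf = GaussianNB()

# GaussianNB expects a 1-D target, so a 2-D Y raises ValueError.
try:
    clf.fit(X, Y)
    raised = False
except ValueError:
    raised = True

# Assumption: Y is one-hot, so argmax recovers the class index per row.
y_1d = Y.argmax(axis=1)
clf.fit(X, y_1d)
pred = clf.predict(X)
```

If the columns of Y are independent labels rather than a one-hot encoding, this conversion does not apply.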

Google Cloud ML with Scikit-Learn raises: "dict" object has no attribute "lower"

I am using my scikit-learn sentiment analysis model in Google Cloud, following this tutorial: https://cloud.google.com/ml-engine/docs/scikit/快速入门 My model is defined as follows: import csv import os from collections import defaultdict import sys import re import numpy ..

Posted: 2020-05-04 09:19:14 python machine-learning scikit-learn gcloud AI

Why do I get different values with and without a pipeline in Python's sklearn

I am combining recursive feature elimination with cross-validation (RFECV) with GridSearchCV and a RandomForest classifier, both with and without a pipeline, as shown below. My code with the pipeline is as follows. X = df[my_features_all] y = df['gold_standard'] #get development and ..

Posted: 2020-05-04 09:19:04 python machine-learning scikit-learn pipeline cross-validation AI

ValueError: Found arrays with inconsistent numbers of samples [6 1786]

Here is my code: from sklearn.svm import SVC from sklearn.grid_search import GridSearchCV from sklearn.cross_validation import KFold from sklearn.feature_extraction.text import TfidfVectorizer from sklearn ..

Posted: 2020-05-04 09:18:50 python machine-learning scikit-learn text-analysis AI

I am using Python and I want to use nested cross-validation in scikit-learn. I found a very good example: NUM_TRIALS = 30 non_nested_scores = np.zeros(NUM_TRIALS) nested_scores = np.zeros(NUM_TRIALS) # Choose cross-validation techniques for the in ..

Posted: 2020-05-04 09:18:39 python machine-learning scikit-learn cross-validation grid-search AI

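Scikit-learn: scoring in GridSearchCV

It seems that scikit-learn's GridSearchCV collects the scores of its (inner) cross-validation folds and then averages the scores across all folds. I would like to know the rationale behind this. At first glance, it seems more flexible to collect the predictions of the cross-validation folds and then apply the chosen scoring metric to the predictions of all folds. The reason I stumbled upon this is that I was using cv=LeaveOneOut() and scoring='balanced_accuracy' (scikit-learn v0.20.dev0) on an imbalanced ..

Posted: 2020-05-04 09:18:33 machine-learning cross-validation optimization scikit-learn AI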
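The two ways of scoring that the question above contrasts can be compared directly. This is a rough sketch on synthetic data, not the asker's setup: the dataset, the `LogisticRegression` model, and all variable names are assumptions chosen for illustration. It computes per-fold balanced accuracy (averaged, as `cross_val_score` does) and balanced accuracy on predictions pooled across all leave-one-out folds.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict, cross_val_score

# Hypothetical imbalanced data (roughly 80% / 20%).
X, y = make_classification(n_samples=40, n_features=5,
                           weights=[0.8, 0.2], random_state=0)
model = LogisticRegression(max_iter=1000)
cv = LeaveOneOut()

# Per-fold scoring: each fold holds one sample, so each score is 0 or 1.
# Single-sample folds trigger metric warnings, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fold_scores = cross_val_score(model, X, y, cv=cv,
                                  scoring="balanced_accuracy")

# Pooled scoring: gather one prediction per sample, then score once.
pooled_pred = cross_val_predict(model, X, y, cv=cv)
pooled_score = balanced_accuracy_score(y, pooled_pred)

print("mean of per-fold scores:", fold_scores.mean())
print("score on pooled predictions:", pooled_score)
```

With one sample per fold, the mean of the per-fold scores reduces to plain accuracy, while the pooled score still weights both classes equally.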

How to run RFECV with SVC in sklearn

I am trying to perform recursive feature elimination with cross-validation (RFECV) using GridSearchCV, with SVC as the classifier. My code is as follows. X = df[my_features] y = df['gold_standard'] x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0) k_fold = Stra ..

Posted: 2020-05-04 09:18:19 python machine-learning scikit-learn svm AI

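How to decide the threshold used to select features in SelectFromModel()?

I am using a random forest classifier for feature selection. I have 70 features in total and I want to select the most important ones out of those 70. The code below shows the classifier and displays the features from most to least important. Code: feat_labels = data.columns[1:] clf = RandomForestClassifier(n_estimators=100, random_state=0) # Train the clas ..

Posted: 2020-05-04 09:17:53 python pandas numpy machine-learning scikit-learn AI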
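For context on how the threshold in the question above behaves, here is a small sketch. It uses synthetic data with 10 features instead of the asker's 70, and the choice of `threshold="median"` is only one possible setting, picked here for illustration. `SelectFromModel` keeps the features whose importance is at least the threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Hypothetical data: 10 features, 3 of them informative.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# threshold accepts a float or a string such as "mean" or "median".
selector = SelectFromModel(clf, threshold="median")
X_selected = selector.fit_transform(X, y)

# Importances of the fitted forest and the numeric threshold derived from them.
importances = selector.estimator_.feature_importances_
print("threshold:", selector.threshold_)
print("kept features:", np.flatnonzero(selector.get_support()))
```

Inspecting `importances` next to `selector.threshold_` shows exactly which features fall above or below the cut.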

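Comparing multiple algorithms with an sklearn pipeline

I am trying to set up a scikit-learn pipeline to simplify my work. The problem I am facing is that I don't know which algorithm (random forest, naive Bayes, decision tree, etc.) fits best, so I need to try each one and compare the results. However, does a pipeline take only one algorithm at a time? For example, the pipeline below takes only SGDClassifier() as the algorithm. pipeline = Pipeline([ ('vect', CountVectorizer()), ('tfidf', Tfid ..

Posted: 2020-05-04 09:17:45 python algorithm machine-learning scikit-learn AI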
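One way to try several classifiers in the same pipeline, sketched under assumptions: the tiny corpus, the labels, and the step name `clf` are invented for illustration, and the three candidate classifiers are only examples. The idea is that `GridSearchCV` can treat a whole pipeline step as a parameter and swap the final estimator for each candidate.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus: 6 positive and 6 negative documents.
docs = [
    "great movie loved it", "wonderful acting and great story",
    "loved the great cast", "a wonderful and fun film",
    "great fun, loved every minute", "wonderful story, great ending",
    "terrible movie hated it", "boring plot and terrible acting",
    "hated the boring cast", "a dull and terrible film",
    "boring, hated every minute", "dull story, terrible ending",
]
labels = [1] * 6 + [0] * 6

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SGDClassifier()),   # placeholder, replaced during the search
])

# Each entry replaces the whole "clf" step with a different estimator.
param_grid = {
    "clf": [
        SGDClassifier(random_state=0),
        MultinomialNB(),
        DecisionTreeClassifier(random_state=0),
    ]
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(docs, labels)

for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(type(params["clf"]).__name__, round(score, 3))
```

The same grid can also carry per-classifier hyperparameters by using a list of dictionaries, one per candidate.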

"Linear dependence in the dictionary" exception in sklearn's OMP

I am using sklearn's OrthogonalMatchingPursuit to obtain a sparse coding of a signal using a dictionary learned with the KSVD algorithm. However, during fitting I get the following RuntimeWarning: /usr/local/lib/python2.7/dist-packages/sklearn/linear_model/omp.py:391: RuntimeWarning: Orthogonal ..

Posted: 2020-05-04 09:17:41 python machine-learning scikit-learn compression AI

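Converting a scipy.sparse.csr.csr_matrix to a list of lists

I am learning multi-label classification and trying to implement the tf-idf tutorial from scikit-learn. I am processing a text corpus to compute its tf-idf scores, using the module sklearn.feature_extraction.text for this purpose. Using CountVectorizer and TfidfTransformer, I now have my corpus vectorized and the tf-idf for each vocabulary term. The problem is that I now have a sparse matrix, for example: (0, 47) 0.104275891 ..

Posted: 2020-05-04 09:17:39 python machine-learning scipy scikit-learn tf-idf AI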
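Two possible readings of "list of lists" for the question above, sketched on a tiny hand-made matrix (the values are invented, not the asker's tf-idf output). The first densifies the matrix, which is only practical when it fits in memory; the second keeps just the non-zero (column, value) pairs per row.

```python
from scipy.sparse import csr_matrix

# Hypothetical 2 x 3 sparse matrix standing in for a tf-idf result.
m = csr_matrix([[0.0, 0.5, 0.0],
                [1.0, 0.0, 2.0]])

# Option 1: dense list of lists, zeros included.
dense_rows = m.toarray().tolist()
# [[0.0, 0.5, 0.0], [1.0, 0.0, 2.0]]

# Option 2: per-row list of (column, value) pairs for non-zero entries,
# read directly from the CSR index pointer, indices and data arrays.
sparse_rows = [
    list(zip(m.indices[start:end].tolist(), m.data[start:end].tolist()))
    for start, end in zip(m.indptr[:-1], m.indptr[1:])
]
# [[(1, 0.5)], [(0, 1.0), (2, 2.0)]]
```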

Why do xgboost.cv and sklearn.cross_val_score give different results?

I am trying to classify a dataset. I first used XGBoost: import xgboost as xgb import pandas as pd import numpy as np train = pd.read_csv("train_users_processed_onehot.csv") labels = train["Buy"].map({"Y":1, "N":0}) feature ..

Posted: 2020-05-04 09:17:29 python machine-learning scikit-learn cross-validation xgboost AI

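Are the k-fold cross-validation scores from scikit-learn's cross_val_score and GridSearchCV biased if we include transformers in the pipeline?

A data preprocessor such as StandardScaler should be used to fit_transform the training set and only transform (not fit) the test set. I expect the same fit/transform procedure to apply to cross-validation when tuning the model. However, I found that cross_val_score and GridSearchCV fit the preprocessor on the entire training set (rather than fit_transform on the inner_train set and transform on the inner_validation set). I believe this can artificially ..

Posted: 2020-05-04 09:17:26 machine-learning scikit-learn pipeline cross-validation grid-search AI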
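The behaviour questioned above can be checked empirically. The sketch below is one way to do so, on synthetic data with an assumed `StandardScaler` plus `LogisticRegression` pipeline: it compares `cross_val_score` on the pipeline against a manual loop that fits the scaler on each training fold only and merely transforms the validation fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data and a fixed splitter so both runs see the same folds.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Scaler placed inside the pipeline that is handed to cross_val_score.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline_scores = cross_val_score(pipe, X, y, cv=cv)

# Manual reference: fit the scaler on the training fold only.
manual_scores = []
for train_idx, val_idx in cv.split(X):
    scaler = StandardScaler().fit(X[train_idx])
    model = LogisticRegression(max_iter=1000)
    model.fit(scaler.transform(X[train_idx]), y[train_idx])
    manual_scores.append(
        model.score(scaler.transform(X[val_idx]), y[val_idx])
    )
manual_scores = np.array(manual_scores)

print(pipeline_scores)
print(manual_scores)
```

If the two arrays agree, the pipeline was refit per training fold; scaling the whole dataset before calling `cross_val_score` is a different procedure and is not what this sketch measures.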

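Python sklearn: showing loss values during training

I want to check the loss values during training so that I can observe the loss at each iteration. So far I have not found an easy way to get scikit-learn to give me a history of loss values, nor have I found functionality already within scikit-learn to plot the loss for me. If there is no way to plot this, it would be great if I could simply get the final loss value at the end of classifier.fit. Note: I am aware that some solutions are closed form. I am using several classifiers that have no analytical solution, such as logistic ..

Posted: 2020-05-04 09:17:14 python machine-learning scikit-learn AI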
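As a partial illustration of the question above, some scikit-learn estimators do record their loss per iteration. The sketch below uses `MLPClassifier`, which is an assumption on my part and not one of the classifiers named in the excerpt; it exposes `loss_curve_` (one value per iteration) and `loss_` (the latest value) after fitting. Other estimators may not offer an equivalent attribute.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Hypothetical data for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=30, random_state=0)

# A short max_iter may trigger a convergence warning, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    clf.fit(X, y)

loss_history = clf.loss_curve_   # training loss after each iteration
final_loss = clf.loss_           # loss at the last iteration

for i, loss in enumerate(loss_history, start=1):
    print(f"iteration {i}: loss = {loss:.4f}")
```

The recorded list can be passed straight to a plotting library if a curve is wanted.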

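AUC-ROC for non-ranking classifiers such as OSVM

I am currently working with AUC-ROC curves. Say I have a non-ranking classifier, such as a one-class SVM, where the predictions are 0 and 1 and cannot easily be converted into probabilities or scores. If I don't want to plot the ROC curve and only want to compute the AUC to see how well my model is doing, can I still do that? Would it still be called AUC, especially since there are only two thresholds that can be used (0, 1)? If so, is it as good as computing the AUC from ranking scores? Now let's say I decide to use the labels (0, 1) created by the SVM ..

Posted: 2020-05-04 09:16:50 python machine-learning scikit-learn svm auc AI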
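To make the question above concrete: `roc_auc_score` does accept hard 0/1 predictions, and with a single operating point the resulting area works out to the average of the true positive rate and the true negative rate, which is the balanced accuracy. The labels and predictions below are invented for illustration.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Hypothetical ground truth and hard 0/1 predictions.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1])

# The ROC "curve" has only one interior point: (FPR, TPR) = (0.25, 0.75).
auc_from_labels = roc_auc_score(y_true, y_pred)

# For binary hard labels this equals (TPR + TNR) / 2.
bal_acc = balanced_accuracy_score(y_true, y_pred)

print(auc_from_labels, bal_acc)  # 0.75 0.75
```

Whether this number should still be reported as "AUC" is a matter of convention; it carries less information than an AUC computed from continuous scores.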

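How to use sklearn's IncrementalPCA partial_fit

I have a rather large dataset that I would like to decompose, but it is too large to load into memory. After researching my options, it seems that sklearn's IncrementalPCA is a good choice, but I can't quite figure out how to make it work. I can load the data just fine: f = h5py.File('my_big_data.h5') features = f['data'] From this example, it seems I need to decide what size chunks I want to read from it: num_ro ..

Posted: 2020-05-04 09:16:46 python machine-learning scikit-learn pca AI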
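A minimal sketch of chunked fitting with `partial_fit`, under assumptions: an in-memory NumPy array stands in for the asker's HDF5 dataset `f['data']`, and `chunk_size` and `n_components` are arbitrary illustrative values. Slicing an h5py dataset row-wise should behave similarly, but that is not tested here.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Hypothetical stand-in for the on-disk dataset.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 20))

n_components = 5
chunk_size = 100   # each chunk needs at least n_components rows

ipca = IncrementalPCA(n_components=n_components)

# First pass: update the decomposition one chunk at a time.
for start in range(0, features.shape[0], chunk_size):
    ipca.partial_fit(features[start:start + chunk_size])

# Second pass: project the data chunk by chunk.
reduced = np.vstack([
    ipca.transform(features[start:start + chunk_size])
    for start in range(0, features.shape[0], chunk_size)
])

print(reduced.shape)  # (1000, 5)
```

For data that truly does not fit in memory, the projected chunks would be written out incrementally rather than stacked.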

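CountVectorizer: removing features that appear only once

I am using the sklearn Python package and am having trouble creating a CountVectorizer with a pre-created dictionary, where the CountVectorizer does not remove features that appear only once or not at all. Here is my sample code: train_count_vect, training_matrix, train_labels = setup_data(train_corpus, query, vocabul ..

Posted: 2020-05-04 09:16:25 python machine-learning scikit-learn text-classification AI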

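Creating dummy variables in scikit-learn

In scikit-learn, for which models do I need to break categorical variables down into dummy binary fields? For example, if the column is political-party and the values are democrat, republican and green, then for many algorithms you have to split this into three columns, where each row can hold only one 1 and all the rest must be 0. This avoids imposing an ordinality that does not exist when discretizing [democrat, republican and green] => [0, 1, 2] ..

Posted: 2020-05-04 09:16:21 python machine-learning scikit-learn AI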
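The three-column encoding described above can be produced with `OneHotEncoder`. This is a small sketch using the party names from the question; the four sample rows are made up. The encoder orders the categories alphabetically, so the columns come out as democrat, green, republican.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column of party values, one row per sample.
party = np.array([["democrat"], ["republican"], ["green"], ["democrat"]])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix; densify it for display.
dummies = encoder.fit_transform(party).toarray()

print(encoder.categories_[0])  # ['democrat' 'green' 'republican']
print(dummies)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
```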

Finding the distance to the decision boundary in a decision tree

I want to .. So far, based on .. import numpy as np import matplotlib.pyplot as plt from sklearn.tree import DecisionTreeClassifier from sklearn.datasets import make_moons # Generate some example data X, y = make_moons(n ..

Posted: 2020-05-04 09:16:07 python machine-learning scikit-learn classification decision-tree AI

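"KD tree" with a custom distance metric

I want to use a "KDtree" (this is the best option; other "KNN" algorithms are not optimal for my project) with a custom distance metric. I checked some answers to similar questions here, and this should work... but it doesn't. distance_matrix is symmetric, as it should be by definition: array([[ 1., 0., 5., 5., 0., 3., 2.], [ 0., 1., 0., 0., 0., 0., ..

Posted: 2020-05-04 09:15:54 python-3.x machine-learning scikit-learn AI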
