TypeError: 'KFold' object is not iterable
Problem description
I'm following one of the kernels on Kaggle, a kernel for Credit Card Fraud Detection. I reached the step where I need to perform KFold in order to find the best parameters for Logistic Regression. The printing_Kfold_scores function is shown in the kernel itself, but it gives me errors, probably because the kernel was written for an older version of scikit-learn.
The errors I'm getting are as follows.
For the line fold = KFold(len(y_train_data),5,shuffle=False):
TypeError: __init__() got multiple values for argument 'shuffle'
If I remove shuffle=False from this line:
TypeError: shuffle must be True or False; got 5
If I remove the 5 and keep shuffle=False:
TypeError: 'KFold' object is not iterable
which comes from the line for iteration, indices in enumerate(fold,start=1):
If someone can help me solve this issue and suggest how this can be done with the latest version of scikit-learn, it will be much appreciated. Thanks.
Answer
KFold is a splitter, so you have to give it something to split: construct it with the number of splits (and, optionally, whether to shuffle), then iterate over fold.split(X) rather than over the KFold object itself. If you also want the iteration number inside the train/test loop, wrap the call in enumerate. I hope this works.
def printing_Kfold_scores(x_train_data,y_train_data):
    fold = KFold(len(y_train_data),5,shuffle=False)

    # Different C parameters
    c_param_range = [0.01,0.1,1,10,100]

    results_table = pd.DataFrame(index = range(len(c_param_range),2), columns = ['C_parameter','Mean recall score'])
    results_table['C_parameter'] = c_param_range

    # the k-fold will give 2 lists: train_indices = indices[0], test_indices = indices[1]
    j = 0
    for c_param in c_param_range:
        print('-------------------------------------------')
        print('C parameter: ', c_param)
        print('-------------------------------------------')
        print('')

        recall_accs = []
        for iteration, indices in enumerate(fold,start=1):

            # Call the logistic regression model with a certain C parameter
            lr = LogisticRegression(C = c_param, penalty = 'l1')

            # Use the training data to fit the model. In this case, we use the portion of the fold to train the model
            # with indices[0]. We then predict on the portion assigned as the 'test cross validation' with indices[1]
            lr.fit(x_train_data.iloc[indices[0],:],y_train_data.iloc[indices[0],:].values.ravel())

            # Predict values using the test indices in the training data
            y_pred_undersample = lr.predict(x_train_data.iloc[indices[1],:].values)

            # Calculate the recall score and append it to a list for recall scores representing the current c_parameter
            recall_acc = recall_score(y_train_data.iloc[indices[1],:].values,y_pred_undersample)
            recall_accs.append(recall_acc)
            print('Iteration ', iteration,': recall score = ', recall_acc)

        # The mean value of those recall scores is the metric we want to save and get hold of.
        results_table.ix[j,'Mean recall score'] = np.mean(recall_accs)
        j += 1
        print('')
        print('Mean recall score ', np.mean(recall_accs))
        print('')

    best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C_parameter']

    # Finally, we can check which C parameter is the best amongst the chosen.
    print('*********************************************************************************')
    print('Best model to choose from cross validation is with C parameter = ', best_c)
    print('*********************************************************************************')

    return best_c
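# Lines from the question, each with the error reported for it
fold = KFold(len(y_train_data),5,shuffle=False)      # TypeError: __init__() got multiple values for argument 'shuffle'
fold = KFold(len(y_train_data),5)                    # TypeError: shuffle must be True or False; got 5
fold = KFold(len(y_train_data),shuffle=False)        # accepted, but the loop below then fails
for iteration, indices in enumerate(fold,start=1):   # TypeError: 'KFold' object is not iterable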
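The last of these errors can be reproduced in isolation. The following is a minimal sketch, assuming a scikit-learn release that provides sklearn.model_selection; the toy array X is made up for illustration and is not part of the kernel. It shows that the KFold object itself cannot be looped over, while its split() method yields one pair of index arrays per fold.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(8).reshape(4, 2)  # toy data with 4 samples (illustrative only)
fold = KFold(n_splits=2, shuffle=False)

# Looping over the KFold object itself fails: it is a splitter, not a sequence.
try:
    iter(fold)
    iterable = True
except TypeError:
    iterable = False

# split() is what produces the (train_indices, test_indices) pairs.
splits = [(train.tolist(), test.tolist()) for train, test in fold.split(X)]

print(iterable)
print(splits)
```

In the older sklearn.cross_validation module the KFold constructor took the number of samples as its first argument and the resulting object could be iterated directly, which is probably what the kernel relied on.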
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1,1,1,1], [2,2,2,2], [3,3,3,3], [4,4,4,4]])
y = np.array([1, 2, 3, 4])
# Create the KFold object: you only pass the number of splits and whether you want to shuffle.
fold = KFold(2, shuffle=False)
# To iterate over the folds, call split() on the data
for train_index, test_index in fold.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Fit the classifier here
# To also get the iteration number, wrap split() in enumerate and unpack the index pair
for i, (train_index, test_index) in enumerate(fold.split(X)):
    print('Iteration:', i)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
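Applied to the function from the question, the same idea could look roughly like the sketch below. It is one possible adaptation under stated assumptions, not the kernel author's code: KFold comes from sklearn.model_selection, solver='liblinear' is added because the default solver in recent releases does not accept the L1 penalty, the results table is built from a list of rows instead of the .ix indexer (which has been removed from pandas), and the n_splits parameter is an addition for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import KFold


def printing_Kfold_scores(x_train_data, y_train_data, n_splits=5):
    """Return the C value with the highest mean recall across the folds."""
    # Only the number of splits is passed; the data is handed to split() later.
    fold = KFold(n_splits=n_splits, shuffle=False)
    c_param_range = [0.01, 0.1, 1, 10, 100]
    # Flatten the labels once; works for a one-column DataFrame or a Series.
    y = y_train_data.values.ravel()

    rows = []
    for c_param in c_param_range:
        print('C parameter:', c_param)
        recall_accs = []
        # split() yields (train_indices, test_indices) for each fold.
        for iteration, (train_idx, test_idx) in enumerate(fold.split(x_train_data), start=1):
            # Assumption: liblinear is chosen because it supports the L1 penalty.
            lr = LogisticRegression(C=c_param, penalty='l1', solver='liblinear')
            lr.fit(x_train_data.iloc[train_idx, :], y[train_idx])
            y_pred = lr.predict(x_train_data.iloc[test_idx, :])
            recall_acc = recall_score(y[test_idx], y_pred)
            recall_accs.append(recall_acc)
            print('Iteration', iteration, ': recall score =', recall_acc)
        mean_recall = np.mean(recall_accs)
        rows.append({'C_parameter': c_param, 'Mean recall score': mean_recall})
        print('Mean recall score', mean_recall)

    results_table = pd.DataFrame(rows)
    best_c = results_table.loc[results_table['Mean recall score'].idxmax(), 'C_parameter']
    print('Best model to choose from cross validation is with C parameter =', best_c)
    return best_c
```

Because shuffle=False keeps the rows in their original order, every training fold needs to contain both classes for the fit to succeed; with sorted labels, passing shuffle=True (and a random_state) to KFold may be the safer choice.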