TypeError: 'KFold' object is not iterable


Problem description

I'm following one of the kernels on Kaggle, specifically A kernel for Credit Card Fraud Detection.

I reached the step where I need to perform KFold in order to find the best parameters for Logistic Regression.

The following code is shown in the kernel itself, but for some reason (probably an older version of scikit-learn) it gives me some errors.

def printing_Kfold_scores(x_train_data,y_train_data):
    fold = KFold(len(y_train_data),5,shuffle=False) 

    # Different C parameters
    c_param_range = [0.01,0.1,1,10,100]

    results_table = pd.DataFrame(index = range(len(c_param_range),2), columns = ['C_parameter','Mean recall score'])
    results_table['C_parameter'] = c_param_range

    # the k-fold will give 2 lists: train_indices = indices[0], test_indices = indices[1]
    j = 0
    for c_param in c_param_range:
        print('-------------------------------------------')
        print('C parameter: ', c_param)
        print('-------------------------------------------')
        print('')

        recall_accs = []
        for iteration, indices in enumerate(fold,start=1):

            # Call the logistic regression model with a certain C parameter
            lr = LogisticRegression(C = c_param, penalty = 'l1')

            # Use the training data to fit the model. In this case, we use the portion of the fold to train the model
            # with indices[0]. We then predict on the portion assigned as the 'test cross validation' with indices[1]
            lr.fit(x_train_data.iloc[indices[0],:],y_train_data.iloc[indices[0],:].values.ravel())

            # Predict values using the test indices in the training data
            y_pred_undersample = lr.predict(x_train_data.iloc[indices[1],:].values)

            # Calculate the recall score and append it to a list for recall scores representing the current c_parameter
            recall_acc = recall_score(y_train_data.iloc[indices[1],:].values,y_pred_undersample)
            recall_accs.append(recall_acc)
            print('Iteration ', iteration,': recall score = ', recall_acc)

            # The mean value of those recall scores is the metric we want to save and get hold of.
        results_table.ix[j,'Mean recall score'] = np.mean(recall_accs)
        j += 1
        print('')
        print('Mean recall score ', np.mean(recall_accs))
        print('')

    best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C_parameter']

    # Finally, we can check which C parameter is the best amongst the chosen.
    print('*********************************************************************************')
    print('Best model to choose from cross validation is with C parameter = ', best_c)
    print('*********************************************************************************')

    return best_c

The errors I'm getting are as follows. For this line:

fold = KFold(len(y_train_data),5,shuffle=False)

I get:

TypeError: __init__() got multiple values for argument 'shuffle'

If I remove shuffle=False from this line, I get the following error:

TypeError: shuffle must be True or False; got 5

If I remove the 5 and keep shuffle=False, I get the following error:

TypeError: 'KFold' object is not iterable

which comes from this line: for iteration, indices in enumerate(fold,start=1):

If someone can help me solve this issue and suggest how this can be done with the latest version of scikit-learn, it would be much appreciated.

Thanks.

Solution

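KFold is a splitter, so you have to give it something to split. Since scikit-learn 0.18 it lives in sklearn.model_selection, and its constructor no longer takes the number of samples: it takes only n_splits, shuffle and random_state. That is why KFold(len(y_train_data), 5, shuffle=False) fails: in your version the 5 is passed positionally as shuffle, which clashes with shuffle=False, and without shuffle=False it is rejected as a non-boolean shuffle value. The KFold object itself is not iterable; instead you call its split() method with your data, which yields one (train_indices, test_indices) pair per fold.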
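A minimal sketch of that difference, assuming scikit-learn 0.18 or later and some toy data: the KFold object only stores the fold configuration, looping over it directly raises the same TypeError as in the question, and split() is what actually produces the index pairs.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(5, 2)  # toy data: 5 samples, 2 features

# Only the fold configuration goes to the constructor
fold = KFold(n_splits=5, shuffle=False)

# Looping over the splitter object itself reproduces the error from the question
try:
    for indices in fold:
        pass
except TypeError as exc:
    print("TypeError:", exc)

# split() takes the data and yields (train_indices, test_indices) tuples
for train_idx, test_idx in fold.split(X):
    print("train:", train_idx, "test:", test_idx)
```

With shuffle=False and 5 samples in 5 folds, each test fold is simply one consecutive row.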

Example code:

import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]])
y = np.array([1, 2, 3, 4])
# Create the folds: you only pass the number of splits and whether to shuffle
fold = KFold(n_splits=2, shuffle=False)
# To iterate over the folds, call split() with the data
for train_index, test_index in fold.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Fit the classifier on X_train/y_train and evaluate it on X_test/y_test here
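If you do want shuffled folds, set shuffle=True; random_state then fixes the permutation so the folds are reproducible between runs (the seed 42 below is an arbitrary choice). A minimal sketch on the same toy data:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]])
y = np.array([1, 2, 3, 4])

# shuffle=True permutes the sample indices once before cutting them into folds;
# a fixed random_state makes that permutation (and therefore the folds) repeatable
fold = KFold(n_splits=2, shuffle=True, random_state=42)

for train_index, test_index in fold.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print("test samples:", test_index)
```

With shuffle=False, as in the original kernel, the folds are simply consecutive blocks of rows.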

If you also want the fold number inside the loop, wrap split() in enumerate. Each item is a (train, test) tuple, so unpack it in parentheses:

for i, (train_index, test_index) in enumerate(fold.split(X)):
    print('Iteration:', i)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
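Applied to the question's code, an updated printing_Kfold_scores could look like the sketch below. It keeps the original logic but swaps in a few changes I'm assuming are needed on a current scikit-learn/pandas: fold.split() instead of iterating the KFold object, solver="liblinear" because the default lbfgs solver does not support penalty="l1", and .loc instead of the removed DataFrame.ix. Like the original, it assumes both arguments are pandas DataFrames, with y_train_data holding a single label column.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import KFold


def printing_Kfold_scores(x_train_data, y_train_data):
    # New API: the constructor only takes the number of folds (plus shuffle
    # options); the data itself is handed to fold.split() further down.
    fold = KFold(n_splits=5, shuffle=False)

    # Different C parameters
    c_param_range = [0.01, 0.1, 1, 10, 100]

    # One row per C value; the mean recall is filled in below
    results_table = pd.DataFrame(
        {"C_parameter": c_param_range, "Mean recall score": np.nan}
    )

    for j, c_param in enumerate(c_param_range):
        print("-------------------------------------------")
        print("C parameter: ", c_param)
        print("-------------------------------------------")

        recall_accs = []
        # split() yields one (train_indices, test_indices) tuple per fold
        for iteration, (train_idx, test_idx) in enumerate(
            fold.split(x_train_data), start=1
        ):
            # liblinear supports the L1 penalty; lbfgs (the default solver
            # in recent scikit-learn releases) does not.
            lr = LogisticRegression(C=c_param, penalty="l1", solver="liblinear")

            # Fit on the training part of the fold ...
            lr.fit(x_train_data.iloc[train_idx, :],
                   y_train_data.iloc[train_idx, :].values.ravel())

            # ... and predict on its held-out part
            y_pred_undersample = lr.predict(x_train_data.iloc[test_idx, :])

            recall_acc = recall_score(
                y_train_data.iloc[test_idx, :].values.ravel(), y_pred_undersample
            )
            recall_accs.append(recall_acc)
            print("Iteration ", iteration, ": recall score = ", recall_acc)

        # DataFrame.ix no longer exists in pandas; .loc does the same job here
        results_table.loc[j, "Mean recall score"] = np.mean(recall_accs)
        print("Mean recall score ", np.mean(recall_accs))
        print("")

    best_c = results_table.loc[results_table["Mean recall score"].idxmax(), "C_parameter"]

    print("Best model to choose from cross validation is with C parameter = ", best_c)
    return best_c
```

The enumerate(..., start=1) in the inner loop keeps the 1-based iteration numbers of the original printout.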

I hope this works
