Split data into training and testing


Problem description

I want to replicate this tutorial to classify two groups (https://machinelearningmastery.com/develop-n-gram-multichannel-convolutional-neural-network-sentiment-analysis/) with a different dataset, but could not manage it despite trying hard. I am new to programming, so I would appreciate any assistance or tips.

My dataset is small (240 files for each group), and the files are named 01 - 0240.

I think the problem is around these lines of code:

    if is_trian and filename.startswith('cv9'):
        continue
    if not is_trian and not filename.startswith('cv9'):
        continue

And these:

    trainy = [0 for _ in range(900)] + [1 for _ in range(900)]
    save_dataset([trainX,trainy], 'train.pkl')

    testY = [0 for _ in range(100)] + [1 for _ in range(100)]
    save_dataset([testX,testY], 'test.pkl')

Two errors were encountered so far:

Input arrays should have the same number of samples as target arrays. Found 483 input samples and 200 target samples.

Unable to open file (unable to open file: name = 'model.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I would really appreciate any prompt help.

Thanks.

// Part of the code, for more clarity. //

# load all docs in a directory
def process_docs(directory, is_trian):
    documents = list()
    # walk through all files in the folder
    for filename in listdir(directory):
        # skip any transcript in the test set

Just as mentioned in the tutorial, I want to add an argument below to indicate whether to process the training or testing files. Or if there's another way, please share it.

        if is_trian and filename.startswith('----'):
            continue
        if not is_trian and not filename.startswith('----'):
            continue
        # create the full path of the file to open
        path = directory + '/' + filename
        # load the doc
        doc = load_doc(path)
        # clean doc
        tokens = clean_doc(doc)
        # add to list
        documents.append(tokens)
    return documents
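Since the files here are numbered (01 - 0240) rather than prefixed like the tutorial's 'cv9' reviews, one option is to split on the numeric part of the filename instead of `startswith()`. This is only a sketch: the `is_test_file` helper, the regex, and the cutoff of 200 are illustrative assumptions, not from the original question.

```python
import re

# Hypothetical helper: treat files numbered above a cutoff as the test set.
# Assumes filenames like '0001.txt' ... '0240.txt'; adjust the pattern and
# cutoff to match the real naming scheme.
TEST_CUTOFF = 200  # files 0201-0240 become the test set (~17%)

def is_test_file(filename, cutoff=TEST_CUTOFF):
    # grab the leading digits of the filename, e.g. '0205' from '0205.txt'
    match = re.match(r'(\d+)', filename)
    if match is None:
        return False  # non-numeric names are never treated as test files
    return int(match.group(1)) > cutoff

# inside process_docs, the startswith() checks would then become:
#     if is_trian and is_test_file(filename):
#         continue
#     if not is_trian and not is_test_file(filename):
#         continue
```

This keeps the tutorial's structure (one function, a flag for train vs. test) while adapting the split rule to numbered files.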

# save a dataset to file
def save_dataset(dataset, filename):
    dump(dataset, open(filename, 'wb'))
    print('Saved: %s' % filename)

# load all training transcripts
healthy_docs = process_docs('PathToData/healthy', True)
sick_docs = process_docs('PathToData/sick', True)
trainX = healthy_docs + sick_docs
trainy = [0 for _ in range(len(healthy_docs))] + [1 for _ in range(len(sick_docs))]
save_dataset([trainX,trainy], 'train.pkl')

# load all test transcripts
healthy_docs = process_docs('PathToData/healthy', False)
sick_docs = process_docs('PathToData/sick', False)
testX = healthy_docs + sick_docs
testY = [0 for _ in range(len(healthy_docs))] + [1 for _ in range(len(sick_docs))]

save_dataset([testX,testY], 'test.pkl')

Answer

I was able to solve the problem by separating the dataset into train and test sets manually and then labelling each set separately. My current dataset is very small, so I will keep looking for a better solution for large datasets once I have the capacity. Posting this to close the question.
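For larger datasets, a common way to automate this split is scikit-learn's `train_test_split`. A minimal sketch, assuming the documents have already been loaded and labelled the same way as `healthy_docs + sick_docs` above (the stand-in lists here are illustrative):

```python
from sklearn.model_selection import train_test_split

# Stand-ins for the combined document list and its labels (0 = healthy, 1 = sick),
# built the same way as trainX/trainy in the question's code.
docs = ['doc %d' % i for i in range(480)]
labels = [0] * 240 + [1] * 240

# stratify=labels keeps the healthy/sick ratio identical in both splits;
# random_state makes the split reproducible.
trainX, testX, trainy, testY = train_test_split(
    docs, labels, test_size=0.2, stratify=labels, random_state=1)
```

With 480 documents and `test_size=0.2`, this yields 384 training and 96 test samples, 48 of each class in the test set, and avoids hand-maintaining two directory trees.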
