How to avoid loading a large file into a Python script repeatedly?


Problem description

I've written a python script to take a large file (a matrix ~50k rows X ~500 cols) and use it as a dataset to train a random forest model.

My script has two functions, one to load the dataset and the other to train the random forest model using said data. These both work fine, but the file upload takes ~45 seconds and it's a pain to do this every time I want to train a subtly different model (testing many models on the same dataset). Here is the file upload code:

import io
import numpy as np

def load_train_data(train_file):
    # Read in the training file
    train_f = io.open(train_file)
    train_id_list = []
    train_val_list = []
    for line in train_f:
        list_line = line.strip().split("\t")
        if list_line[0] != "Domain":  # skip the header row
            train_identifier = list_line[9]
            train_values = list_line[12:]
            train_id_list.append(train_identifier)
            train_val_float = [float(x) for x in train_values]
            train_val_list.append(train_val_float)
    train_f.close()
    train_val_array = np.asarray(train_val_list)

    return train_id_list, train_val_array

This returns the identifiers from col. 9 as labels, plus a numpy array of cols. 12 to the end as the data to train the random forest.

I am going to train many different forms of my model with the same data, so I just want to upload the file one time and have it available to feed into my random forest function. I want the file to be an object I think (I am fairly new to python).

Answer

If I understand you correctly, the data set does not change, but the model parameters do, and you are changing them after each run.

I would put the file-loading code in one file and run it in the Python interpreter. The data then loads once and stays in memory, bound to whatever variables you use.

Then you can import another file containing your model code and run it with the training data as an argument.

If all your model changes can be expressed as parameters in a function call, all you need is to import your model and then call the training function with different parameter settings.
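The workflow above can be sketched as a small runnable example. Everything here is made up for illustration: the file contents mimic the question's shape (header row starting with "Domain", identifier in column 9, float values from column 12 on), and train_model is a hypothetical stand-in for the real random-forest training call.

```python
import os
import tempfile

def load_train_data(train_file):
    # Same parsing logic as the question's loader, returning plain lists
    train_id_list, train_val_list = [], []
    with open(train_file) as train_f:
        for line in train_f:
            list_line = line.strip().split("\t")
            if list_line[0] != "Domain":  # skip the header row
                train_id_list.append(list_line[9])
                train_val_list.append([float(x) for x in list_line[12:]])
    return train_id_list, train_val_list

def train_model(ids, vals, n_estimators):
    # Hypothetical stand-in for the real random-forest training function
    return {"n_estimators": n_estimators, "rows": len(ids)}

# Build a tiny two-line demo file: one header, one 14-column data row
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "train.txt")
    with open(path, "w") as f:
        f.write("\t".join(["Domain"] + ["h"] * 13) + "\n")
        f.write("\t".join(["x"] * 9 + ["id1", "x", "x", "1.0", "2.0"]) + "\n")
    ids, vals = load_train_data(path)  # the slow load happens exactly once
    # Reuse the in-memory data for several differently parameterized models
    models = [train_model(ids, vals, n) for n in (10, 100, 500)]
```

The key point is that `load_train_data` runs once, and every subsequent `train_model` call reuses the in-memory `ids` and `vals` rather than re-reading the file.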

If you need to change the model code between runs, save the modified code under a new filename, import that module, and run it again with the same source data.

If you don't want to save each model modification under a new filename, you might be able to use the reload functionality depending on your Python version, but it is not recommended (see Proper way to reload a python module from the console).
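A minimal sketch of that reload approach, using Python 3's importlib.reload. The module name "model" and its contents are hypothetical; reload re-executes the module's top-level code, so edits made between runs take effect without restarting the interpreter.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files masking the edit

with tempfile.TemporaryDirectory() as d:
    mod_path = os.path.join(d, "model.py")
    with open(mod_path, "w") as f:
        f.write("VERSION = 1\n")
    sys.path.insert(0, d)
    import model                      # first import: VERSION == 1
    first = model.VERSION
    with open(mod_path, "w") as f:    # simulate editing the model code
        f.write("VERSION = 2\n")
    importlib.reload(model)           # re-executes model.py in place
    second = model.VERSION            # now VERSION == 2
    sys.path.remove(d)
```

As the linked question discusses, reload only rebinds names inside the module itself; objects created from the old code elsewhere keep their old behavior, which is why saving under a new filename is the safer habit.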

