Splitting data into test and train including

Question

How do I split a folder containing multiple video files into train and test folders, based on dataframe variables that tell me which video should go in the train folder and which should go in the test folder (in Python 3.0)? The videos are located in separate category folders.

Each video can be found in one of the category directories, for instance:

C:\Users\Me\Videos\a
C:\Users\Me\Videos\b

This means that for every category I need a "train" and a "test" folder, for example:

C:\Users\Me\Videos\a\train
C:\Users\Me\Videos\a\test
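The target layout is simply root, then category, then split, then file name. Below is a minimal sketch of building those paths with pathlib, assuming the example root C:\Users\Me\Videos from the question; destination_for is a hypothetical helper name. PureWindowsPath is used only so the backslash form comes out the same on any OS; on Windows itself, pathlib.Path behaves the same way.

```python
from pathlib import PureWindowsPath

# Example root taken from the question; adjust to your own machine.
VIDEO_ROOT = PureWindowsPath(r"C:\Users\Me\Videos")


def destination_for(category: str, split: str, filename: str) -> PureWindowsPath:
    """Hypothetical helper: build root / category / split / filename."""
    if split not in ("train", "test"):
        raise ValueError(f"unexpected split: {split!r}")
    return VIDEO_ROOT / category / split / filename


print(destination_for("a", "train", "video1.mp4"))  # prints C:\Users\Me\Videos\a\train\video1.mp4
```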

I also have an (EDIT) CSV file containing the following information. So I don't want my train/test split to be random; it should be based on the binary codes in my sheet.

videoname  | test | train | category
-----------|------|-------|---------
video1.mp4 | 1    | 0     | a
video2.mp4 | 1    | 0     | b
video3.mp4 | 1    | 0     | c
video4.mp4 | 0    | 1     | c
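Because the sheet has separate test and train columns, it may be worth checking that each row sets exactly one of them before anything is moved. Below is a minimal sketch, assuming the CSV is comma-separated with the column names shown in the table (videoname, test, train, category). split_for is a hypothetical helper, and SAMPLE_CSV just re-types the four example rows.

```python
import csv
import io

# The four example rows from the table, written as comma-separated text.
SAMPLE_CSV = """videoname,test,train,category
video1.mp4,1,0,a
video2.mp4,1,0,b
video3.mp4,1,0,c
video4.mp4,0,1,c
"""


def split_for(row: dict) -> str:
    """Return 'test' or 'train' for one CSV row; reject ambiguous flags."""
    # csv yields strings, so compare against '1' rather than the integer 1
    is_test = row["test"].strip() == "1"
    is_train = row["train"].strip() == "1"
    if is_test == is_train:  # both flags set, or neither set
        raise ValueError(f"{row['videoname']}: expected exactly one of test/train to be 1")
    return "test" if is_test else "train"


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["videoname"], row["category"], split_for(row))
```

If you prefer a dataframe, pandas.read_csv loads the same file. For deciding where each file goes, though, the standard csv module is enough.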

Can anyone point me in the right direction for using this file to do this? Can I somehow load the file into a dataframe that tells Python where to move the files?

import os
import csv
from collections import defaultdict

videoroot = r'H:\Desktop'
transferrable_data = defaultdict(list)
with open(r'H:\Desktop\SVW.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        video_path_source = os.path.join(videoroot, row['Genre'], row['FileName'])
        # csv values are strings, so compare with '0' rather than the integer 0
        if row['Train 1?'].strip() == '0':
            split_type = 'test'
        else:
            split_type = 'train'
        video_destination_path = os.path.join(videoroot, row['Genre'], split_type, row['FileName'])
        transferrable_data[video_path_source].append(video_destination_path)
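Note that csv.DictReader returns every field as a string. A test like row['Train 1?'] == 0 therefore compares the string '0' with the integer 0 and is never true, which would send every video to train. The short check below demonstrates this with a single made-up row. The column names are copied from the attempt above; the file name clip.mp4 is illustrative.

```python
import csv
import io

# One illustrative row using the column names from the attempt above.
data = io.StringIO("FileName,Genre,Train 1?\nclip.mp4,a,0\n")
row = next(csv.DictReader(data))

print(repr(row["Train 1?"]))      # '0'  (a string, not a number)
print(row["Train 1?"] == 0)       # False: str compared with int
print(int(row["Train 1?"]) == 0)  # True: convert first, then compare
```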

Recommended answer

The first thing to do is to read your file and build a mapping from each source file to its destination folder(s):

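import csv
import os
from collections import defaultdict

VIDEO_ROOT_FOLDER = r'C:\Users\Me\Videos'  # raw string, otherwise '\U' is a syntax error
transferrable_data = defaultdict(list)
with open(os.path.join(VIDEO_ROOT_FOLDER, 'videos.csv'), newline='') as csvfile:  # hypothetical CSV location
    for row in csv.DictReader(csvfile):
        video_source_path = os.path.join(VIDEO_ROOT_FOLDER, row['category'], row['videoname'])
        # CSV values are strings, so compare with '1' rather than the integer 1
        if row['test'].strip() == '1':
            split_type = 'test'
        else:  # I suppose you can only dispatch to test or train in a row
            split_type = 'train'
        video_destination_path = os.path.join(VIDEO_ROOT_FOLDER, row['category'], split_type, row['videoname'])
        transferrable_data[video_source_path].append(video_destination_path)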
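The same mapping can be wrapped in a function that is easier to test. Below is a minimal sketch, assuming comma-separated input with the columns videoname, test, train and category from the question. build_mapping is a hypothetical name. It accepts any iterable of text lines (an open file or an in-memory string), so it can be tried without touching the disk. Because it uses os.path.join, the path separator follows the OS running it.

```python
import csv
import io
import os
from collections import defaultdict


def build_mapping(lines, root):
    """Map each source video path to the list of destination paths it should go to."""
    mapping = defaultdict(list)
    for row in csv.DictReader(lines, skipinitialspace=True):
        category = row["category"].strip()
        name = row["videoname"].strip()
        # CSV values are strings: '1' in the test column means test, otherwise train
        split = "test" if row["test"].strip() == "1" else "train"
        source = os.path.join(root, category, name)
        mapping[source].append(os.path.join(root, category, split, name))
    return dict(mapping)


sample = io.StringIO(
    "videoname,test,train,category\n"
    "video1.mp4,1,0,a\n"
    "video4.mp4,0,1,c\n"
)
for src, dests in build_mapping(sample, "videos").items():
    print(src, "->", dests)
```

Passing an open file, for example open('videos.csv', newline=''), works the same way as the in-memory sample. That file name is only a placeholder.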

Then you can write a script that moves your files to the correct paths, using one of the following two methods:

import os
# the second argument is the full destination file path, and its folder must already exist
os.rename("path/to/current/video", "path/to/destination/folder/video")

or, if you need to copy instead (because you don't want to alter your video folder):

from shutil import copyfile
# copyfile also needs the full destination file name, not just the folder
copyfile("path/to/current/video", "path/to/destination/folder/video")
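Both calls fail if the destination folder (for example a\train) does not exist yet, so the script should create it first. Below is a minimal sketch, assuming a mapping shaped like transferrable_data (source path to a list of destination paths). transfer and its move flag are hypothetical names. It relies on the standard library's os.makedirs and shutil.copy2 (which copies file contents and metadata). The demo runs in a throwaway temporary directory.

```python
import os
import shutil
import tempfile


def transfer(mapping, move=False):
    """Copy each source to all of its destinations; optionally delete the source afterwards."""
    for src, destinations in mapping.items():
        for dest in destinations:
            # create e.g. <root>/a/train if it is missing; no error if it already exists
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(src, dest)  # copies contents and metadata
        if move:
            os.remove(src)  # only after every copy for this source succeeded


# Demo in a throwaway directory: one fake video in category "a".
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "a", "video1.mp4")
    os.makedirs(os.path.dirname(src))
    with open(src, "wb") as f:
        f.write(b"fake video bytes")
    dest = os.path.join(root, "a", "train", "video1.mp4")
    transfer({src: [dest]}, move=True)
    print(os.path.exists(dest), os.path.exists(src))  # True False
```

Copying first and deleting the source only at the end means a video mapped to several destinations is not lost halfway through. When each video has exactly one destination, shutil.move would be the simpler choice.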

Assuming your mapping is:

transferrable_data = {r'C:\Users\Me\Videos\a\video1.mp4': [r'C:\Users\Me\Videos\a\train\video1.mp4'], r'C:\Users\Me\Videos\a\video2.mp4': [r'C:\Users\Me\Videos\b\test\video2.mp4', r'C:\Users\Me\Videos\c\test\video2.mp4']}

you can do the following:

import os
from shutil import copyfile

transferrable_data = {r'C:\Users\Me\Videos\a\video1.mp4': [r'C:\Users\Me\Videos\a\train\video1.mp4'], r'C:\Users\Me\Videos\a\video2.mp4': [r'C:\Users\Me\Videos\b\test\video2.mp4', r'C:\Users\Me\Videos\c\test\video2.mp4']}
for src, destination_list in transferrable_data.items():
    for dest in destination_list:
        # create the train/test folder first; copyfile does not create it
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        copyfile(src, dest)
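A caution about the Windows paths in these examples: in an ordinary Python string literal, backslash sequences are escapes. '\a' becomes the bell character, '\t' becomes a tab, and '\U' starts a Unicode escape, which makes 'C:\Users\...' a SyntaxError. Prefixing the literal with r, or writing the path with forward slashes, avoids this. The snippet below checks the error with compile() so it can be shown without crashing the script.

```python
# Raw string: every backslash is kept literally.
raw = r"C:\Users\Me\Videos\a\train\video1.mp4"
print(raw.count("\\"))  # 6

# Ordinary string: '\a' turns into the BEL control character and '\t' into a tab.
plain_segment = "\a\train"
print(len(plain_segment))  # 6 characters, not 8

# 'C:\Users' is not even valid syntax, because '\U' expects 8 hex digits.
try:
    compile("'C:\\Users\\Me'", "<example>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```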
