Recursively Edit CSV to Subdirectories using Pandas


Problem description

I have a series of subdirectory folders, each of which contains a file ending in "_Invoice.csv".

/Invoice List/
              Invoice1folder/
                             ..._Invoice.csv
              Invoice2folder/ 
                             ..._Invoice.csv
              Invoice5folder/
                             ..._Invoice.csv
              Invoice9folder/
                             _Invoice.csv

Each "_Invoice.csv" has columns [A,B,C,D]. I am trying to recursively search through all subdirectory folders, open each "_Invoice.csv" file, reduce its columns to only [A,C], and then save the result as "_Invoice_Reduced.csv".

"_Invoice.csv"       "_Invoice_Reduced.csv"
 A B C D        =>              A C
 1 2 3 4        =>              1 3 

My current attempt is:

import pandas as pd
import os

columns_to_keep = ['A','C']
final_form= pd.DataFrame()

for file in os.listdir():
    if file.endswith('*_Invoice.csv'):
        df = pd.read_csv(file)
        df = df.loc[;columns_to_keep]
        df = df.to_csv(f'{file.name}_Invoice_Reduced.csv')
   if file.endswith('*_Invoice_Reduced.csv'):
        df = pd.read_csv(file)
        final_form= final_form.append(df, ignore_index=True)

TLDR: I am attempting to create a script that goes into every subdirectory, reduces the columns of a pre-existing CSV, and saves the subset. Then, after it has read through all subdirectories, it should combine the reduced files into a single big_frame.

Any ideas?

Recommended answer

This will do the job.

Instead of opening each file, removing columns, saving, and moving on, I have opted to open each file with only the reduced columns, save this reduced DataFrame, and then append it to df. This results in all the reduced files being stacked in this one DataFrame.
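As a minimal sketch of the "open with only the reduced columns" step, the snippet below reads a small in-memory CSV with `usecols`. The sample data is made up for illustration and only mirrors the A/B/C/D layout from the question.

```python
import io

import pandas as pd

# Hypothetical sample data mirroring the question's column layout.
sample_csv = io.StringIO("A,B,C,D\n1,2,3,4\n5,6,7,8\n")

columns_to_keep = ["A", "C"]

# usecols tells read_csv to parse only the listed columns,
# so B and D never make it into the DataFrame.
reduced = pd.read_csv(sample_csv, usecols=columns_to_keep)

print(list(reduced.columns))
```

Because the unwanted columns are skipped at read time, no separate column-dropping step is needed before saving.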

Using path = "." starts the search from the current directory.

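from pathlib import Path
import pandas as pd


df = pd.DataFrame()              # accumulates every reduced file
columns_to_keep = ['A', 'C']
path = "."                       # start from the current directory
pattern = "*_Invoice.csv"

# rglob searches path and all of its subdirectories
for file in Path(path).rglob(pattern):
    # e.g. folder/x_Invoice.csv -> folder/x_Invoice_Reduced.csv
    output_file = "{}/{}{}".format(file.parent, file.stem, "_Reduced.csv")
    # read only the columns we want to keep
    _df = pd.read_csv(file, usecols=columns_to_keep)
    _df.to_csv(output_file, sep=",", index=False, header=True)
    df = pd.concat([df, _df])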
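Calling `pd.concat` inside the loop copies the accumulated frame on every iteration. A common alternative, sketched here under assumptions, is to collect the reduced frames in a list and concatenate once at the end. The helper name `reduce_invoices` and the demo folder and file names are hypothetical and not part of the original answer; the demo builds a throwaway directory tree so it can be run anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def reduce_invoices(root, columns_to_keep, pattern="*_Invoice.csv"):
    """Write a *_Reduced.csv next to each match and return all rows combined.

    Hypothetical helper name; not part of the original answer.
    """
    frames = []
    # sorted() materializes the matches first and gives a stable order
    for file in sorted(Path(root).rglob(pattern)):
        reduced = pd.read_csv(file, usecols=columns_to_keep)
        # e.g. "x_Invoice.csv" -> "x_Invoice_Reduced.csv" in the same folder
        reduced.to_csv(file.with_name(file.stem + "_Reduced.csv"), index=False)
        frames.append(reduced)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame(columns=columns_to_keep)
    return pd.concat(frames, ignore_index=True)


# Demo on a throwaway directory tree (folder and file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    for i, folder in enumerate(["Invoice1folder", "Invoice2folder"]):
        sub = Path(tmp) / folder
        sub.mkdir()
        pd.DataFrame({"A": [i], "B": [0], "C": [i * 10], "D": [0]}).to_csv(
            sub / f"demo{i}_Invoice.csv", index=False
        )
    big_frame = reduce_invoices(tmp, ["A", "C"])
    reduced_files = sorted(p.name for p in Path(tmp).rglob("*_Reduced.csv"))

print(big_frame)
```

The pattern "*_Invoice.csv" does not match the generated "_Invoice_Reduced.csv" files, so running the script a second time does not pick up its own output. `ignore_index=True` gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.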
