Recursively Edit CSV to Subdirectories using Pandas
Problem description
I have a series of subdirectory folders that each have a "_Invoice.csv".
/Invoice List/
    Invoice1folder/
        ..._Invoice.csv
    Invoice2folder/
        ..._Invoice.csv
    Invoice5folder/
        ..._Invoice.csv
    Invoice9folder/
        _Invoice.csv
Each "_Invoice.csv" has columns [A,B,C,D]. I am trying to recursively search through all subdirectory folders, open each "_Invoice.csv" file, reduce the columns to only [A,C], and then save the result as "_Invoice_Reduced.csv".
"_Invoice.csv"      "_Invoice_Reduced.csv"
A B C D         =>  A C
1 2 3 4         =>  1 3
My current attempt is:
import pandas as pd
import os

columns_to_keep = ['A','C']
final_form= pd.DataFrame()

for file in os.listdir():
    if file.endswith('*_Invoice.csv'):
        df = pd.read_csv(file)
        df = df.loc[;columns_to_keep]
        df = df.to_csv(f'{file.name}_Invoice_Reduced.csv')
    if file.endswith('*_Invoice_Reduced.csv'):
        df = pd.read_csv(file)
        final_form= final_form.append(df, ignore_index=True)
TLDR: I am trying to create a script that goes into every subdirectory, reduces the columns of the pre-existing CSV there, and saves the subset. After it has gone through all subdirectories, it should combine the reduced files into a single big_frame.
Any ideas?
Recommended answer
This will do the job.
Instead of opening, removing columns, saving and moving on, I have opted for opening each file with only the reduced columns, saving this reduced DataFrame, then appending it to df. This will result in all the reduced files being stacked in this one DataFrame.
Use path = "." to start from the current directory.
from pathlib import Path
import pandas as pd

df = pd.DataFrame()  # accumulates the rows of every reduced file
columns_to_keep = ['A','C']
path = "."  # start the search from the current directory
pattern = "*_Invoice.csv"

# rglob searches path and all of its subdirectories
for file in Path(path).rglob(pattern):
    output_file = "{}/{}{}".format(file.parent, file.stem, "_Reduced.csv")
    _df = pd.read_csv(file, usecols=columns_to_keep)  # read only the wanted columns
    _df.to_csv(output_file, sep=",", index=False, header=True)
    df = pd.concat([df, _df])
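As a possible variation on the loop above (not part of the original answer), the same steps can be wrapped in a function that collects the reduced frames in a list and concatenates them once at the end, which avoids re-copying the growing DataFrame on every iteration. The function name reduce_invoices and its parameters are hypothetical, chosen only for this sketch; it assumes every matching file contains all of the requested columns.

```python
from pathlib import Path

import pandas as pd


def reduce_invoices(root, columns_to_keep, pattern="*_Invoice.csv"):
    """Write a reduced copy next to each matching CSV and return all rows combined.

    Hypothetical helper for illustration; assumes each file has every requested column.
    """
    frames = []
    # sorted() fixes the list of matches before any new files are written
    # and gives a deterministic processing order.
    for file in sorted(Path(root).rglob(pattern)):
        reduced = pd.read_csv(file, usecols=columns_to_keep)
        # e.g. "x_Invoice.csv" -> "x_Invoice_Reduced.csv" in the same folder
        reduced.to_csv(file.with_name(file.stem + "_Reduced.csv"), index=False)
        frames.append(reduced)
    if not frames:
        # No matching files: return an empty frame with the expected columns.
        return pd.DataFrame(columns=columns_to_keep)
    # A single concat at the end; ignore_index renumbers the rows 0..n-1.
    return pd.concat(frames, ignore_index=True)
```

Calling big_frame = reduce_invoices(".", ["A", "C"]) from the "Invoice List" folder would then behave like the loop above, except that the combined frame gets a fresh 0..n-1 index.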