Loop through multiple CSV files and run a script

Problem description

I have a script which pulls in data from a CSV file, does some manipulations to it and creates an output Excel file. But it's a tedious process, as I need to do it for multiple files.

Question: Is there a way for me to run this script across multiple CSV files together and create a separate Excel output file for each input file?

I'm not sure what to try out here. I've read that I need to use a module called glob but I'm not sure how to go about it.

This script works for a single file:

# Import libraries
import numpy as np
import pandas as pd

# Set system paths
INPUT_PATH = 'SystemPath//Downloads//'
INPUT_FILE = 'rawData.csv'

OUTPUT_PATH = 'SystemPath//Downloads//Output//'
OUTPUT_FILE = 'rawDataOutput.xlsx'

# Get data
df = pd.read_csv(INPUT_PATH + INPUT_FILE)

# Clean data: keep the needed columns, drop rows with zero impressions
cleanedData = df[['State','Campaigns','Type','Start date','Impressions','Clicks','Spend(INR)',
                  'Orders','Sales(INR)','NTB orders','NTB sales']]
cleanedData = cleanedData[cleanedData['Impressions'] != 0].sort_values('Impressions',
                                                                       ascending=False).reset_index()
# Append a 'Total' row that sums every numeric column
cleanedData.loc['Total'] = cleanedData.select_dtypes(include=np.number).sum()
cleanedData['CTR(%)'] = (cleanedData['Clicks'] /
                         cleanedData['Impressions']).astype(float).map("{:.2%}".format)
cleanedData['CPC(INR)'] = (cleanedData['Spend(INR)'] / cleanedData['Clicks'])
cleanedData['ACOS(%)'] = (cleanedData['Spend(INR)'] /
                          cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
cleanedData['% of orders NTB'] = (cleanedData['NTB orders'] /
                                  cleanedData['Orders']).astype(float).map("{:.2%}".format)
cleanedData['% of sales NTB'] = (cleanedData['NTB sales'] /
                                 cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
cleanedData = cleanedData[['State','Campaigns','Type','Start date','Impressions','Clicks','CTR(%)',
                           'Spend(INR)','CPC(INR)','Orders','Sales(INR)','ACOS(%)',
                           'NTB orders','% of orders NTB','NTB sales','% of sales NTB']]

# Create summary
summaryData = cleanedData.groupby(['Type'])[['Spend(INR)','Sales(INR)']].agg('sum')
summaryData.loc['Overall Snapshot'] = summaryData.select_dtypes(include=np.number).sum()
summaryData['ROI'] = summaryData['Sales(INR)'] / summaryData['Spend(INR)']

# Push to Excel (needs the xlsxwriter package installed;
# the with-block saves and closes the file)
with pd.ExcelWriter(OUTPUT_PATH + OUTPUT_FILE, engine='xlsxwriter') as writer:
    summaryData.to_excel(writer, sheet_name='Summary')
    cleanedData.to_excel(writer, sheet_name='Overall Report')
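The "{:.2%}".format calls above turn each ratio into a percentage string: the % presentation type multiplies the value by 100, prints two decimals and appends a percent sign. A quick check in plain Python, using made-up sample ratios, also shows what ends up in the sheet when a denominator such as Sales(INR) is zero, because pandas division then yields inf (or NaN for 0/0) instead of raising:

```python
# Hypothetical sample ratios standing in for e.g. Clicks / Impressions.
# inf and nan mimic what pandas produces for x/0 and 0/0.
ratios = [0.25, 1 / 3, 2.0, float("inf"), float("nan")]

# Same formatter the script passes to Series.map: x100, two decimals, "%" suffix.
formatted = ["{:.2%}".format(r) for r in ratios]
print(formatted)
```

If "inf%" or "nan%" cells are unwanted, the zero denominators would need to be handled before formatting.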

I've never tried anything like this before and would appreciate your help figuring this out.

Recommended answer

You can use Python's glob.glob() to get all of the CSV files from a given folder. For each filename that is returned, you could derive a suitable output filename. The file processing could be moved into a function as follows:

# Import libraries
import glob
import os

import numpy as np
import pandas as pd

def process_csv(input_filename, output_filename):
    # Get data
    df = pd.read_csv(input_filename)

    # Clean data
    cleanedData = df[['State','Campaigns','Type','Start date','Impressions','Clicks','Spend(INR)',
                      'Orders','Sales(INR)','NTB orders','NTB sales']]
    cleanedData = cleanedData[cleanedData['Impressions'] != 0].sort_values('Impressions',
                                                                           ascending=False).reset_index()
    cleanedData.loc['Total'] = cleanedData.select_dtypes(include=np.number).sum()
    cleanedData['CTR(%)'] = (cleanedData['Clicks'] /
                             cleanedData['Impressions']).astype(float).map("{:.2%}".format)
    cleanedData['CPC(INR)'] = (cleanedData['Spend(INR)'] / cleanedData['Clicks'])
    cleanedData['ACOS(%)'] = (cleanedData['Spend(INR)'] /
                              cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
    cleanedData['% of orders NTB'] = (cleanedData['NTB orders'] /
                                      cleanedData['Orders']).astype(float).map("{:.2%}".format)
    cleanedData['% of sales NTB'] = (cleanedData['NTB sales'] /
                                     cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
    cleanedData = cleanedData[['State','Campaigns','Type','Start date','Impressions','Clicks','CTR(%)',
                               'Spend(INR)','CPC(INR)','Orders','Sales(INR)','ACOS(%)',
                               'NTB orders','% of orders NTB','NTB sales','% of sales NTB']]

    # Create summary
    summaryData = cleanedData.groupby(['Type'])[['Spend(INR)','Sales(INR)']].agg('sum')
    summaryData.loc['Overall Snapshot'] = summaryData.select_dtypes(include=np.number).sum()
    summaryData['ROI'] = summaryData['Sales(INR)'] / summaryData['Spend(INR)']

    # Push to Excel (needs the xlsxwriter package installed;
    # the with-block saves and closes the file)
    with pd.ExcelWriter(output_filename, engine='xlsxwriter') as writer:
        summaryData.to_excel(writer, sheet_name='Summary')
        cleanedData.to_excel(writer, sheet_name='Overall Report')

# Set system paths
INPUT_PATH = 'SystemPath//Downloads//'
OUTPUT_PATH = 'SystemPath//Downloads//Output//'

# Run process_csv() on every CSV file in INPUT_PATH
for csv_filename in glob.glob(os.path.join(INPUT_PATH, "*.csv")):
    name, ext = os.path.splitext(os.path.basename(csv_filename))
    # Create an output filename based on the input filename, e.g. rawData.csv -> rawDataOutput.xlsx
    output_filename = os.path.join(OUTPUT_PATH, f"{name}Output.xlsx")
    process_csv(csv_filename, output_filename)
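Two small additions can make the loop more robust: glob.glob() returns matches in arbitrary order, so sorting gives a predictable run order, and the output folder should exist before the first workbook is written, since writing to a missing folder fails. The sketch below uses a hypothetical helper, plan_outputs(), that only pairs each CSV with its output path so it can be tried without pandas; in the real script you would call process_csv() on each pair. The temporary folder and file names are made up for illustration.

```python
import glob
import os
import tempfile


def plan_outputs(input_dir, output_dir):
    """Hypothetical helper: pair each CSV in input_dir with '<name>Output.xlsx' in output_dir."""
    # Create the output folder if it is missing, so later writes do not fail.
    os.makedirs(output_dir, exist_ok=True)
    pairs = []
    # Sort because glob returns matches in arbitrary order.
    for csv_filename in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        name, _ext = os.path.splitext(os.path.basename(csv_filename))
        pairs.append((csv_filename, os.path.join(output_dir, f"{name}Output.xlsx")))
    return pairs


# Usage with throwaway files; notes.txt is ignored by the "*.csv" pattern.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("march.csv", "april.csv", "notes.txt"):
        with open(os.path.join(tmp, fname), "w") as fh:
            fh.write("State,Impressions\n")
    out_dir = os.path.join(tmp, "Output")
    pairs = plan_outputs(tmp, out_dir)
    output_dir_created = os.path.isdir(out_dir)
    for csv_path, xlsx_path in pairs:
        # Here the real script would call process_csv(csv_path, xlsx_path).
        print(os.path.basename(csv_path), "->", os.path.basename(xlsx_path))
```

Note that on Linux and macOS the pattern "*.csv" is case-sensitive, so a file named DATA.CSV would not be picked up.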

os.path.join() is a safer way to join file paths than string concatenation: it inserts the separator for the current operating system only where one is needed.
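To see the difference, compare + with joining on the input folder from the question (written here with single slashes). The sketch uses posixpath, the "/"-based implementation behind os.path on Linux and macOS, so the results are the same on every machine; on Windows, os.path.join uses a backslash instead. One caveat: a component that is itself an absolute path discards everything before it.

```python
import posixpath

folder = "SystemPath/Downloads"

concatenated = folder + "rawData.csv"                          # separator missing
joined = posixpath.join(folder, "rawData.csv")                 # separator inserted
joined_trailing = posixpath.join(folder + "/", "rawData.csv")  # separator not doubled
absolute_wins = posixpath.join(folder, "/rawData.csv")         # absolute part resets the path

for path in (concatenated, joined, joined_trailing, absolute_wins):
    print(path)
```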