Merge multiple CSV files with the same name in 10 different subdirectories
Question
I have 10 different subdirectories with the same file names in each directory (20 files per directory), and column 0 is the index column in each file.
For example:
**DIRECTORY A**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
**DIRECTORY B**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
**DIRECTORY C**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
Each of the above files contains 6 columns, with index_col = 0 and NO column headers.
**DIRECTORY FILES_MERGED**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
I want to merge all the files with the SAME NAME from EACH subdirectory into one file with the SAME NAME, and save the new file in a NEW subdirectory, e.g. DIRECTORY FILES_MERGED, with INDEX = column 0. The merged file should have only one index column, together with columns 1, 2, 3, 4, 5 from each same-named file in each directory.
I have read a CSV file into a pandas DataFrame:
df = pd.read_csv(filename, sep=",", header=None, usecols=[0, 1, 2, 3, 4, 5])
This is the format of the DataFrame.
My original DataFrame:
0 1 2 3 4 5
0 1451606820 1.0862 1.08630 1.08578 1.08578 25
1 1451608800 1.0862 1.08630 1.08578 1.08610 10
2 1451608860 1.0862 1.08620 1.08578 1.08578 16
3 1451610180 1.0862 1.08630 1.08578 1.08578 27
4 1451610480 1.0858 1.08590 1.08560 1.08578 21
5 1451610540 1.0857 1.08578 1.08570 1.08578 2
6 1451610600 1.0857 1.08578 1.08570 1.08578 2
7 1451610720 1.0857 1.08578 1.08570 1.08578 2
8 1451610780 1.0857 1.08578 1.08570 1.08578 2
Column '0' = Datetime in Epoch time
Columns 1,2,3,4,5 are values
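To make the layout concrete, here is a minimal sketch of loading data in this shape so that column 0 becomes the index and is interpreted as a datetime. The two sample rows are copied from the frame above. Treating the epoch values as seconds (rather than milliseconds) is an assumption inferred from their magnitude, not something the question states.

```python
import io

import pandas as pd

# Two sample rows in the same layout as the files:
# no header row, column 0 = epoch time, columns 1-5 = values.
sample = io.StringIO(
    "1451606820,1.0862,1.08630,1.08578,1.08578,25\n"
    "1451608800,1.0862,1.08630,1.08578,1.08610,10\n"
)

# header=None because the files have no column headers;
# index_col=0 makes the epoch column the index.
df = pd.read_csv(sample, header=None, index_col=0)

# Assumption: the epoch values are seconds since 1970-01-01 UTC.
df.index = pd.to_datetime(df.index, unit="s")
print(df)
```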
Recommended answer
There are many ways to do this; staying within pandas, I did the following.
With the file structure:
root/
├── dir1/
│ ├── data_20170101_k.csv
│ ├── data_20170102_k.csv
│ └── ...
├── dir2/
│ ├── data_20170101_k.csv
│ ├── data_20170102_k.csv
│ └── ...
└── ...
This code will work. It is a little verbose for the sake of explanation, but you can shorten it in your own implementation.
import glob
import pandas as pd
CONCAT_DIR = "/FILES_CONCAT/"
# Use glob module to return all csv files under root directory. Create DF from this.
files = pd.DataFrame([file for file in glob.glob("root/*/*")], columns=["fullpath"])
# fullpath
# 0 root\dir1\data_20170101_k.csv
# 1 root\dir1\data_20170102_k.csv
# 2 root\dir2\data_20170101_k.csv
# 3 root\dir2\data_20170102_k.csv
# Split the full path into directory and filename
# ("\\" is the Windows path separator; on other systems split on "/" instead)
files_split = files['fullpath'].str.rsplit("\\", n=1, expand=True).rename(columns={0: 'path', 1: 'filename'})
# path filename
# 0 root\dir1 data_20170101_k.csv
# 1 root\dir1 data_20170102_k.csv
# 2 root\dir2 data_20170101_k.csv
# 3 root\dir2 data_20170102_k.csv
# Join these into one DataFrame
files = files.join(files_split)
# fullpath path filename
# 0 root\dir1\data_20170101_k.csv root\dir1 data_20170101_k.csv
# 1 root\dir1\data_20170102_k.csv root\dir1 data_20170102_k.csv
# 2 root\dir2\data_20170101_k.csv root\dir2 data_20170101_k.csv
# 3 root\dir2\data_20170102_k.csv root\dir2 data_20170102_k.csv
# Iterate over unique filenames; read CSVs, concat DFs, save file
for f in files['filename'].unique():
    paths = files[files['filename'] == f]['fullpath']  # Get list of fullpaths from unique filenames
    dfs = [pd.read_csv(path, header=None) for path in paths]  # Get list of dataframes from CSV file paths
    concat_df = pd.concat(dfs)  # Concat dataframes into one
    concat_df.to_csv(CONCAT_DIR + f)  # Save dataframe
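The same idea can also be sketched without Windows-specific path splitting. This is a minimal sketch rather than the code from the answer above: it assumes the `root/<dir>/<name>.csv` layout shown earlier, and the function name `merge_same_named_csvs` and its parameters are hypothetical. It uses `pathlib` to group files by name, reads each file with `index_col=0` and no header as described in the question, and writes the result without a header row so the output keeps the input format.

```python
from collections import defaultdict
from pathlib import Path

import pandas as pd


def merge_same_named_csvs(root, out_dir, axis=0):
    """Merge CSV files that share a name across the subdirectories of root."""
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group full paths by bare file name, e.g. "data_20170101_k.csv".
    groups = defaultdict(list)
    for path in sorted(root.glob("*/*.csv")):
        # Skip files already written to out_dir if it lives under root.
        if path.parent.resolve() != out_dir.resolve():
            groups[path.name].append(path)

    written = []
    for name, paths in groups.items():
        # Column 0 is the index; the files have no header row.
        dfs = [pd.read_csv(p, header=None, index_col=0) for p in paths]
        merged = pd.concat(dfs, axis=axis)
        target = out_dir / name
        merged.to_csv(target, header=False)  # keep the headerless format
        written.append(target)
    return written
```

Called as `merge_same_named_csvs("root", "FILES_MERGED")`, this stacks the rows of same-named files, like the loop above. Passing `axis=1` would instead place the columns side by side, aligned on the index; that variant assumes the index values within each file are unique.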