如何在条件下将csv文件合并到单个文件并将文件名添加为列? [英] How to merge csv files onto a single file on condition and adding file name as a column?
问题描述
我的文件夹中有多个csv文件.列标题不同,但列数据相同.
I have multiple csv files on folder. The column headers are different but column datas are same.
括号内的数字是实际的列名.项目(67)67是列名
The number inside the bracket is actual column name. Item(67) 67 is column name
因此忽略字符串Item,仅考虑()中的int并执行操作.
So ignore the string Item and only consider the int inside () and perform the operation.
样本文件: https://drive.google.com/open?id = 1q7c1AqCRKRufSVh--9o0W6rdz28QyBGa
说明:
驱动器上的文件应附加在一起.基于列名称的启用条件".如果条件上的整数与列名(列名()内的整数)匹配,则应将其放在该列上.请检查预期的输出.
文件
The files on the drive should be appended together. On Condition based on the column names. If the integer on condition matches with the column name(integer inside () of the column name) then it should be placed on that column. Please check the expected output.
Files
File1: ID Item(67) Item (89) Item (91) Item (100)
1 56 78 98 101
2 91 100 121
File2: ID Item(96) Item (58) Item (99) Item (105)
3 101 102 103 104
4 112 113 117 119
条件
d ={
'File':['File1','File2'],
'Price1':[67,67],
'Price2':[89,67],
'Price3':[91,67],
'Price4':[100,91]
}
Condition=pd.DataFrame(data=d)
Condition
预期输出:
File ID Price1 Price2 Price3 Price4
File1 1 56 78 98 101
File1 2 91 100 121
File2 3 101 102 104 103
File2 4 112 113 119 117
推荐答案
使用:
files = glob.glob('shelldemo/*.csv')
dfs = []
for fp in files:
#if multiple columns with no ()
#df = pd.read_csv(fp, index_col=['S.no','id','number'])
df = pd.read_csv(fp, index_col=['ID'])
df['file'] = os.path.basename(fp).split('.')[0]
df = df.set_index('file', append=True)
df.columns = df.columns.str.extract('\((\d+)\)', expand=False).astype(int)
dfs.append(df)
df1 = pd.concat(dfs, sort=False).reset_index()
print (df1)
ID file 58 67 89 91 96 100
0 1 file1 NaN 56 78.0 98.0 NaN 101.0
1 2 file1 NaN 91 100.0 121.0 NaN NaN
2 3 file2 102.0 103 NaN NaN 101.0 104.0
3 4 file2 113.0 117 NaN NaN 112.0 119.0
print (df2)
File Price1 Price2 Price3 Price4
0 File1 67 89 91 100
1 File2 96 58 105 99
df2.columns = df2.columns.str.lower()
df2['file'] = df2['file'].str.lower()
#merge data together by left join
df = df1.merge(df2, on='file', how='left')
print (df)
ID file 58 67 89 91 96 100 price1 price2 price3 \
0 1 file1 NaN 56 78.0 98.0 NaN 101.0 67 89 91
1 2 file1 NaN 91 100.0 121.0 NaN NaN 67 89 91
2 3 file2 102.0 103 NaN NaN 101.0 104.0 96 58 105
3 4 file2 113.0 117 NaN NaN 112.0 119.0 96 58 105
price4
0 100
1 100
2 99
3 99
#filter integers between ()
df1 = df.loc[:, df.columns.str.isnumeric().isnull()].copy()
#filter all columns with price
df2 = df.filter(regex='price').copy()
uniq_vals_df2 = df2.stack().dropna().drop_duplicates()
not_matched_vals = np.setdiff1d(uniq_vals_df2, df1.columns)
df1 = df1.join(pd.DataFrame(columns=not_matched_vals.tolist() + ['a']))
#replace columns by match values from df2
for c in df2.columns:
df2[c] = df1.lookup(df1.index, df2[c].fillna('a'))
#join to original DataFrame
df = df[['file','ID']].join(df2)
print (df)
file ID price1 price2 price3 price4
0 file1 1 56.0 78.0 98.0 101.0
1 file1 2 91.0 100.0 121.0 NaN
2 file2 3 101.0 102.0 NaN NaN
3 file2 4 112.0 113.0 NaN NaN
这篇关于如何在条件下将csv文件合并到单个文件并将文件名添加为列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!