如何在条件下将csv文件合并到单个文件并将文件名添加为列? [英] How to merge csv files onto a single file on condition and adding file name as a column?

查看:279
本文介绍了如何在条件下将csv文件合并到单个文件并将文件名添加为列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的文件夹中有多个csv文件.列标题不同,但列数据相同.

I have multiple csv files on folder. The column headers are different but column datas are same.

括号内的数字是实际的列名.项目(67)67是列名

The number inside the bracket is actual column name. Item(67) 67 is column name

因此忽略字符串Item,仅考虑()中的int并执行操作.

So ignore the string Item and only consider the int inside () and perform the operation.

样本文件: https://drive.google.com/open?id = 1q7c1AqCRKRufSVh--9o0W6rdz28QyBGa

说明:

驱动器上的文件应附加在一起.基于列名称的启用条件".如果条件上的整数与列名(列名()内的整数)匹配,则应将其放在该列上.请检查预期的输出.
文件

The files on the drive should be appended together. On Condition based on the column names. If the integer on condition matches with the column name(integer inside () of the column name) then it should be placed on that column. Please check the expected output.
Files

File1: ID Item(67) Item (89) Item (91) Item (100)
       1    56      78        98        101     
       2    91      100       121       
File2: ID Item(96) Item (58) Item (99) Item (105)
       3  101      102        103       104
       4  112      113        117       119

条件

d ={
    'File':['File1','File2'],
     'Price1':[67,67],
     'Price2':[89,67],
     'Price3':[91,67],
    'Price4':[100,91]
}
Condition=pd.DataFrame(data=d)
Condition

预期输出:

  File  ID   Price1 Price2 Price3 Price4
  File1  1    56      78    98     101     
  File1  2    91      100   121
  File2  3    101     102  104     103       
  File2  4     112      113  119      117  

推荐答案

使用:

files = glob.glob('shelldemo/*.csv')

dfs = []
for fp in files:
    #if multiple columns with no ()  
    #df = pd.read_csv(fp, index_col=['S.no','id','number'])

    df = pd.read_csv(fp, index_col=['ID'])
    df['file'] = os.path.basename(fp).split('.')[0]
    df = df.set_index('file', append=True)
    df.columns = df.columns.str.extract('\((\d+)\)', expand=False).astype(int)
    dfs.append(df)


df1 = pd.concat(dfs, sort=False).reset_index()
print (df1)
   ID   file     58   67     89     91     96    100
0   1  file1    NaN   56   78.0   98.0    NaN  101.0
1   2  file1    NaN   91  100.0  121.0    NaN    NaN
2   3  file2  102.0  103    NaN    NaN  101.0  104.0
3   4  file2  113.0  117    NaN    NaN  112.0  119.0


print (df2)
    File  Price1  Price2  Price3  Price4
0  File1      67      89      91     100
1  File2      96      58     105      99


df2.columns = df2.columns.str.lower() 
df2['file'] = df2['file'].str.lower()

#merge data together by left join 
df = df1.merge(df2, on='file', how='left')
print (df)
   ID   file     58   67     89     91     96    100  price1  price2  price3  \
0   1  file1    NaN   56   78.0   98.0    NaN  101.0      67      89      91   
1   2  file1    NaN   91  100.0  121.0    NaN    NaN      67      89      91   
2   3  file2  102.0  103    NaN    NaN  101.0  104.0      96      58     105   
3   4  file2  113.0  117    NaN    NaN  112.0  119.0      96      58     105   

   price4  
0     100  
1     100  
2      99  
3      99  


#filter integers between ()
df1 = df.loc[:, df.columns.str.isnumeric().isnull()].copy()
#filter all columns with price
df2 = df.filter(regex='price').copy()

uniq_vals_df2 = df2.stack().dropna().drop_duplicates()
not_matched_vals = np.setdiff1d(uniq_vals_df2, df1.columns)
df1 = df1.join(pd.DataFrame(columns=not_matched_vals.tolist() + ['a']))

#replace columns by match values from df2
for c in df2.columns:
    df2[c] = df1.lookup(df1.index, df2[c].fillna('a'))
#join to original DataFrame    
df = df[['file','ID']].join(df2)


print (df)

    file  ID  price1  price2  price3  price4
0  file1   1    56.0    78.0    98.0   101.0
1  file1   2    91.0   100.0   121.0     NaN
2  file2   3   101.0   102.0     NaN     NaN
3  file2   4   112.0   113.0     NaN     NaN

这篇关于如何在条件下将csv文件合并到单个文件并将文件名添加为列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆