在csv文件中添加新列，并对on记录进行操作 [英] Add new column in a csv file and manipulate on the on records

查看：219 发布时间：2017/2/24 22:23:22 python csv add

本文介绍了在csv文件中添加新列，并对on记录进行操作的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有4个csv文件，名为PV.csv，Dwel.csv，Sess.csv和Elap.csv。我有15列和arouind 2000行在每个文件。首先，我想在每个文件中添加一个名为Var的新列，并使用相同的文件名填充新列的单元格。因此，PV.csv文件中的新列'Var'将被PV填充。同样是其他3个文件。
之后，我想操作所有的文件如下。

最后，我想根据A_ID和B_ID合并/加入这4个文件并将记录写入新的csv文件名finalFile.csv。
任何建议和帮助是值得赞赏的。

 < p> PV.csv如下：< / p> ;

  
 A_ID B_ID LO UP LO UP 
 103 321 0 402 
 103 503 192 225 433 608 
 106 264 104 258 334 408 
 107 197 6 32 113 258

Dwell.csv如下：

  
 A_ID B_ID LO UP LO UP 
 103 321 40 250 517 780 
 103 503 80 125 435 585 
 106 264 192 525 682 
 107 197 324 492 542 614

Session.csv如下：

  
 A_ID B_ID LO UP LO UP 
 103 321 75 350 370 850 
 106 264 92 225 482 608 
 107 197 24 92 142

Elapsed.csv如下：

  
 A_ID B_ID LO UP LO UP 
 103 321 5 35 75 
 103 503 100 225 333 408 
 106 264 102 325 582 
 107 197 24 92 142 214

PV.csv的第一个输出文件如下：

 
 
 相同的方式，所有其余的三个文件将被填充以ehrer文件名，Dwell，Session和Elapsed的新列：
  
 A_ID B_ID Var LO UP LO UP 
 103 321 PV 0 402 
 103 503 PV 192 225 433 608 
 106 264 PV 104 258 334 408 
 107 197 PV 6 32 113 258 
  
最终输出文件如下： p> 
 
 
 finalFile.csv。
  
 A_ID B_ID 
 103 321 PV 0 402 
 103 321 Dwel 40 250 
 103 321 Dwel 251 517 
 103 321 Dwel 518 780 
 103 321 Sess 75 350 
 103 321 Sess 351 370 
 103 321 Sess 371 850 
 103 321 Elap 5 35 
 103 321 Elap 36 75 
 103 503 PV 192 225 
 103 503 PV 226 433 
 103 503 PV 434 608 
 103 503 Dwel 80 125 
 103 503 Dwel 126 435 
 103 503 Dwel 436 585 
 103 503 Elap 100 225 
 103 503 Elap 226 333 
 103 503 Elap 334 408 
 106 264 PV 104 258 
 106 264 PV 259 334 
 106 264 PV 335 408 
 106 264 Dwel 192 525 
 106 264 Dwel 526 682 
 106 264 Sess 92 225 
 106 264 Sess 226 482 
 106 264 Sess 483 608 
 106 264 Elap 102 325 
 106 264 Elap 326 582 
 107 197 PV 6 32 
 107 192 PV 33 113 
 107 192 PV 114 258 
 107 192 Dwel 324 492 
 107 192 Dwel 493 542 
 107 192 Dwel 543 614 
 107 192 Sess 24 92 
 107 192 Sess 93 142 
 107 192 Elap 24 92 
 107 192 Elap 93 142 
 107 192 Elap 143 214 
  
 
 
解决方案
 p> 
 
 
 从集合import defaultdict 
来自itertools import groupby 
 import csv 
 
 entries = defaultdict（list）
 csv_files = [（0，'PV.csv'，'PV'），（1，'Dwell.csv'，'Dwel'），（2，'Session.csv'，'Sess '），（3，'Elapsed.csv'，'Elap'）] 
 
索引，文件名，短名称在csv_files中：
 f_input = open（filename，'rb'）
 csv_input = csv.reader（f_input）
 header = next（csv_input）
 
对于csv_input中的行：
 row [：] = [col ] 
 
 
 f_input.close（）
 
 
 
 
 
 
 $ [ b 
 f_output = open（'finalFile.csv'，'wb'）
 csv_output = csv.writer（f_output）
 csv_output.writerow（header [：2] + ['Var'] + header [2：4]）
 
用于排序（entries.keys（））中的键：
用于groupby中的k，g x：x [1]）：
 var_group = list（g）
 if len（var_group [0] [2]）：
 up = var_group [0] [2] 
用于在var_group中输入：
用于zip中的对（* [iter（entry [2]）] * 2）：
 csv_output.writerow（[key [0]，key [1] ，entry [1]，up，pair [1]]）
 up = int（pair [1]）+ 1 
 
 f_output.close（）
  
使用您提供的数据，输出以下结果：
 A_ID，B_ID，Var，LO，UP 
 103,321，PV，0,402 
 103,321，Dwel，40,250 
 103,321，Dwel，251,780 
 103,321，Sess，75,350 
 103,321，Sess，351,850 
 103,321，Elap，5,35 
 103,503，PV，192,225 
 103,503，PV，226,608 
 103,503，Dwel，80,125 
 103,503， Dwel，126,585 
 103,503，Elap，100,225 
 103,503，Elap，226,408 
 106,264，PV，104,258 
 106,264，PV，259,408 
 106,264，Dwel，192,525 
 106,264，Sess，92,225 
 106,264，Sess，226,608 
 106,264，Elap，102,325 
 107,197，PV，6,32 
 107,197，PV，33,258 
 107,197 ，Dwel，324,492 
 107,197，Dwel，493,614 
 107,197，Sess，24,92 
 107,197，Elap，24,92 
 107,197，Elap，93,214 
要处理文件夹中的所有csv文件，您可以将以下内容添加到脚本的顶部：
  import os 
 import glob 
 
 csv_files = [（index，file，os.path.splitext（file）[0]）for index，file in enumerate （glob.glob（'*。csv'））] 
  
输出文件，否则将在下次运行脚本时读取。
 
 
 使用Python 2.6.6测试（我相信是OP使用的） p> 
I have 4 csv files named PV.csv, Dwel.csv, Sess.csv, and Elap.csv. I have 15 columns and arouind 2000 rows in each file. At first I would like to add a new column named Var in each file and fill up the cells of the new column with the same file name. Therefore, new column 'Var' in PV.csv file will filled up by PV. Same is for other 3 files.
After that I would like to manipulate the all the files as follows.

Finally I  would like to merge / join these 4 files based on the  A_ID and B_ID and write records into a new csv file name finalFile.csv.
Any suggestion and help is appreciated.
<p>PV.csv is as follows:</p>


   
A_ID      B_ID       LO       UP     LO      UP
103       321        0        402    
103       503        192      225    433     608   
106       264        104      258    334     408
107       197        6        32     113     258    
Dwell.csv is as follows:
   
A_ID      B_ID       LO       UP     LO      UP  
103       321        40       250    517     780
103       503        80       125    435     585     
106       264        192      525    682  
107       197        324      492    542     614    
Session.csv is as follows:
   
A_ID      B_ID       LO       UP     LO      UP 
103       321        75       350    370     850     
106       264        92       225    482     608  
107       197        24       92     142    
Elapsed.csv is as follows:
   
A_ID      B_ID       LO       UP     LO      UP 
103       321        5        35     75
103       503        100      225    333     408      
106       264        102      325    582  
107       197        24       92     142     214    
First output file of PV.csv will be as follows:

Same way all rest of three files will be filled up with new column with ehrer file name, Dwell, Session, and Elapsed:
   
A_ID    B_ID      Var   LO        UP     LO      UP
103     321       PV    0         402    
103     503       PV    192       225    433     608   
106     264       PV    104       258    334     408
107     197       PV    6         32     113     258 
Final output file will be as follows:

finalFile.csv.
   
A_ID    B_ID      Var    LO        UP
103     321       PV     0         402
103     321       Dwel   40        250
103     321       Dwel   251       517
103     321       Dwel   518       780
103     321       Sess   75        350
103     321       Sess   351       370
103     321       Sess   371       850
103     321       Elap   5         35
103     321       Elap   36        75
103     503       PV     192       225
103     503       PV     226       433
103     503       PV     434       608
103     503       Dwel   80        125
103     503       Dwel   126       435
103     503       Dwel   436       585
103     503       Elap   100       225
103     503       Elap   226       333
103     503       Elap   334       408
106     264       PV     104       258
106     264       PV     259       334
106     264       PV     335       408
106     264       Dwel   192       525
106     264       Dwel   526       682
106     264       Sess   92        225
106     264       Sess   226       482
106     264       Sess   483       608
106     264       Elap   102       325
106     264       Elap   326       582
107     197       PV     6         32
107     192       PV     33        113
107     192       PV     114       258
107     192       Dwel   324       492
107     192       Dwel   493       542
107     192       Dwel   543       614
107     192       Sess   24        92
107     192       Sess   93        142
107     192       Elap   24        92
107     192       Elap   93        142
107     192       Elap   143       214
    
 解决方案 
The following script should get you started:
from collections import defaultdict
from itertools import groupby
import csv

entries = defaultdict(list)
csv_files = [(0, 'PV.csv', 'PV'), (1, 'Dwell.csv', 'Dwel'), (2, 'Session.csv', 'Sess'), (3, 'Elapsed.csv', 'Elap')]

for index, filename, shortname in csv_files:
    f_input = open(filename, 'rb')
    csv_input = csv.reader(f_input)
    header = next(csv_input)

    for row in csv_input:
        row[:] = [col for col in row if col]    
        entries[(row[0], row[1])].append((index, shortname, row[2:]))

    f_input.close()

f_output = open('finalFile.csv', 'wb')
csv_output = csv.writer(f_output)
csv_output.writerow(header[:2] + ['Var'] + header[2:4])

for key in sorted(entries.keys()):
    for k, g in groupby(sorted(entries[key]), key=lambda x: x[1]):
        var_group = list(g)
        if len(var_group[0][2]):
            up = var_group[0][2][0]
            for entry in var_group:
                for pair in zip(*[iter(entry[2])]*2):
                    csv_output.writerow([key[0], key[1], entry[1], up, pair[1]])
                    up = int(pair[1]) + 1

f_output.close()
Using the data you have provided, this gives the following output:
A_ID,B_ID,Var,LO,UP
103,321,PV,0,402
103,321,Dwel,40,250
103,321,Dwel,251,780
103,321,Sess,75,350
103,321,Sess,351,850
103,321,Elap,5,35
103,503,PV,192,225
103,503,PV,226,608
103,503,Dwel,80,125
103,503,Dwel,126,585
103,503,Elap,100,225
103,503,Elap,226,408
106,264,PV,104,258
106,264,PV,259,408
106,264,Dwel,192,525
106,264,Sess,92,225
106,264,Sess,226,608
106,264,Elap,102,325
107,197,PV,6,32
107,197,PV,33,258
107,197,Dwel,324,492
107,197,Dwel,493,614
107,197,Sess,24,92
107,197,Elap,24,92
107,197,Elap,93,214
To work with all csv files in a folder, you could add the following to the top of the script:
import os
import glob

csv_files = [(index, file, os.path.splitext(file)[0]) for index, file in enumerate(glob.glob('*.csv'))]
You should also change the location of the output file otherwise it will be read in the next time the script is run.

Tested using Python 2.6.6 (which I believe is what the OP is using)

                        这篇关于在csv文件中添加新列，并对on记录进行操作的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

在csv文件中添加新列，并对on记录进行操作 [英] Add new column in a csv file and manipulate on the on records

问题描述

相关文章

Python最新文章

热门教程

热门工具

登录关闭

在csv文件中添加新列，并对on记录进行操作 [英] Add new column in a csv file and manipulate on the on records

问题描述

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭