Merge rows into Columns from FOR Loop in CSV
Problem description
This program scans through the files in a directory (see the sample data below; one file can contain 10 data samples) and extracts data from them using regex patterns.
The problem I am having is that the matches from each pattern end up on separate rows in the output. Here is my code:
# use regex on a directory of files and copy into a CSV
import csv
import glob
import io
import re
import os

# Pattern REGEX configuration
patternN1S1 = r"/N1(.*?)/S1"  # reads between /N11,3280,0000,031,037,014,0198,32,1/S1
patternT5S6 = r"/T5(.*?)/S6"  # reads between /T50230,0485,355,389,----,025,08005/S6
#patternNEXT

path = "/test/"

# CSV headers
header = ['Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6',
          'Column7']  # add more as I expand out the Patterns

with open('RR.csv', 'w', newline='') as csvf:
    writer = csv.writer(csvf)
    writer.writerow(header)
    # to search DIR defined in path
    for files in glob.glob(path + "*.TXT"):
        with open(files) as infile:
            filename = os.path.basename(infile.name)
            data = infile.read()
            pat1 = re.findall(patternN1S1, data)
            pat2 = re.findall(patternT5S6, data)
            rows = [next(csv.reader(io.StringIO(row))) for row in pat1]
            rows1 = [next(csv.reader(io.StringIO(row))) + [filename] for row in pat2]
            #rows2 = [next(csv.reader(io.StringIO(row))) + [filename] for row in pat3]
            writer.writerows(rows)
            writer.writerows(rows1)
            #writer.writerows(rows2)
What I would like to achieve is that, for each regex pattern I define, the data extracted from the file(s) is appended to the same row. The program does this correctly for the first pattern, but the second pattern is written as a separate row; I would like to keep it on the same row as pat1, together with any other patterns I add.
Sample text file
/C111,49634,7001,04,0000,1,0000,06,1/CE0157,00632,---,---,----,---,C73014/EC000609,48669,14256,35384,00,05,02/EE645000,02173,19871,00767,00/N11,3280,0000,031,037,014,0198,32,1/S10003,0185,0230,0000,999,0161,34/T10147,0240,386,392,----,025,0800D/S20018,0238,0240,0161,016,0157,34/T20152,0250,386,389,--,025,08005/S30040,0300,0790,0198,043,0153,34/T30175,0300,386,390,----,025,08005/S40067,0357,1540,0224,060,0150,35/T40197,0370,371,390,----,025,08005/S50096,0418,2320,0269,080,0148,35/T50230,0485,355,389,----,025,08005/S60109,0446,2620,0294,091,0147,35/T60250,0540,347,390,----,025,08005/S70123,0475,2900,0312,101,0148,35/T70272,0620,339,389,----,025,08005/S80138,0506,3170,0342,109,0152,35/T80297,0695,329,379,----,025,08005/S90151,0523,3390,0362,114,0155,19/T90315,0785,325,379,----,025,08001/S00162,0542,3580,0373,119,0158,25/T00332,0875,325,379,----,025,08001/U10172,0563,3740,0382,123,0162,24/V10350,0950,325,379,----025,08001/U20182,0583,3860,0390,128,0165,28/V20370,1025,323,379,----,026,08001/;
/C111,49634,7001,04,0000,1,0000,06,1/CE0157,00632,---,---,----,---,C73014/EC000609,48669,14256,35384,00,05,02/EE645000,02173,19871,00767,00/N11,3280,0000,031,037,014,0198,32,1/S10003,0185,0230,0000,999,0161,34/T10147,0240,386,392,----,025,0800D/S20018,0238,0240,0161,016,0157,34/T20152,0250,386,389,----,025,08005/S30040,0300,0790,0198,043,0153,34/T30175,0300,386,390,----,025,08005/S40067,0357,1540,0224,060,0150,35/T40197,0370,371,390,----,025,08005/S50096,0418,2320,0269,080,0148,35/T50230,0485,355,389,----,025,08005/S60109,0446,2620,0294,091,0147,35/T60250,0540,347,390,----,025,08005/S70123,0475,2900,0312,101,0148,35/T70272,0620,339,389,----,025,08005/S80138,0506,3170,0342,109,0152,35/T80297,0695,329,379,----,025,08005/S90151,0523,3390,0362,114,0155,19/T90315,0785,325,379,----,025,08001/S00162,0542,3580,0373,119,0158,25/T00332,0875,325,379,----,025,08001/U10172,0563,3740,0382,123,0162,24/V10350,0950,325,379,----,025,08001/U20182,0583,3860,0390,128,0165,28/V20370,1025,323,379,----,026,08001/;
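To see what the two patterns actually pull out of a record, here is a minimal sketch that runs them on a shortened record stitched together from two fragments of the sample above (the stitching is only for brevity; a real file holds the full record). Each match is one comma-separated string, which csv.reader then splits into fields, as the question's code does.

```python
import csv
import io
import re

# Same patterns as in the question, written as raw strings.
patternN1S1 = r"/N1(.*?)/S1"
patternT5S6 = r"/T5(.*?)/S6"

# Shortened record: two fragments of the sample joined together for brevity.
record = (
    "/EE645000,02173,19871,00767,00/N11,3280,0000,031,037,014,0198,32,1/S10003,0185"
    "/S50096,0418,2320,0269,080,0148,35/T50230,0485,355,389,----,025,08005/S60109"
)

n1 = re.findall(patternN1S1, record)  # lazy (.*?) stops at the first "/S1"
t5 = re.findall(patternT5S6, record)  # lazy (.*?) stops at the first "/S6"
print(n1)  # ['1,3280,0000,031,037,014,0198,32,1']
print(t5)  # ['0230,0485,355,389,----,025,08005']

# Split one match into individual fields.
fields = next(csv.reader(io.StringIO(n1[0])))
print(len(fields))  # 9
```

So in this data each N1 match yields 9 fields and each T5 match yields 7.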
Any help is appreciated.
Recommended answer
You will need to use zip() to walk through the two lists in parallel, and then chain.from_iterable() (from itertools) to flatten each pair of rows into a single row:
# use regex on a directory of files and copy into a CSV
from itertools import chain
import csv
import glob
import io
import re
import os

# Pattern REGEX configuration
reN1S1 = re.compile(r"/N1(.*?)/S1")  # reads between /N11,3280,0000,031,037,014,0198,32,1/S1
reT5S6 = re.compile(r"/T5(.*?)/S6")  # reads between /T50230,0485,355,389,----,025,08005/S6
#patternNEXT

path = "/test/"

# CSV headers
header = ['Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6',
          'Column7']  # add more as I expand out the Patterns

with open('RR.csv', 'w', newline='') as csvf:
    writer = csv.writer(csvf)
    writer.writerow(header)
    # to search DIR defined in path
    for file in glob.glob(path + "*.TXT"):
        with open(file) as infile:
            filename = os.path.basename(infile.name)
            data = infile.read()
            # parse every regex match into a list of fields
            rows1 = [next(csv.reader(io.StringIO(row))) for row in reN1S1.findall(data)]
            rows2 = [next(csv.reader(io.StringIO(row))) for row in reT5S6.findall(data)]
            # pair the i-th N1 row with the i-th T5 row, flatten the pair
            # into one list and append the filename as the last column
            for row in zip(rows1, rows2):
                writer.writerow(list(chain.from_iterable(row)) + [filename])
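To try the zip()/chain.from_iterable() step without a directory of .TXT files, here is a minimal sketch that runs the same logic on an in-memory string and writes the CSV to an io.StringIO buffer. Since the question wants to keep adding patterns, it takes a list of compiled patterns instead of exactly two. The helper name merged_rows, the filename SAMPLE.TXT and the shortened two-sample input are illustrative assumptions, not part of the original answer.

```python
import csv
import io
import re
from itertools import chain


def merged_rows(data, patterns, filename):
    """Hypothetical helper: the answer's zip()/chain.from_iterable() step,
    generalised to any number of compiled patterns."""
    # One list of parsed rows per pattern, e.g. [N1 rows, T5 rows, ...]
    per_pattern = [
        [next(csv.reader(io.StringIO(match))) for match in pattern.findall(data)]
        for pattern in patterns
    ]
    # zip(*per_pattern) groups the i-th row of every pattern together and,
    # like the answer's zip(rows1, rows2), stops at the shortest list.
    for group in zip(*per_pattern):
        yield list(chain.from_iterable(group)) + [filename]


patterns = [re.compile(r"/N1(.*?)/S1"), re.compile(r"/T5(.*?)/S6")]

# Shortened stand-in for a file: fragments of the sample record,
# repeated on two lines to mimic a file holding two samples.
sample = "/N11,3280,0000,031,037,014,0198,32,1/S10003/T50230,0485,355,389,----,025,08005/S60109/;"
data = sample + "\n" + sample

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(merged_rows(data, patterns, "SAMPLE.TXT"))
print(buffer.getvalue())
```

Each sample becomes one row of 9 N1 fields, then 7 T5 fields, then the filename. Inside the answer's loop the same call would be writer.writerows(merged_rows(data, patterns, filename)), and a new pattern only needs to be appended to patterns.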
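This assumes rows1 and rows2 always contain the same number of rows: zip() stops at the end of the shorter list, so any extra rows in the longer one are silently dropped.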
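One way to drop that assumption, using a different technique from the answer's, is to split each file into its individual samples and look up every pattern within one sample at a time, padding a missing block with blank cells. The minimal sketch below assumes that samples are terminated by ";" as in the sample file, that each pattern matches at most once per sample, and that each pattern yields a fixed number of fields (9 for N1 and 7 for T5 in the sample data). The helper rows_per_sample and the shortened test samples are hypothetical.

```python
import csv
import io
import re


def rows_per_sample(data, patterns, widths, filename):
    """Hypothetical helper: one CSV row per sample instead of zipping
    whole-file match lists.

    Assumptions: each sample ends with ";", each pattern matches at most
    once per sample, and widths[i] is the field count of pattern i.
    """
    for sample in data.split(";"):
        if not sample.strip():
            continue  # skip the empty tail after the last ";"
        row = []
        for pattern, width in zip(patterns, widths):
            match = pattern.search(sample)
            if match:
                row.extend(next(csv.reader(io.StringIO(match.group(1)))))
            else:
                row.extend([""] * width)  # blank cells keep later columns aligned
        yield row + [filename]


patterns = [re.compile(r"/N1(.*?)/S1"), re.compile(r"/T5(.*?)/S6")]
widths = [9, 7]  # field counts seen in the sample data for N1 and T5

# Shortened samples: the second one has no /T5 block.
data = (
    "/N11,3280,0000,031,037,014,0198,32,1/S10003/T50230,0485,355,389,----,025,08005/S60109/;\n"
    "/N11,3280,0000,031,037,014,0198,32,1/S10003/;\n"
)

for row in rows_per_sample(data, patterns, widths, "SAMPLE.TXT"):
    print(row)
```

Because each sample is handled on its own, a sample that lacks a block only blanks out that block's columns. Zipping whole-file lists would instead pair a later sample's T5 data with an earlier sample's N1 data.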