Merge Rows into Columns from a FOR Loop in CSV

Problem Description

This program scans through the files in a directory (see the sample data below; one file could contain 10 data samples) and extracts data using regular expressions (patterns).

The problem I am having is with the output of the following code:

# use regex on a directory of files and copy into a CSV
import csv
import glob
import io
import re
import os

# Pattern REGEX configuration
patternN1S1 = r"/N1(.*?)/S1" # raw string; reads between /N11,3280,0000,031,037,014,0198,32,1/S1
patternT5S6 = r"/T5(.*?)/S6" # raw string; reads between /T50230,0485,355,389,----,025,08005/S6
#patternNEXT

path = "/test/"

# CSV headers
header = ['Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6',
          'Column7']  # add more as I expand out the Patterns

with open('RR.csv', 'w', newline='') as csvf:
    writer = csv.writer(csvf)
    writer.writerow(header)

    # to search DIR defined in path
    for files in glob.glob(path + "*.TXT"):
        with open(files) as infile:
            filename = os.path.basename(infile.name)
            data = infile.read()

        pat1 = re.findall(patternN1S1, data)
        pat2 = re.findall(patternT5S6, data)

        rows = [next(csv.reader(io.StringIO(row))) for row in pat1]
        rows1 = [next(csv.reader(io.StringIO(row))) + [filename] for row in pat2]
        #rows2 = [next(csv.reader(io.StringIO(row))) + [filename] for row in pat3]

        writer.writerows(rows)
        writer.writerows(rows1)
        #writer.writerows(rows2)

What I would like to achieve is that, for each regex pattern I create, the data extracted from the file(s) is appended to the same row. The program does this successfully for the first pattern, but the second pattern is written as a new row. I would like it to stay in the same row as pat1, along with any other patterns I add.
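To illustrate the problem, here is a minimal sketch with made-up, shortened field values (not the full sample data). It shows that each writerows() call emits its own CSV lines, while concatenating the field lists of the two patterns gives the single row the question asks for. The helper names `render`, `current` and `wanted` are hypothetical.

```python
import csv
import io

# Shortened, illustrative fields for one record (not the full sample data).
rows = [["1", "3280", "0000"]]       # parsed matches of the first pattern
rows1 = [["0230", "0485", "355"]]    # parsed matches of the second pattern


def render(write_rows):
    """Run write_rows(writer) against an in-memory CSV and return the text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    write_rows(writer)
    return buf.getvalue()


def current(writer):
    writer.writerows(rows)   # one CSV line per match of pattern 1...
    writer.writerows(rows1)  # ...then separate lines for pattern 2


def wanted(writer):
    writer.writerow(rows[0] + rows1[0])  # list concatenation keeps both on one line


print(render(current), end="")  # two lines: 1,3280,0000 and 0230,0485,355
print(render(wanted), end="")   # one line: 1,3280,0000,0230,0485,355
```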

Sample text file

/C111,49634,7001,04,0000,1,0000,06,1/CE0157,00632,---,---,----,---,C73014/EC000609,48669,14256,35384,00,05,02/EE645000,02173,19871,00767,00/N11,3280,0000,031,037,014,0198,32,1/S10003,0185,0230,0000,999,0161,34/T10147,0240,386,392,----,025,0800D/S20018,0238,0240,0161,016,0157,34/T20152,0250,386,389,--,025,08005/S30040,0300,0790,0198,043,0153,34/T30175,0300,386,390,----,025,08005/S40067,0357,1540,0224,060,0150,35/T40197,0370,371,390,----,025,08005/S50096,0418,2320,0269,080,0148,35/T50230,0485,355,389,----,025,08005/S60109,0446,2620,0294,091,0147,35/T60250,0540,347,390,----,025,08005/S70123,0475,2900,0312,101,0148,35/T70272,0620,339,389,----,025,08005/S80138,0506,3170,0342,109,0152,35/T80297,0695,329,379,----,025,08005/S90151,0523,3390,0362,114,0155,19/T90315,0785,325,379,----,025,08001/S00162,0542,3580,0373,119,0158,25/T00332,0875,325,379,----,025,08001/U10172,0563,3740,0382,123,0162,24/V10350,0950,325,379,----025,08001/U20182,0583,3860,0390,128,0165,28/V20370,1025,323,379,----,026,08001/;

/C111,49634,7001,04,0000,1,0000,06,1/CE0157,00632,---,---,----,---,C73014/EC000609,48669,14256,35384,00,05,02/EE645000,02173,19871,00767,00/N11,3280,0000,031,037,014,0198,32,1/S10003,0185,0230,0000,999,0161,34/T10147,0240,386,392,----,025,0800D/S20018,0238,0240,0161,016,0157,34/T20152,0250,386,389,----,025,08005/S30040,0300,0790,0198,043,0153,34/T30175,0300,386,390,----,025,08005/S40067,0357,1540,0224,060,0150,35/T40197,0370,371,390,----,025,08005/S50096,0418,2320,0269,080,0148,35/T50230,0485,355,389,----,025,08005/S60109,0446,2620,0294,091,0147,35/T60250,0540,347,390,----,025,08005/S70123,0475,2900,0312,101,0148,35/T70272,0620,339,389,----,025,08005/S80138,0506,3170,0342,109,0152,35/T80297,0695,329,379,----,025,08005/S90151,0523,3390,0362,114,0155,19/T90315,0785,325,379,----,025,08001/S00162,0542,3580,0373,119,0158,25/T00332,0875,325,379,----,025,08001/U10172,0563,3740,0382,123,0162,24/V10350,0950,325,379,----,025,08001/U20182,0583,3860,0390,128,0165,28/V20370,1025,323,379,----,026,08001/;
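As a quick check of what the two patterns capture, here is a sketch run against a shortened excerpt of the sample record above (the excerpt is trimmed for brevity; the regexes are the question's, written as raw strings). The non-greedy `(.*?)` stops at the first closing marker, and `csv.reader(io.StringIO(...))` then splits the captured text into fields.

```python
import csv
import io
import re

# Shortened excerpt of the sample record (trimmed for brevity).
record = ("/EE645000,02173,19871,00767,00/N11,3280,0000,031,037,014,0198,32,1"
          "/S10003,0185,0230,0000,999,0161,34/T50230,0485,355,389,----,025,08005"
          "/S60109,0446")

reN1S1 = re.compile(r"/N1(.*?)/S1")
reT5S6 = re.compile(r"/T5(.*?)/S6")

n1 = reN1S1.findall(record)  # text between /N1 and the first following /S1
t5 = reT5S6.findall(record)  # text between /T5 and the first following /S6
print(n1)  # ['1,3280,0000,031,037,014,0198,32,1']
print(t5)  # ['0230,0485,355,389,----,025,08005']

# Split the captured string into individual CSV fields.
fields = next(csv.reader(io.StringIO(n1[0])))
print(fields)  # ['1', '3280', '0000', '031', '037', '014', '0198', '32', '1']
```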

Any help is appreciated.

Recommended Answer

You will need zip() to iterate over the two lists in parallel, and chain.from_iterable() to flatten each resulting pair of rows into a single row:
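To see what the two calls do on their own, here is a small sketch on made-up lists (the values `a1`, `x1`, etc. are illustrative only). zip() pairs the i-th row of each list, and chain.from_iterable() joins the lists inside each pair.

```python
from itertools import chain

# Illustrative parsed matches: one inner list per record, per pattern.
rows1 = [["a1", "a2"], ["b1", "b2"]]  # pattern 1: records a and b
rows2 = [["x1"], ["y1"]]              # pattern 2: records a and b

merged = []
for row in zip(rows1, rows2):
    # row is a tuple such as (["a1", "a2"], ["x1"]);
    # chain.from_iterable flattens it into a single list of fields.
    merged.append(list(chain.from_iterable(row)))

print(merged)  # [['a1', 'a2', 'x1'], ['b1', 'b2', 'y1']]
```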

# use regex on a directory of files and copy into a CSV
from itertools import chain
import csv
import glob
import io
import re
import os

# Pattern REGEX configuration
reN1S1 = re.compile(r"/N1(.*?)/S1") # raw string; reads between /N11,3280,0000,031,037,014,0198,32,1/S1
reT5S6 = re.compile(r"/T5(.*?)/S6") # raw string; reads between /T50230,0485,355,389,----,025,08005/S6
#patternNEXT

path = "/test/"
# CSV headers
header = ['Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6',
          'Column7']  # add more as I expand out the Patterns

with open('RR.csv', 'w', newline='') as csvf:
    writer = csv.writer(csvf)
    writer.writerow(header)

    # to search DIR defined in path
    for file in glob.glob(path + "*.TXT"):
        with open(file) as infile:
            filename = os.path.basename(infile.name)
            data = infile.read()

        rows1 = [next(csv.reader(io.StringIO(row))) for row in reN1S1.findall(data)]
        rows2 = [next(csv.reader(io.StringIO(row))) for row in reT5S6.findall(data)]

        for row in zip(rows1, rows2):
            writer.writerow(list(chain.from_iterable(row)) + [filename])

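Since the question wants "any other patterns added" to land in the same row, one possible generalization (a sketch, not part of the original answer) keeps the compiled regexes in a list and uses `zip(*per_pattern)` so any number of patterns is combined. The names `PATTERNS`, `parse_matches`, `combined_rows` and `main` are hypothetical, and it keeps the answer's assumption that every pattern matches the same number of times per file.

```python
import csv
import glob
import io
import os
import re
from itertools import chain

# Hypothetical pattern list: add new regexes here instead of new rowsN variables.
PATTERNS = [
    re.compile(r"/N1(.*?)/S1"),  # e.g. 1,3280,0000,031,037,014,0198,32,1
    re.compile(r"/T5(.*?)/S6"),  # e.g. 0230,0485,355,389,----,025,08005
]


def parse_matches(pattern, data):
    """Return one list of CSV fields for every match of pattern in data."""
    return [next(csv.reader(io.StringIO(m))) for m in pattern.findall(data)]


def combined_rows(data, filename):
    """Yield one row per record: the fields of every pattern, then the filename."""
    per_pattern = [parse_matches(p, data) for p in PATTERNS]
    # zip(*per_pattern) groups the i-th match of each pattern together.
    for group in zip(*per_pattern):
        yield list(chain.from_iterable(group)) + [filename]


def main(path="/test/", out="RR.csv", header=None):
    """Write the combined rows of every *.TXT file in path to out."""
    with open(out, "w", newline="") as csvf:
        writer = csv.writer(csvf)
        if header:
            writer.writerow(header)
        for file in glob.glob(os.path.join(path, "*.TXT")):
            with open(file) as infile:
                data = infile.read()
            writer.writerows(combined_rows(data, os.path.basename(file)))
```

Calling `main(header=header)` with the question's header list would reproduce the answer's output; adding a third regex to `PATTERNS` appends its fields to the same row without touching the loop.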
This assumes rows1 and rows2 will always have the same number of rows; if they differ, zip() stops at the shorter list and the extra rows are silently dropped.