Separate fields by comma and quotes in Python?

Question

I'm trying to split this CSV file into a 2D list. The problem with my current code is that it drops a few fields on lines whose data contains quotes. The quotes are there to signal that the comma inside them is part of the field, not a field separator. I've posted the code, example data, and example output below. You can see that the first output line is missing several fields compared to the others because of the quotes. What do I need to change in the regular expression line? Thanks in advance for any help.

Here is the code:

import sys
import re
import time

# get the date
date = time.strftime("%x")


# function for reading in each line of file
# returns array of each line
def readIn(file):
    array = []
    for line in file:
        array.append(line)
    return array


def main():
    data = open(sys.argv[1], "r")
    template = open(sys.argv[2], "r")
    output = open(sys.argv[3], "w")

    finalL = []

    dataL = []
    dataL = readIn(data)

    templateL = []
    templateL = readIn(template)

    costY = 0
    dateStr = ""

    # split each line in the data by the comma unless there are quotes
    for i in range(0, len(dataL)):
        if '"' in dataL[i]:
            Pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
            dataL[i] = Pattern.split(dataL[i])[1::2]
            for j in range(0, len(dataL[i])):
                dataL[i][j] = dataL[i][j].strip()
        else:
            temp = dataL[i].strip().split(",")
            dataL[i] = temp

Example data:

OrgLevel3: ATHLET ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,,
,,,,,,,,
Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,
,,,,,,,,
OrgLevel3: BOOK ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
BOOK Direct,,,434 ,,14:59:18,28.09 ,,
,,,,,,,,
Grand Total for BOOK:,,,434 ,,14:59:18,28.09 ,,
,,,,,,,,
OrgLevel3: CARD ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
CARD Direct,,,253 ,,09:02:54,14.30 ,,
,,,,,,,,
Grand Total for CARD:,,,253 ,,09:02:54,14.30 ,,

Example output:

['Grand Total for ATHLET:', '"1,312 "', '62:58:18', '130.62', '']
['Grand Total for BOOK:', '', '', '434 ', '', '14:59:18', '28.09 ', '', '']
['Grand Total for CARD:', '', '', '253 ', '', '09:02:54', '14.30 ', '', '']


Recommended answer

If you're trying to load a CSV file into a list, then the entire code to do so is:

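import csv
import sys

# csv.reader understands quoted fields, so a comma inside quotes stays in its field
with open(sys.argv[1], newline="") as data:
    dataL = list(csv.reader(data))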
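
To see why this is enough, here is a small sketch that parses one line copied from the question's sample data. The use of `io.StringIO` in place of the real file is an assumption made only so the example is self-contained; `csv.reader` accepts any iterable of lines.

```python
import csv
import io

# One line copied from the question's sample data; StringIO stands in for the file.
sample = 'Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,\n'

rows = list(csv.reader(io.StringIO(sample)))

# The quoted "1,312 " is kept as a single field and the quotes are removed.
print(rows[0])
# ['Grand Total for ATHLET:', '', '', '1,312 ', '', '62:58:18', '130.62 ', '', '']
```

All nine fields survive, including the empty ones that the regular expression in the question dropped.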

If your example data is your actual input, then the rows need some filtering beforehand, for example to keep only the "Grand Total" lines:

dataL = [row for row in csv.reader(data) if row and row[0].startswith('Grand Total for')]
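
The filtering idea can be tried on a few of the question's sample lines. The sketch below is only an illustration: the helper name `grand_totals` is hypothetical, the `row and ...` guard is added on the assumption that the real file may contain completely blank lines (which `csv.reader` returns as empty lists), and stripping each field is an optional extra that removes the trailing spaces seen in the data.

```python
import csv
import io

# A few lines copied from the question's sample data.
SAMPLE = '''OrgLevel3: ATHLET ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,,
Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,

Grand Total for BOOK:,,,434 ,,14:59:18,28.09 ,,
'''


def grand_totals(lines):
    """Yield the 'Grand Total' rows with surrounding whitespace stripped.

    Hypothetical helper, not part of the original answer.
    """
    for row in csv.reader(lines):
        # A blank line parses to [], so check the row is non-empty first.
        if row and row[0].startswith('Grand Total for'):
            yield [field.strip() for field in row]


totals = list(grand_totals(io.StringIO(SAMPLE)))
for row in totals:
    print(row)
```

Each resulting row keeps all nine columns, so the Calls, Duration, and Cost values sit at the same indexes (3, 5, and 6) whether or not the original field was quoted.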
