Use Python to split a CSV file with multiple headers


Problem description

I have a CSV file that is constantly being appended to. It contains multiple header rows, and the only thing the headers have in common is that the first column is always "NAME".

How do I split the single CSV file into separate CSV files, one for each header row?

Here is a sample file:

"NAME","AGE","SEX","WEIGHT","CITY"
"Bob",20,"M",120,"New York"
"Peter",33,"M",220,"Toronto"
"Mary",43,"F",130,"Miami"
"NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"
"Larry","USA","Football",14,"Baseball",22
"Jenny","UK","Rugby",5,"Field Hockey",11
"Jacques","Canada","Hockey",19,"Volleyball",4
"NAME","DRINK","QTY"
"Jesse","Beer",6
"Wendel","Juice",1
"Angela","Milk",3

Recommended answer

If the CSV file is small enough to fit in memory all at once, just use read() to read the file into a string and then run a regex over that string:

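import re

# ur_csv is the path to your input CSV file
with open(ur_csv) as f:
    data = f.read()

# Each match runs from a line starting with "NAME" up to the next such line (or end of file)
chunks = re.finditer(r'(^"NAME".*?)(?=^"NAME"|\Z)', data, re.S | re.M)
for i, chunk in enumerate(chunks, 1):
    # '/path/' is a placeholder for the output directory
    with open('/path/{}.csv'.format(i), 'w') as fout:
        fout.write(chunk.group(1))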
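
To see what the regex captures, here is a minimal sketch that applies the same pattern to a shortened in-memory version of the sample data. The names SAMPLE, PATTERN and split_chunks are made up for this illustration and are not part of the original answer.

```python
import re

# Shortened version of the sample file from the question (illustrative only)
SAMPLE = (
    '"NAME","AGE","SEX","WEIGHT","CITY"\n'
    '"Bob",20,"M",120,"New York"\n'
    '"NAME","DRINK","QTY"\n'
    '"Jesse","Beer",6\n'
    '"Wendel","Juice",1\n'
)

# re.M lets ^ match at the start of every line; re.S lets . match newlines,
# so one lazy match can span a header row and all of its data rows.
PATTERN = re.compile(r'(^"NAME".*?)(?=^"NAME"|\Z)', re.S | re.M)


def split_chunks(text):
    """Return each header row together with its data rows as one string."""
    return [m.group(1) for m in PATTERN.finditer(text)]


chunks = split_chunks(SAMPLE)
for number, chunk in enumerate(chunks, 1):
    # Show the chunk number and the header row that starts the chunk
    print(number, chunk.splitlines()[0])
```

Because every character of the input belongs to exactly one match, joining the chunks back together reproduces the original text. Note that the pattern treats any line beginning with "NAME" as a header, so a data row whose first field is literally "NAME" would also start a new chunk.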

If the size of the file is a concern, you can use mmap to create an object that behaves like one big string but is not all in memory at the same time.

Then run the regex over the mmap object to separate the CSV chunks like so:

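import mmap
import re

# Open in binary mode: in Python 3 a memory map exposes bytes, not str
with open(ur_csv, 'rb') as f:
    mf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The pattern must be a bytes pattern (rb'...') to search an mmap object
    chunks = re.finditer(rb'(^"NAME".*?)(?=^"NAME"|\Z)', mf, re.S | re.M)
    for i, chunk in enumerate(chunks, 1):
        # Matches are bytes, so the output files are opened in binary mode too
        with open('/path/{}.csv'.format(i), 'wb') as fout:
            fout.write(chunk.group(1))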
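
One detail worth checking under Python 3: a memory map is a bytes-like object, so the regex has to be a bytes pattern and each match comes back as bytes. The sketch below illustrates this on a temporary file; SAMPLE, BYTES_PATTERN, STR_PATTERN and split_mapped are hypothetical names chosen for the example, not part of the original answer.

```python
import mmap
import os
import re
import tempfile

# Small illustrative input, written as bytes because mmap works on raw bytes
SAMPLE = (
    b'"NAME","AGE"\n'
    b'"Bob",20\n'
    b'"NAME","DRINK","QTY"\n'
    b'"Jesse","Beer",6\n'
)

# Same pattern twice: once as a bytes pattern, once as a str pattern
BYTES_PATTERN = re.compile(rb'(^"NAME".*?)(?=^"NAME"|\Z)', re.S | re.M)
STR_PATTERN = re.compile(r'(^"NAME".*?)(?=^"NAME"|\Z)', re.S | re.M)


def split_mapped(path, pattern):
    """Memory-map the file at `path` and return the matched chunks."""
    with open(path, 'rb') as f:
        mf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        return [m.group(1) for m in pattern.finditer(mf)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    with open(path, 'wb') as f:
        f.write(SAMPLE)

    chunks = split_mapped(path, BYTES_PATTERN)

    # A str pattern cannot be applied to a bytes-like object in Python 3
    try:
        split_mapped(path, STR_PATTERN)
        str_pattern_works = True
    except TypeError:
        str_pattern_works = False

print(len(chunks), str_pattern_works)
```

Since the chunks are bytes, the output files need to be opened with 'wb' rather than 'w' when writing them out.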

In either case, this writes the chunks to files named 1.csv, 2.csv, and so on, one per header row.
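
As an alternative that is not part of the answer above, the same numbered output files can be produced by reading the input one line at a time, which needs neither a regex nor the whole file in memory. This is only a sketch: split_by_header, its marker parameter and the temporary paths are assumptions made for the illustration, and like the regex it assumes no quoted field contains a line break followed by "NAME".

```python
import os
import tempfile


def split_by_header(src_path, out_dir, marker='"NAME"'):
    """Copy src_path into 1.csv, 2.csv, ... starting a new file at each header row."""
    written = []
    out = None
    try:
        # newline='' keeps the original line endings untouched
        with open(src_path, newline='') as src:
            for line in src:
                if line.startswith(marker):
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, '{}.csv'.format(len(written) + 1))
                    out = open(path, 'w', newline='')
                    written.append(path)
                if out is not None:  # ignore anything before the first header
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written


# Shortened version of the sample file from the question (illustrative only)
SAMPLE = (
    '"NAME","AGE","SEX","WEIGHT","CITY"\n'
    '"Bob",20,"M",120,"New York"\n'
    '"NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"\n'
    '"Larry","USA","Football",14,"Baseball",22\n'
    '"NAME","DRINK","QTY"\n'
    '"Jesse","Beer",6\n'
)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.csv')
    with open(src, 'w', newline='') as f:
        f.write(SAMPLE)

    paths = split_by_header(src, tmp)
    names = [os.path.basename(p) for p in paths]
    with open(paths[2], newline='') as f:
        third = f.read()

print(names)
```

The trade-off is a little more bookkeeping in exchange for constant memory use, which may suit a file that keeps growing.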
