Use Python to split a CSV file with multiple headers

Views: 140  Published: 2020/7/11 20:55:24  Tags: python csv python-3.x

Question

I have a CSV file that is constantly being appended to. It contains multiple header rows, and the only thing the headers have in common is that the first column is always "NAME".

How do I split the single CSV file into separate CSV files, one for each header row?

Here is a sample file:

"NAME","AGE","SEX","WEIGHT","CITY"
"Bob",20,"M",120,"New York"
"Peter",33,"M",220,"Toronto"
"Mary",43,"F",130,"Miami"
"NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"
"Larry","USA","Football",14,"Baseball",22
"Jenny","UK","Rugby",5,"Field Hockey",11
"Jacques","Canada","Hockey",19,"Volleyball",4
"NAME","DRINK","QTY"
"Jesse","Beer",6
"Wendel","Juice",1
"Angela","Milk",3

Recommended answer

If the CSV file is not huge, so that its whole content fits in memory at once, use read() to load the file into a string and then run a regex over that string:

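import re

ur_csv = 'input.csv'  # placeholder path to the combined CSV file

# newline='' keeps the original line endings unchanged on read and write
with open(ur_csv, newline='') as f:
    data = f.read()

# A chunk starts at a line beginning with "NAME" and runs up to the
# next such line, or to the end of the data.
chunks = re.finditer(r'(^"NAME".*?)(?=^"NAME"|\Z)', data, re.S | re.M)
for i, chunk in enumerate(chunks, 1):
    # '/path/' is a placeholder for the output directory
    with open('/path/{}.csv'.format(i), 'w', newline='') as fout:
        fout.write(chunk.group(1))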
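
To see what the regex actually captures, it may help to run it on a small in-memory string before touching any files. The sketch below uses a shortened copy of the sample file from the question; the helper name split_chunks and the constant HEADER_CHUNK are made up for this illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer: a chunk starts at a line beginning with
# "NAME" and ends just before the next such line (or at end of input).
HEADER_CHUNK = re.compile(r'(^"NAME".*?)(?=^"NAME"|\Z)', re.S | re.M)


def split_chunks(text):
    """Return the header-plus-rows chunks found in text (hypothetical helper)."""
    return [m.group(1) for m in HEADER_CHUNK.finditer(text)]


# Shortened version of the sample file from the question.
sample = (
    '"NAME","AGE","SEX","WEIGHT","CITY"\n'
    '"Bob",20,"M",120,"New York"\n'
    '"NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"\n'
    '"Larry","USA","Football",14,"Baseball",22\n'
    '"NAME","DRINK","QTY"\n'
    '"Jesse","Beer",6\n'
)

chunks = split_chunks(sample)
for number, chunk in enumerate(chunks, 1):
    # Print the chunk number and the header row that opens it.
    print(number, chunk.splitlines()[0])
```

Note that the pattern relies on no data row starting with "NAME" in its first column; a person literally named NAME would start a new chunk.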

If the size of the file is a concern, you can use mmap to create something that looks like one big string but is not held entirely in memory at the same time.

Then run a regex over the mmap object to separate the CSV chunks, like so:

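import mmap
import re

ur_csv = 'input.csv'  # placeholder path to the combined CSV file

# In Python 3 an mmap behaves like bytes, so the file is opened in
# binary mode, the pattern is a bytes pattern, and output is binary.
with open(ur_csv, 'rb') as f:
    mf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    chunks = re.finditer(rb'(^"NAME".*?)(?=^"NAME"|\Z)', mf, re.S | re.M)
    for i, chunk in enumerate(chunks, 1):
        # '/path/' is a placeholder for the output directory
        with open('/path/{}.csv'.format(i), 'wb') as fout:
            fout.write(chunk.group(1))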
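
As a rough, self-contained way to try the mmap approach, the sketch below wraps it in a function and runs it on a throwaway file in a temporary directory. It assumes Python 3, where an mmap is bytes-like and therefore needs a bytes pattern and binary-mode files. The names split_csv_mmap, HEADER_CHUNK, and combined.csv are hypothetical and chosen only for this example.

```python
import mmap
import os
import re
import tempfile

# Bytes version of the answer's pattern, since an mmap yields bytes.
HEADER_CHUNK = re.compile(rb'(^"NAME".*?)(?=^"NAME"|\Z)', re.S | re.M)


def split_csv_mmap(src_path, out_dir):
    """Write each header-plus-rows chunk of src_path to out_dir/N.csv.

    Returns the list of paths written (hypothetical helper for this sketch).
    """
    written = []
    with open(src_path, 'rb') as f:
        # Length 0 maps the whole file; this fails on an empty file.
        mf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        for i, chunk in enumerate(HEADER_CHUNK.finditer(mf), 1):
            out_path = os.path.join(out_dir, '{}.csv'.format(i))
            with open(out_path, 'wb') as fout:
                fout.write(chunk.group(1))
            written.append(out_path)
    return written


# Demo on a small throwaway file in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, 'combined.csv')
with open(demo_src, 'wb') as f:
    f.write(b'"NAME","AGE"\n"Bob",20\n"NAME","DRINK","QTY"\n"Jesse","Beer",6\n')

paths = split_csv_mmap(demo_src, demo_dir)
print([os.path.basename(p) for p in paths])
```

Each matched chunk is still copied into memory as a bytes object before it is written, so this helps most when the file is large but the individual header sections are not.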

In either case, this writes each chunk to its own file, named 1.csv, 2.csv, and so on.
