Split CSV file into multiple files by column-data


Question

I want to split a "source.csv" file based on its contents. Of course it is not just simple splitting: I also need to fulfill some "rules".

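  1. The source and destination files have a fixed header that is not in proper CSV format.
  2. The new files are named after special data in one of the columns (the "Fruit" column in my example).
  3. It has to be OS independent.
  4. The source CSV file can contain about 500,000+ rows and at least 30+ columns (scientific data). So I am not sure whether it is better to hold the whole data in RAM, or to read the file line by line and sort each row directly into one of roughly 500+ files based on the value of the "Fruit" column.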

Something like this would be great:

Source file: source.csv

Author: Somebody
Date: Christmas
Project-Title: 42
Name, Fruit, Blubb, Drobblwubb
Anton, Apple, 234, NewYork
Bettina, Banana, 234, Chicago
Carolin, Apple, 123, Berlin

Destination file 1: out/destination_apple.csv

Author: Somebody
Date: Christmas
Project-Title: 42
Name, Fruit, Blubb, Drobblwubb
Anton, Apple, 234, NewYork
Carolin, Apple, 123, Berlin

Destination file 2: out/destination_banana.csv

Author: Somebody
Date: Christmas
Project-Title: 42
Name, Fruit, Blubb, Drobblwubb
Bettina, Banana, 234, Chicago

I'm experimenting quite a lot, but haven't got any really "pythonic" or even working code yet.

Recommended answer

Do you have the Python pandas module? It is a great module for data processing and will help you a lot. Something like this can get you started:

import pandas

# header=3: the column names are on the fourth line (zero-based row 3),
# directly after the three free-text lines
# skipinitialspace=True: the example data has spaces after the commas
data = pandas.read_csv('test.csv', sep=',', header=3, skipinitialspace=True)

# Select the rows for each fruit (name the frame `data`, not `csv`,
# to avoid shadowing the standard-library csv module)
data_apples = data[data['Fruit'] == 'Apple']
data_bananas = data[data['Fruit'] == 'Banana']

data_apples.to_csv('apples.csv', index=False, sep=',')
data_bananas.to_csv('bananas.csv', index=False, sep=',')
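
The snippet above filters for two fruit names by hand. With several hundred distinct values, one possible generalization is to let groupby produce one sub-frame per value. The following is a minimal sketch under assumptions: the function name split_by_column and its parameters are hypothetical, and the out/destination_<value>.csv naming is taken from the question. It loads the whole file into memory, which is often acceptable for about 500,000 rows by 30 columns, though that depends on the data types and the machine.

```python
from pathlib import Path

import pandas as pd


def split_by_column(source, out_dir, column="Fruit", header_row=3):
    """Hypothetical helper: write one CSV per distinct value of `column`.

    Returns the list of paths written, in groupby (sorted) order.
    """
    # header_row is zero-based: row 3 is the fourth line of the file.
    df = pd.read_csv(source, header=header_row, skipinitialspace=True)

    # pathlib builds paths in an OS-independent way.
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    # groupby yields (value, sub-frame) pairs, one per distinct value.
    for value, group in df.groupby(column):
        target = out_dir / f"destination_{str(value).lower()}.csv"
        group.to_csv(target, index=False)
        written.append(target)
    return written
```

Calling split_by_column("source.csv", "out") on the sample data would create out/destination_apple.csv and out/destination_banana.csv.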

This example does not write the first three lines of your original CSV to the resulting files. To keep them, read those three lines separately, write them to each output file first, and then pass the open file handle to to_csv so that the table is written after them. Note that the header argument of to_csv only controls the column names, so it cannot carry these extra lines.
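
One way to carry the three leading lines into every output file is sketched below. It is an illustration under assumptions, not a definitive implementation: the function name split_with_preamble and its parameters are hypothetical, and the output naming again follows the question. The leading lines are copied verbatim, then to_csv writes the column names and rows to the same file handle.

```python
from itertools import islice
from pathlib import Path

import pandas as pd


def split_with_preamble(source, out_dir, column="Fruit", preamble_lines=3):
    """Hypothetical helper: split by `column`, repeating the leading
    free-text lines at the top of every output file."""
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Read the non-CSV lines at the top of the file verbatim.
    with source.open(newline="") as handle:
        preamble = list(islice(handle, preamble_lines))

    # The column names sit directly below the preamble (zero-based index).
    df = pd.read_csv(source, header=preamble_lines, skipinitialspace=True)

    for value, group in df.groupby(column):
        target = out_dir / f"destination_{str(value).lower()}.csv"
        with target.open("w", newline="") as handle:
            handle.writelines(preamble)        # free-text lines first
            group.to_csv(handle, index=False)  # then column names and rows
```

The output uses a plain comma as separator, so the rows read "Anton,Apple,234,NewYork" rather than the space-padded form in the sample.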
