Split CSV file into multiple files by column-data
Question
I want to split a "source.csv" file based on its contents. Of course it's not just simple splitting: I also need to fulfill some "rules".
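- The source and destination files have a fixed header that is not in proper CSV format.
- The new files are named after specific data in one of the columns (the "Fruit" column in my example).
- It has to be OS-independent.
- The source CSV file can contain about 500,000+ rows and at least 30+ columns (scientific data), so I'm not sure whether it is better to hold the whole data in RAM or to read the file line by line and sort each line directly into one of roughly 500+ files based on the value of the "Fruit" column.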
Something like this would be great:
Author: Somebody
Date: Christmas
Project-Title: 42
Name, Fruit, Blubb, Drobblwubb
Anton, Apple, 234, NewYork
Bettina, Banana, 234, Chicago
Carolin, Apple, 123, Berlin
Destination file 1: out/destination_apple.csv
Author: Somebody
Date: Christmas
Project-Title: 42
Name, Fruit, Blubb, Drobblwubb
Anton, Apple, 234, NewYork
Carolin, Apple, 123, Berlin
Destination file 2: out/destination_banana.csv
Author: Somebody
Date: Christmas
Project-Title: 42
Name, Fruit, Blubb, Drobblwubb
Bettina, Banana, 234, Chicago
I'm experimenting quite a lot, but haven't got any really "pythonic" or even working code yet :/.
Recommended answer
Do you have the Python pandas module? It is a great module for data processing and will help you a lot here. Something like this can get you started:
import pandas
# header=3: the column names are on the fourth line of the file
# (index 3, zero-based), right after the three metadata lines
# skipinitialspace=True because your example data has spaces after the commas
csv = pandas.read_csv('test.csv', sep=',', header=3, skipinitialspace=True)
# Select the rows for each fruit
csv_apples = csv[csv['Fruit'] == 'Apple']
csv_bananas = csv[csv['Fruit'] == 'Banana']
# Write each selection to its own file, without the pandas row index
csv_apples.to_csv('apples.csv', index=False, sep=',')
csv_bananas.to_csv('bananas.csv', index=False, sep=',')
This example does not write the first three lines of your original CSV to the resulting files. The header argument of to_csv only controls the column names, so it cannot carry those lines. Instead, read the first three lines separately, write them to each output file first, and then pass the open file handle to to_csv so that the table is written below them.
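The idea above can be generalized so that every distinct value in the "Fruit" column gets its own file, with the three metadata lines repeated at the top of each one. The following is only a minimal sketch under some assumptions: the function name split_csv_by_column and its parameters (out_dir, column, preamble_lines) are made up for illustration, the whole table is assumed to fit in memory, and skiprows is used instead of header=3 so that the table is located by raw line count.

```python
import os
from itertools import islice

import pandas


def split_csv_by_column(source, out_dir, column="Fruit", preamble_lines=3):
    """Write one CSV per distinct value of `column`, repeating the preamble.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Read the free-form metadata lines above the table verbatim.
    with open(source, newline="") as f:
        preamble = list(islice(f, preamble_lines))

    # Skip the metadata lines; the next line then holds the column names.
    table = pandas.read_csv(source, skiprows=preamble_lines,
                            skipinitialspace=True)

    os.makedirs(out_dir, exist_ok=True)
    written = []
    # groupby yields one (value, sub-table) pair per distinct column value.
    for value, group in table.groupby(column):
        path = os.path.join(out_dir, f"destination_{str(value).lower()}.csv")
        with open(path, "w", newline="") as out:
            out.writelines(preamble)         # metadata lines first
            group.to_csv(out, index=False)   # then column names and rows
        written.append(path)
    return written


if __name__ == "__main__" and os.path.exists("source.csv"):
    for path in split_csv_by_column("source.csv", "out"):
        print(path)
```

Because skipinitialspace=True strips the blanks after the commas while reading, the output files are written without them (for example "Anton,Apple,234,NewYork"). os.path.join and os.makedirs keep the paths OS-independent. If the data turns out to be too large for memory, pandas.read_csv also accepts a chunksize argument, but the sketch above does not use it.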