How to break a large CSV data file into individual data files?


Problem description

I have a CSV file, the first row of which contains the variable names and the rest of the rows contain the data. What's a good way to break it up into files, each containing just one variable, in Python? Is this solution going to be robust? E.g. what if the input file is 100G in size? I am trying to perform a divide-and-conquer strategy but am new to Python. Thanks in advance for your help!

The input file looks like

var1,var2,var3
1,2,hello
2,5,yay
...

I want to create 3 files (or however many variables there are), var1.csv, var2.csv, var3.csv, so that the files resemble:

File1

var1
1
2
...

File2

var2
2
5
...

File3

var3
hello
yay

Solution

As long as the number of columns isn't absurdly huge (larger than the number of files you can have open at once on your platform), the number of rows, and thus the total size, is no big deal (as long, of course, as you have ample free space on disk ;-) since you'll be processing just one row at a time. I suggest the following code:

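import csv

def splitit(inputfilename):
  # Python 3: open CSV files in text mode with newline='' (on Python 2 use 'rb'/'wb')
  with open(inputfilename, newline='') as inf:
    inrd = csv.reader(inf)
    names = next(inrd)  # header row: one output file per column name
    outfiles = [open(n + '.csv', 'w', newline='') for n in names]
    try:
      ouwr = [csv.writer(o) for o in outfiles]
      # each output file starts with its own variable name
      for w, n in zip(ouwr, names):
        w.writerow([n])
      # stream the input one row at a time, so memory use stays small
      for row in inrd:
        for w, r in zip(ouwr, row):
          w.writerow([r])  # write through w, the writer for this column
    finally:
      for o in outfiles:
        o.close()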
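
If the file ever has more columns than your platform lets you keep open at once, one possible workaround is to handle the columns in batches and re-read the input once per batch. The following is only a rough sketch of that idea, not part of the answer above: the function name `split_columns` and the parameters `batch_size` and `out_dir` are made up for illustration, and it assumes Python 3.

```python
import csv
import os
from contextlib import ExitStack


def split_columns(input_path, batch_size=100, out_dir='.'):
    """Write each column of input_path to '<column name>.csv' in out_dir.

    batch_size and out_dir are hypothetical parameters: batch_size caps how
    many output files are open at once, at the cost of re-reading the input
    once per batch of columns.
    """
    # Read only the header row to learn the column names.
    with open(input_path, newline='') as inf:
        names = next(csv.reader(inf))

    for start in range(0, len(names), batch_size):
        stop = start + batch_size
        # ExitStack closes every file opened in this batch, even on error.
        with ExitStack() as stack:
            inf = stack.enter_context(open(input_path, newline=''))
            reader = csv.reader(inf)
            next(reader)  # skip the header row
            writers = []
            for name in names[start:stop]:
                path = os.path.join(out_dir, name + '.csv')
                out = stack.enter_context(open(path, 'w', newline=''))
                writer = csv.writer(out)
                writer.writerow([name])  # each file starts with its variable name
                writers.append(writer)
            # Stream one row at a time; only this batch's fields are written.
            for row in reader:
                for writer, value in zip(writers, row[start:stop]):
                    writer.writerow([value])
```

Calling `split_columns('data.csv')` (where `data.csv` stands in for your own file) would produce one file per column in the current directory. With a 100G input, every extra batch means another full pass over the file, so it is probably worth making `batch_size` as large as your open-file limit comfortably allows.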
