Split a CSV file into smaller files but keep the header?


Question


I have a huge CSV file with about 1 million lines. Is there a way to split it into smaller files while keeping the first line (the CSV header) in every file?


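split seems very fast but also very limited: it does not copy the header into each piece, and you cannot add a suffix such as .csv to the output file names (GNU split offers --additional-suffix for this, but other implementations may not).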

split -l11000 products.csv file_


Is there an efficient way to do this in plain bash? A one-line command would be great.

Answer


Yes, this is possible with AWK.


The idea is to remember the header and write all remaining lines to files named filename.00001.csv, filename.00002.csv, and so on, starting each file with the header:

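awk -v l=11000 '
    (NR==1) {header=$0; next}                   # remember the header line
    (NR%l==2) {                                 # first data line of each chunk (needs l > 2)
        close(file)                             # finish the previous chunk
        file=sprintf("%s.%0.5d.csv",FILENAME,++c)
        sub(/csv[.]/,"",file)                   # file.csv.00001.csv -> file.00001.csv
        print header > file                     # start the new chunk with the header
    }
    {print > file}                              # copy the current line into the chunk
' file.csv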


This works in the following way:

    • (NR==1){header=$0;next}: if the record is the first line, save it as header and skip to the next line.
    • (NR%l==2){...}: every time we have written l=11000 data lines, we need to start a new file. That happens whenever the line number modulo l equals 2, i.e. on lines 2, 2+l, 2+2l, 2+3l, and so on. (Because NR%l is always smaller than l, this test only works for l > 2.) On such a line we:
      • close(file): close the file we just finished writing to.
      • file=sprintf("%s.%0.5d.csv",FILENAME,++c); sub(/csv[.]/,"",file): build the new file name. sprintf produces a zero-padded name such as file.csv.00001.csv, and sub removes the first "csv." so the result is file.00001.csv.
      • print header > file: open the new file and write the header to it.
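The same awk program can be wrapped in a small bash function so that the input file and chunk size become parameters. This is a minimal sketch: the name split_csv and its argument order are hypothetical, and it assumes LINES is a positive integer. It tests (NR-2)%l==0 instead of NR%l==2, which selects the same lines (2, 2+l, 2+2l, ...) but also works for chunk sizes of 1 or 2.

```shell
# split_csv FILE LINES  (hypothetical helper)
# Split FILE into chunks of LINES data lines each, repeating the header
# (first line) at the top of every chunk. products.csv becomes
# products.00001.csv, products.00002.csv, ... next to the input file.
split_csv() {
    local input=$1 lines=$2
    awk -v l="$lines" '
        NR == 1 { header = $0; next }          # remember the header
        (NR - 2) % l == 0 {                    # first data line of a new chunk
            if (file != "") close(file)        # finish the previous chunk
            file = sprintf("%s.%05d.csv", FILENAME, ++c)
            sub(/csv[.]/, "", file)            # products.csv.00001.csv -> products.00001.csv
            print header > file                # every chunk starts with the header
        }
        { print > file }                       # copy the data line into the current chunk
    ' "$input"
}
```

For example, split_csv products.csv 11000 writes products.00001.csv, products.00002.csv, and so on, each holding the header plus up to 11000 data lines.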


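      Note: if you don't care about the file names, you can use the following shorter version. Its output files keep the full input name plus a counter and have no .csv extension (e.g. file.csv.00001), and it has the same l > 2 restriction: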

      awk -v m=100 '
          (NR==1){h=$0;next}
          (NR%m==2) { close(f); f=sprintf("%s.%0.5d",FILENAME,++c); print h > f }
          {print > f}' file.csv
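If you would rather stay with split, as the question originally tried, one option is to strip the header first, let split cut only the data lines, and then prepend the header to each piece while adding the .csv suffix. This is a minimal sketch, not part of the original answer: the name split_with_header and its arguments are hypothetical, and it assumes no other files in the current directory start with PREFIX.

```shell
# split_with_header FILE LINES PREFIX  (hypothetical helper)
# Uses head, tail and split instead of a single awk program.
split_with_header() {
    local input=$1 lines=$2 prefix=$3 header piece
    header=$(head -n 1 "$input")                      # keep the header line
    tail -n +2 "$input" | split -l "$lines" - "$prefix"  # split only the data lines
    for piece in "$prefix"*; do
        [ -e "$piece" ] || continue                   # glob matched nothing
        # Write header + piece into PIECE.csv, then drop the suffix-less piece
        { printf '%s\n' "$header"; cat "$piece"; } > "$piece.csv" && rm "$piece"
    done
}
```

For example, split_with_header products.csv 11000 file_ produces file_aa.csv, file_ab.csv, and so on. It reads the data through several processes, so the single-pass awk version above is usually the simpler choice.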
      

