Split file with 800,000 columns

Views: 52 Published: 2020/9/15 6:21:54 Tags: bash unix awk cut

Question

I want to split a 118 GB file of genomic data with 800,000 columns and 40,000 rows into a series of files with 100 columns each.

I am currently running the following bash script as 15 parallel instances:

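infile="$1"
start=$2
end=$3
step=$(($4-1))   # a chunk spans columns curr..curr+step, i.e. $4 columns

# Each cut call scans the entire input file to extract a single chunk.
for ((curr=start; curr+step <= end; curr+=step+1)); do
  cut -d' ' -f"$curr-$((curr+step))" "$infile" > "${infile}.$curr"
done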

However, judging by the script's current progress, the split will take 300 days to complete?!

Is there a more efficient way to split a space-delimited file column-wise into smaller chunks?
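
For context, here is a rough estimate of the work the loop in the question performs. It is only a sketch: it assumes every `cut` call has to scan the full 118 GB input (cut cannot seek to a column) and that the columns divide evenly into chunks. The variable names are made up for illustration.

```shell
# Back-of-the-envelope cost of the cut-based loop, using the figures from the question.
total_cols=800000
cols_per_chunk=100
file_gb=118

chunks=$(( total_cols / cols_per_chunk ))   # number of cut invocations needed
gb_read=$(( chunks * file_gb ))             # each invocation reads the whole file

echo "cut invocations: $chunks"
echo "data read in total: ${gb_read} GB"
```

Running the calls in parallel spreads that I/O over several processes but does not reduce the total amount of data read.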

Recommended answer

Try this awk script:

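awk -v cols=100 '{
     f = 1                       # index of the output file for the current column
     for (i = 1; i <= NF; i++) {
       # Separate fields with OFS; end the line after every cols-th field and after the last one.
       printf "%s%s", $i, (i % cols && i < NF ? OFS : ORS) > (FILENAME "." f)
       f = int(i / cols) + 1     # file index for the next column
     }
  }' largefile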

I expect it to be faster than the shell script in the question, since it makes a single pass over the input instead of rereading the whole file once per 100-column chunk.
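
To see what the awk script above produces, it can be tried on a tiny stand-in file first. The sketch below wraps the same awk program in a function, splits a 2-row, 5-column sample into chunks of 2 columns, and then checks the result by pasting the chunks back together. The function name `split_columns`, the temporary directory, and the sample data are assumptions made for illustration and are not part of the original answer.

```shell
# split_columns FILE [COLS]: hypothetical wrapper around the awk script from the answer.
split_columns() {
  awk -v cols="${2:-100}" '{
    f = 1
    for (i = 1; i <= NF; i++) {
      printf "%s%s", $i, (i % cols && i < NF ? OFS : ORS) > (FILENAME "." f)
      f = int(i / cols) + 1
    }
  }' "$1"
}

# Tiny stand-in for the real data: 2 rows, 5 space-separated columns.
workdir=$(mktemp -d)
printf '%s\n' 'a1 a2 a3 a4 a5' 'b1 b2 b3 b4 b5' > "$workdir/sample"

# Split into chunks of 2 columns: sample.1, sample.2 and a 1-column sample.3.
split_columns "$workdir/sample" 2

# Sanity check: joining the chunks side by side should reproduce the input.
paste -d' ' "$workdir/sample.1" "$workdir/sample.2" "$workdir/sample.3" \
  | cmp - "$workdir/sample" && echo "chunks match the original"
```

On the real data the script would keep 8,000 output files open at once. Whether that works depends on the awk implementation and the open-file limit (`ulimit -n`), so it is worth checking on a subset of rows before committing to the full run.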
