Need to transpose a LARGE CSV file in Perl
Problem description
The CSV data file is 3.2 GB in total, with god knows how many rows and columns (assume very large). The file holds genomics data: SNP data for a population of individuals. Thus the CSV file contains IDs such as TD102230 and genetic data such as A/A and A/T.
I tried the Text::CSV and Array::Transpose modules but couldn't get it right (the computing cluster froze). Is there a specific module that would do this? I am new to Perl (not much experience in low-level programming; I mostly used R and MATLAB before), so detailed explanations are especially welcome!
Recommended answer
As a direct answer: read the file line by line, parse each line with Text::CSV, and push each value onto an array that corresponds to its original column. Then output each array with join or similar to get the transposed representation of the original. Disposing of each array right after its join will help with the memory problem too.
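The steps above can be sketched roughly as follows. This is a minimal illustration, not a tuned solution: the sub name `transpose_csv` and the file names are hypothetical, the input is assumed to be rectangular (every row has the same number of fields), and the fields are assumed to contain no embedded commas or quotes (true for IDs like TD102230 and genotypes like A/A), so a plain join is enough for output. Note that the whole table still has to fit in memory until writing starts.

```perl
use strict;
use warnings;
use Text::CSV;

# Transpose the CSV at $in_path into $out_path, keeping one array per
# original column. (Hypothetical helper, written for illustration.)
sub transpose_csv {
    my ($in_path, $out_path) = @_;

    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    open my $in, '<', $in_path or die "Cannot read $in_path: $!";

    my @columns;    # $columns[$j] collects every value seen in column $j
    while (my $row = $csv->getline($in)) {
        push @{ $columns[$_] }, $row->[$_] for 0 .. $#$row;
    }
    close $in;

    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    while (@columns) {
        # shift drops our reference to the column, so its memory can be
        # reclaimed as soon as the line has been written.
        my $col = shift @columns;
        print {$out} join(',', @{ $col // [] }), "\n";
    }
    close $out or die "Cannot finish writing $out_path: $!";
    return;
}

# Run as: perl transpose.pl input.csv output.csv
transpose_csv(@ARGV) if @ARGV == 2;
```

Reading with `getline` keeps only one parsed row in flight at a time; the per-column arrays are what dominate memory use here.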
Writing values to external files instead of arrays, and joining them with OS facilities, is another way around the memory requirements.
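One way this could look is sketched below, under several assumptions: the sub name `transpose_csv_on_disk` is hypothetical, the input is rectangular, fields contain no embedded commas or quotes, and the number of columns stays below the operating system's limit on open files (one handle is kept open per column, which may not hold for very wide SNP tables). Each column is appended to its own temporary file, and the files are then concatenated in order, which is the same job `cat` would do from the shell.

```perl
use strict;
use warnings;
use Text::CSV;
use File::Temp qw(tempdir);

# Transpose via one temporary file per original column, so only a single
# parsed row is held in memory. (Hypothetical helper, for illustration.)
sub transpose_csv_on_disk {
    my ($in_path, $out_path) = @_;

    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    my $dir = tempdir(CLEANUP => 1);    # removed when the program exits
    open my $in, '<', $in_path or die "Cannot read $in_path: $!";

    my @handles;      # one open write handle per column (assumes few enough columns)
    my $first = 1;    # true while processing the first input row
    while (my $row = $csv->getline($in)) {
        for my $j (0 .. $#$row) {
            if (!$handles[$j]) {
                open my $fh, '>', "$dir/col$j"
                    or die "Cannot create column file $j: $!";
                $handles[$j] = $fh;
            }
            # Separate values with commas, but not before the first one.
            print { $handles[$j] } ($first ? '' : ','), $row->[$j];
        }
        $first = 0;
    }
    close $in;

    # Concatenate the column files in order; each becomes one output line.
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    for my $j (0 .. $#handles) {
        close $handles[$j] or die "Cannot finish column file $j: $!";
        open my $col, '<', "$dir/col$j" or die "Cannot reopen column file $j: $!";
        print {$out} $_ while <$col>;
        print {$out} "\n";
        close $col;
    }
    close $out or die "Cannot finish writing $out_path: $!";
    return;
}

# Run as: perl transpose_disk.pl input.csv output.csv
transpose_csv_on_disk(@ARGV) if @ARGV == 2;
```

If the column count exceeds the open-file limit, the same idea can be applied to batches of columns over several passes of the input, trading extra reads for fewer simultaneous handles.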
You should also think about why you need this. Is there really no better way to solve the real task at hand, since transposing by itself serves no real purpose?