Need to transpose a LARGE csv file in perl


Problem description

The csv data file is 3.2 GB in total, with god knows how many rows and columns (assume very large). The file holds genomics data: SNP data for a population of individuals. So the csv file contains IDs such as TD102230 and genetic data such as A/A and A/T.

I have tried the Text::CSV and Array::Transpose modules but couldn't seem to get it right (as in, the computing cluster froze). Is there a specific module that would do this? I am new to Perl (not much experience in low-level programming, mostly used R and MATLAB before), so detailed explanations are especially welcome!

Solution

As a direct answer: read the file line by line, parse each line with Text::CSV, and push the values onto arrays, one array per original column. Then output each array with join or the like to get the transposed representation of the original. Disposing of each array right after its join will help with the memory problem too.
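A minimal sketch of the column-array approach described above, assuming the file is rectangular (every row has the same number of fields) and that the whole table still fits in memory. `transpose_csv` and the script name in the usage comment are hypothetical names picked for illustration. The writer uses Text::CSV's `print` rather than a bare join so that any field needing quotes stays valid CSV.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: read CSV rows from $in, write the transpose to $out.
# Assumes every row has the same number of fields.
sub transpose_csv {
    my ($in, $out) = @_;
    my $reader = Text::CSV->new({ binary => 1, auto_diag => 1 });
    my $writer = Text::CSV->new({ binary => 1, auto_diag => 1, eol => "\n" });

    my @columns;    # $columns[$j] collects every value of input column $j
    while (my $row = $reader->getline($in)) {
        push @{ $columns[$_] }, $row->[$_] for 0 .. $#$row;
    }

    # Each original column becomes one output row. shift drops the array
    # from @columns, so its memory can be reused once the row is written.
    while (@columns) {
        my $col = shift @columns;
        $writer->print($out, $col);
    }
    return;
}

# Usage (hypothetical script name): perl transpose.pl input.csv output.csv
if (@ARGV == 2) {
    open my $in,  '<', $ARGV[0] or die "$ARGV[0]: $!";
    open my $out, '>', $ARGV[1] or die "$ARGV[1]: $!";
    transpose_csv($in, $out);
    close $out or die "$ARGV[1]: $!";
}
```

Keep in mind that Perl stores every value with some per-scalar overhead, so for short fields like A/A the in-memory footprint will likely be several times the 3.2 GB on disk. That may well be why the cluster froze.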

Writing the values to external files instead of arrays, and joining them with OS facilities, is another way around the memory requirements.
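One way to read the external-files suggestion, sketched under assumptions: each input row is written to its own temporary file, one field per line, and the `paste` utility then glues those files together side by side, which yields the transpose without holding the table in memory. `transpose_via_files` is a hypothetical name. The sketch assumes a Unix-like system with `paste` on the PATH and simple fields with no quoted commas or embedded newlines (parse with Text::CSV instead if that does not hold).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: transpose $in_path into $out_path via temp files.
# Assumes plain comma-separated fields without quoting.
sub transpose_via_files {
    my ($in_path, $out_path) = @_;
    my $dir = tempdir(CLEANUP => 1);    # removed automatically at exit

    open my $in, '<', $in_path or die "$in_path: $!";
    my @parts;
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;
        # One file per input row, holding one field per line.
        my $part = sprintf '%s/row%09d.txt', $dir, scalar @parts;
        open my $fh, '>', $part or die "$part: $!";
        print {$fh} "$_\n" for split /,/, $line, -1;
        close $fh or die "$part: $!";
        push @parts, $part;
    }
    close $in;

    open my $out, '>', $out_path or die "$out_path: $!";
    if (@parts) {
        # paste reads the row files in parallel, so each file becomes
        # one column of the output.
        open my $paste, '-|', 'paste', '-d,', @parts or die "paste: $!";
        print {$out} $_ while <$paste>;
        close $paste or die "paste failed (exit status $?)";
    }
    close $out or die "$out_path: $!";
    return;
}

# Usage (hypothetical script name): perl transpose_files.pl in.csv out.csv
transpose_via_files(@ARGV) if @ARGV == 2;
```

With a very large number of rows the files would probably have to be pasted in batches, since a single `paste` call is subject to limits on open files and on argument-list length.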

You should also think about why you need this. Is there really no better way to solve the real task at hand? Transposing by itself serves no real purpose.
