Perl: Programming efficiency when computing correlation coefficients for a large set of data


Question

Edit: The link should work now, sorry for the trouble.

I have a text file that looks like this:


Name, Test 1, Test 2, Test 3, Test 4, Test 5
Bob, 86, 83, 86, 80, 23
Alice, 38, 90, 100, 53, 32
Jill, 49, 53, 63, 43, 23

I am writing a program that, given this text file, generates a table of Pearson correlation coefficients like the one below, where entry (x, y) is the correlation between person x and person y:


Name,Bob,Alice,Jill
Bob, 1, 0.567088412588577, 0.899798494392584
Alice, 0.567088412588577, 1, 0.812425393004088
Jill, 0.899798494392584, 0.812425393004088, 1
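As a reference point, the table above can be reproduced from the sample data with a direct implementation of the Pearson formula. This is a minimal sketch in Python rather than the asker's Perl program, which is not shown here; the names `SAMPLE`, `read_scores`, and `pearson` are made up for illustration, and the sample is inlined as a string instead of being read from a file.

```python
import csv
import io
import math

# Sample data from the question, inlined for illustration.
SAMPLE = """Name, Test 1, Test 2, Test 3, Test 4, Test 5
Bob, 86, 83, 86, 80, 23
Alice, 38, 90, 100, 53, 32
Jill, 49, 53, 63, 43, 23
"""


def read_scores(text):
    """Parse the comma-separated text into (names, rows of floats)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header line
    names, rows = [], []
    for record in reader:
        if not record:
            continue  # ignore blank lines
        names.append(record[0].strip())
        rows.append([float(value) for value in record[1:]])
    return names, rows


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


names, rows = read_scores(SAMPLE)

# Full N x N table held in memory: fine for 3 people, not for 54000.
table = [[pearson(a, b) for b in rows] for a in rows]

print("Name," + ",".join(names))
for name, row in zip(names, table):
    print(name + ", " + ", ".join(f"{value:.6f}" for value in row))
```

Holding the whole `table` in memory is harmless for three people, but it is the part that stops scaling once there are tens of thousands of rows.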

My program works, but the data set I am feeding it has 82 columns and, more importantly, 54000 rows. When I run it on that data, it is incredibly slow and I get an out-of-memory error. Is there a way to, first of all, rule out the out-of-memory error, and ideally make the program run more efficiently? The code is here: code.

Thanks for your help,
Jack

Edit: In case anyone else is trying to do large-scale computation, convert your data into HDF5 format. That is what I ended up doing to solve this issue.

Recommended answer

You are going to have to do at least 54000^2 * 82 calculations and comparisons, so of course it is going to take a lot of time. Are you holding everything in memory? That is going to be pretty large too. It will be slower, but you might use less memory if you keep the users in a database and calculate one user against all the others, then move on to the next user and do the same, instead of building one massive array or hash.
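To put rough numbers on this: a full 54000 x 54000 table has 2,916,000,000 entries, which is about 23 GB even as raw 8-byte doubles, while the input itself (54000 x 82 values) is only about 35 MB in the same representation. The sketch below illustrates the "one user against all the others" idea under some assumptions: it is written in Python rather than Perl, it keeps the input in memory instead of a database, and the names `standardize`, `correlation_rows`, and `write_table` are hypothetical. Each row is centered and scaled to unit length once, so every correlation becomes a plain dot product, and each output row is written as soon as it is computed instead of being stored.

```python
import math
import sys


def standardize(row):
    """Center a row on its mean and scale it to unit length."""
    mean = sum(row) / len(row)
    centered = [value - mean for value in row]
    norm = math.sqrt(sum(value * value for value in centered))
    if norm == 0.0:
        # Assumption of this sketch: a constant row has no defined
        # correlation, so it is mapped to all zeros.
        return [0.0] * len(row)
    return [value / norm for value in centered]


def correlation_rows(rows):
    """Yield (index, correlations) for one person at a time."""
    unit_rows = [standardize(row) for row in rows]  # done once per row
    for i, a in enumerate(unit_rows):
        # Dot product of unit-length centered rows is the Pearson r.
        yield i, [sum(x * y for x, y in zip(a, b)) for b in unit_rows]


def write_table(names, rows, out):
    """Stream the table to `out`; only one output row is alive at a time."""
    out.write("Name," + ",".join(names) + "\n")
    for i, correlations in correlation_rows(rows):
        cells = ", ".join(repr(value) for value in correlations)
        out.write(names[i] + ", " + cells + "\n")


# Sample data from the question.
names = ["Bob", "Alice", "Jill"]
rows = [
    [86, 83, 86, 80, 23],
    [38, 90, 100, 53, 32],
    [49, 53, 63, 43, 23],
]
write_table(names, rows, sys.stdout)
```

This removes the memory problem but not the 54000^2 * 82 multiply-adds. Because the table is symmetric, computing only the entries with column index at or after the row index would roughly halve that work, at the cost of a triangular output format.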
