使用MySQL和PHP有效处理大量数据 [英] Processing a large amount of data efficiently with MySQL and PHP

查看:75
本文介绍了使用MySQL和PHP有效处理大量数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要找到一种方法来有效地处理PHP/MySQL中的大量数据.情况如下:

I need to find a way to efficiently process a large amount of data in PHP/MySQL. Here's the situation:

我有一个包含一百万条记录的数据库表.根据PHP的用户输入,我需要根据非平凡的计算步骤对所有这100万条记录进行排名,以便选择得分最高的项目.我的问题是,如果我将数据重新排列为列并使用array_multisort,则从内存使用的角度来看,这伸缩性非常差,尤其是在排序步骤中.

I have a database table with, say, one million records. Based on user input from PHP, I need to rank all of those one million records according to a non-trivial calculation step, so that I can pick the top scoring items. My problem is that this scales very poorly from a memory usage standpoint, particularly at the sorting step, if I rearrange the data into columns and use array_multisort.

我能想到的替代方法是:

Alternative methods I can think of are:

  • 在PHP中进行计算,然后将带有分数的数据重新插入到临时表中,使用SELECT ... ORDER BY score ... LIMIT查询检索得分最高的项目
  • 在PHP中进行计算,并将带有分数的数据输出到CSV文件中,然后调用命令行排序实用程序,然后读取前X行数
  • 如选项1所述,使用存储过程在MySQL中进行计算并检索到前X个项目.我对此的担心是,数据库是否非常适合此涉及的数字运算
  • Doing the calculations in PHP and reinserting the data with the scores into a temporary table, retrieving the highest scoring items using a SELECT ... ORDER BY score ... LIMIT query
  • Doing the calculations in PHP and outputting the data with the scores into a CSV file, then calling the command-line sort utility, then reading in the top X number of lines
  • Doing the calculations in MySQL using a stored procedure and retrieving the top X number of items, as in option 1. My concern with this is whether the DB is well suited for the number crunching this would involve

对于搜索引擎来说,这已经是一个相当普遍的问题.可伸缩性是第一要务,但性能也必须相当不错.这些方法中最好的一种,还是我什至没有考虑的其他出色选择?

This has got to be a fairly common problem for things like search engines. Scalability is the number one priority, but performance has to be pretty good too. Is one of these approaches best, or is there some other great option that I'm not even considering?

推荐答案

假定您的数据集太大而无法存储在内存中.如果只需要前n个项,则只能将前结果保留在内存中,如下所示:您可以浏览一百万行.这也可以与您的临时表想法一起使用,编写每个批次的顶部记录.

Assuming your dataset is too large to store in memory.... If you only need the top n items, you can keep only the top results in memory as you page through the 1 million rows. This would also work with the temporary table idea of yours, writing the top records from each batch.

另一种选择是编写用户定义的函数:

Another option would be to write a user defined function:

http://dev.mysql.com/doc/refman/5.1/en/adding-functions.html

这篇关于使用MySQL和PHP有效处理大量数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆