使用WHERE子句或使用应用程序代码过滤结果集是否更好? [英] Is it better to filter a resultset using a WHERE clause or using application code?

查看:62
本文介绍了使用WHERE子句或使用应用程序代码过滤结果集是否更好?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

好的,这是问题的简单抽象:

OK, here is a simple abstraction of the problem:

2个变量(male_users和female_users)存储两组用户,即男性和女性

2 variables(male_users and female_users) to store 2 groups of user i.e. male and female

  1. 一种方法是使用两个查询来选择它们:

select * from users where gender = 'male',然后将结果存储在male_users中

select * from users where gender = 'male' and then store the result in male_users

select * from users where gender = 'female',然后将结果存储在female_users

select * from users where gender = 'female' and then store the result in female_users

  1. 另一种方法是只运行一个查询:

'select * from users',然后遍历结果集以过滤程序中的男性用户 php代码段是这样的:

'select * from users' and then loop over the result set to filter the male users in the program php code snippet would be sth like this:

$result = mysql_query('select * from users');

while (($row=mysql_fetch_assoc(result)) != null) {
  if ($row['gender'] == 'male'){// add to male_users}
  else if ($row['gender'] == 'female'){// add to female_users}
}

哪个更有效并且被认为是更好的方法?

which one is more efficient and considered as a better approach?

这只是问题的简单说明.实际的项目中可能会有更大的表格供查询,还有更多的过滤器选项.

this is just a simple illustration of the problem. the real project may have lager tables to query and more filter options.

提前谢谢!

推荐答案

任何应用程序的经验法则是让DB做好它所做的出色工作:过滤,排序和联接.

The rule of thumb for any application is to let the DB do the things it does well: filtering, sorting, and joining.

将查询分为各自的函数或类方法:

Separate the queries into their own functions or class methods:

$men = $foo->fetchMaleUsers();
$women = $foo->fetchFemaleUsers();

更新

我以史蒂文(Steven)的PostgreSQL演示为例,它执行的全表扫描查询的性能是两个单独的索引查询的两倍,并使用MySQL(在实际问题中使用了它)进行了模仿:

架构

Update

I took Steven's PostgreSQL demonstration of a full table scan query performing twice as good as two separate indexed queries and mimicked it using MySQL (which is used in the actual question):

CREATE TABLE `gender_test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `gender` enum('male','female') NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=26017396 DEFAULT CHARSET=utf8

我将性别类型更改为不是VARCHAR(20),因为对于本专栏文章而言,它更现实,我还提供了您希望在表上使用的主键,而不是任意的DOUBLE值.

I changed the gender type to not be a VARCHAR(20) as it is more realistic for the purpose of this column, I also provide a primary key as you would expect on a table instead of an arbitrary DOUBLE value.

mysql> select sql_no_cache * from gender_test WHERE gender = 'male';

12995993 rows in set (31.72 sec)

mysql> select sql_no_cache * from gender_test WHERE gender = 'female';

13004007 rows in set (31.52 sec)

mysql> select sql_no_cache * from gender_test;

26000000 rows in set (32.95 sec)

我相信这不需要解释.

ALTER TABLE gender_test ADD INDEX (gender);

...

mysql> select sql_no_cache * from gender_test WHERE gender = 'male';

12995993 rows in set (15.97 sec)

mysql> select sql_no_cache * from gender_test WHERE gender = 'female';

13004007 rows in set (15.65 sec)

mysql> select sql_no_cache * from gender_test;

26000000 rows in set (27.80 sec)

此处显示的结果与Steven的数据根本不同.索引查询执行几乎的速度是全表扫描速度的两倍.这来自使用常识列定义的正确索引的表.我一点都不了解PostgreSQL,但是在Steven的示例中必须有一些严重的配置错误,以免显示相似的结果.

The results shown here are radically different from Steven's data. The indexed queries perform almost twice as fast as the full table scan. This is from a properly indexed table using common sense column definitions. I don't know PostgreSQL at all, but there must be some significant misconfiguration in Steven's example to not show similar results.

鉴于PostgreSQL在做事上优于MySQL的声誉,或者至少与我一样好,我敢说如果正确使用PostgreSql也会表现出类似的性能.

Given PostgreSQL's reputation for doing things better than MySQL, or at least as good as, I daresay that PostgreSql would demonstrate similar performance if properly used.

还请注意,在同一台计算机上,过于简化的for循环执行5200万次比较需要花费 7.3秒.

Also note, on this same machine an overly simplified for loop doing 52 million comparisons takes an additional 7.3 seconds to execute.

<?php
$N = 52000000;
for($i = 0; $i < $N; $i++) {
    if (true == true) {
    }
}

我认为,鉴于这些数据,哪种方法更好是显而易见的.

I think it's rather obvious what is the better approach given this data.

这篇关于使用WHERE子句或使用应用程序代码过滤结果集是否更好?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆