Category with lots of pages (huge offsets) (how does Stack Overflow work?)


Question

I think my question can be answered just by knowing how, for example, Stack Overflow works.

For example, this page loads in a few ms (< 300 ms): http://stackoverflow.com/questions?page=61440&sort=newest

The only query I can think of for that page is something like SELECT * FROM stuff ORDER BY date DESC LIMIT {pageNumber}*{stuffPerPage}, {stuffPerPage}

A query like that might take several seconds to run, but the Stack Overflow page loads in almost no time. It can't be a cached query, since questions are posted over time, and rebuilding the cache every time a question is posted is simply madness.
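The offset arithmetic behind the paged query above can be sketched as follows. MySQL's two-argument form is LIMIT offset, row_count, so the second value is the page size rather than an end position. The function names and the page size of 15 are hypothetical, chosen only for illustration; the table and column names come from the query above.

```python
from typing import Tuple


def page_to_limit(page_number: int, per_page: int) -> Tuple[int, int]:
    """Map a zero-based page number to MySQL's (offset, row_count) pair."""
    if page_number < 0 or per_page <= 0:
        raise ValueError("page_number must be >= 0 and per_page must be > 0")
    return page_number * per_page, per_page


def paged_query(page_number: int, per_page: int) -> str:
    """Build the paged SELECT; both values are validated integers, so
    formatting them into the statement is safe here."""
    offset, row_count = page_to_limit(page_number, per_page)
    return f"SELECT * FROM stuff ORDER BY date DESC LIMIT {offset}, {row_count}"


# Page 61440 with a hypothetical page size of 15 rows.
print(paged_query(61440, 15))
```

The offset grows linearly with the page number, which is why deep pages are the expensive ones.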

So, how does this work, in your opinion?

(To make the question easier, let's forget about the ORDER BY.)

Example (the table is fully cached in RAM and stored on an SSD):

mysql> select * from thread limit 1000000, 1;
1 row in set (1.61 sec)

mysql> select * from thread limit 10000000, 1;
1 row in set (16.75 sec)

mysql> describe select * from thread limit 1000000, 1;
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
|  1 | SIMPLE      | thread | ALL  | NULL          | NULL | NULL    | NULL | 64801163 |       |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+

mysql> select * from thread ORDER BY thread_date DESC limit 1000000, 1;
1 row in set (1 min 37.56 sec)


mysql> SHOW INDEXES FROM thread;
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| thread |          0 | PRIMARY  |            1 | newsgroup_id | A         |      102924 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          0 | PRIMARY  |            2 | thread_id    | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          0 | PRIMARY  |            3 | postcount    | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          0 | PRIMARY  |            4 | thread_date  | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          1 | date     |            1 | thread_date  | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)

Solution

Create a BTREE index on the date column and the query will run in a breeze.

CREATE INDEX date ON stuff(date) USING BTREE
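As a rough illustration of what the index changes, here is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL (an assumption made only so the example is self-contained; the two planners differ in detail). The table layout, row count, index name stuff_date, and page size are all hypothetical. It prints the query plan for the paged query before and after the index is created.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, date TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO stuff (date, title) VALUES (?, ?)",
    (
        (f"2012-01-{1 + n % 28:02d} {n % 24:02d}:00:00", f"post {n}")
        for n in range(10000)
    ),
)

# The paged query from the question, with a hypothetical page size of 15.
paged = "SELECT * FROM stuff ORDER BY date DESC LIMIT 15 OFFSET 3000"

plan_without_index = query_plan(conn, paged)
conn.execute("CREATE INDEX stuff_date ON stuff(date)")
plan_with_index = query_plan(conn, paged)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

In this sketch the index lets the engine read rows already in date order instead of sorting the whole table first. The rows before the offset are still stepped over, so a very large offset remains noticeably more expensive than a small one.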


UPDATE: Here is a test I just did:

CREATE TABLE test( d DATE, i INT, INDEX(d) );

I filled the table with 2,000,000 rows, each with distinct values of i and d.

mysql> SELECT * FROM test LIMIT 1000000, 1;
+------------+---------+
| d          | i       |
+------------+---------+
| 1897-07-22 | 1000000 |
+------------+---------+
1 row in set (0.66 sec)

mysql> SELECT * FROM test ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d          | i      |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (1.68 sec)

And here is an interesting observation:

mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 1000, 1;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | test  | index | NULL          | d    | 4       | NULL | 1001 |       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+

mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 10000, 1;
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
|  1 | SIMPLE      | test  | ALL  | NULL          | NULL | NULL    | NULL | 2000343 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+

MySQL does use the index for OFFSET 1000, but not for OFFSET 10000.

Even more interesting, if I use FORCE INDEX, the query takes more time:

mysql> SELECT * FROM test FORCE INDEX(d) ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d          | i      |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (2.21 sec)
