排名与数百万条目 [英] Ranking with millions of entries

查看:117
本文介绍了排名与数百万条目的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在一个服务器上工作的在线游戏应该能够处理数百万玩家。现在游戏需要排行榜,并希望能够显示玩家当前位置,以及可能的当前玩家位置附近的其他玩家以及玩家朋友的位置。



现在我在MySQL中做了这件事,我知道它在技术上是可能的,但我认为,因为这是一个常见的做法,对于很多在线游戏必须有现有的库或数据库,特别是为此目的?



任何人都可以建议我什么数据库是最适合这些类型的查询和可能已经做了很多这项工作的任何预先存在的库?具有API访问权限的第三方服务也很好。



希望得到一些好的建议,谢谢!



编辑:



要澄清,我需要一个数据库,可以容纳数百万条目(到目前为止MySQL是好的),我可以很容易地得到排名结果。例如,如果我从leaderboard表中得到一个特定的行,我需要知道该行的排名。此查询必须在500毫秒以下,而不管数据库的大小。



或者,更新表与当前排名信息的方式将很好,因为此更新查询不锁定整个表,并且更新查询在30秒内运行。



有关使用什么数据库/机制或第三方服务的任何想法将非常感谢!

解决方案

单个磁盘寻道大约15ms,小于500ms的响应时间限制了大约30个随机磁盘访问。这不是很多。



在我的小型笔记本电脑上,我有一个开发数据库

  root @ localhost [kris]>选择@@ innodb_buffer_pool_size / 1024/1024作为pool_mb; 
+ -------------- +
| pool_mb |
+ -------------- +
| 128.00000000 |
+ -------------- +
集合中的1行(0.00秒)

和一个缓慢的笔记本电脑。我用

创建了一个分数表。

  root @ localhost [kris] show create table score\G 
*************************** 1. row ********* ******************
表:score
创建表:CREATE TABLE`score`(
`player_id` int(10)unsigned NOT NULL AUTO_INCREMENT,
`score` int(11)NOT NULL,
PRIMARY KEY(`player_id`),
KEY`score($ score $)
)ENGINE = InnoDB AUTO_INCREMENT = 2490316 DEFAULT CHARSET = latin1
集合中的1行(0.00秒)

随机整数分数和顺序player_id值。我们有

  root @ localhost [kris]> select count(*)/ 1000/1000 as mrows from score\G 
*************************** 1. row ***************************
mrows:2.09715200
集合中的1行(0.39秒)

数据库维护(score,player_id) code> score 在索引 score 中的顺序,因为InnoDB索引中的数据存储在BTREE中,指针)是主键值,因此定义 KEY(score)最终为 KEY(score,player_id)内部。我们可以通过查看分数检索的查询计划来证明:

  root @ localhost [kris]解释select * from score其中score = 17 \G 
*************************** 1.行**** ***********************
id:1
select_type:SIMPLE
表:score
类型: ref
possible_keys:score
key:score
key_len:4
ref:const
rows:29
Extra:使用索引
1行in set(0.00 sec)

如您所见, 正在使用使用索引,表示不需要数据访问。



对于给定常数 player_id 的排名查询在我的笔记本电脑上正好为500ms:

  root @ localhost [kris]>从分数中选择p。*,count(*)作为rank 
作为p连接分数, s.score
其中p.player_id = 479269\G
*************************** 1.行***************************
player_id:479269
得分:99901
排名:2074
1行(0.50秒)

随着更多的内存和更快的盒子,更快,但是它仍然是一个比较昂贵的操作,因为计划吸引:

  root @ localhost [kris]解释选择p。*,count(*)作为从分数作为p连接分数的排名,作为p。 s.score其中p.player_id = 479269; 
+ ---- + ------------- + ------- + ------- + ---------- ----- + --------- + --------- + ------- + --------- + ------ -------------------- +
| id | select_type |表|类型| possible_keys |键| key_len | ref |行|额外|
+ ---- + ------------- + ------- + ------- + ---------- ----- + --------- + --------- + ------- + --------- + ------ -------------------- +
| 1 | SIMPLE | p | const | PRIMARY,score | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | s |索引|得分|得分| 4 | NULL | 2097979 |使用where;使用索引|
+ ---- + ------------- + ------- + ------- + ---------- ----- + --------- + --------- + ------- + --------- + ------ -------------------- +
集合中的2行(0.00秒)

正如你所看到的,计划中的第二个表是一个索引扫描,因此查询会随着玩家数量线性减慢。



如果你想要一个完整的排行榜,你需要离开where子句,然后你得到两个扫描和二次执行时间。



这里是程序的时间:

  root @ localhost [kris]> set @count = 0; 
select *,@count:= @count + 1作为从分数的排序,其中score> = 99901按分数desc排序;
...
| 2353218 | 99901 | 2075 |
| 2279992 | 99901 | 2076 |
| 2264334 | 99901 | 2077 |
| 2239927 | 99901 | 2078 |
| 2158161 | 99901 | 2079 |
| 2076159 | 99901 | 2080 |
| 2027538 | 99901 | 2081 |
| 1908971 | 99901 | 2082 |
| 1887127 | 99901 | 2083 |
| 1848119 | 99901 | 2084 |
| 1692727 | 99901 | 2085 |
| 1658223 | 99901 | 2086 |
| 1581427 | 99901 | 2087 |
| 1469315 | 99901 | 2088 |
| 1466122 | 99901 | 2089 |
| 1387171 | 99901 | 2090 |
| 1286378 | 99901 | 2091 |
| 666050 | 99901 | 2092
| 633419 | 99901 | 2093 |
| 479269 | 99901 | 2094 |
| 329168 | 99901 | 2095 |
| 299189 | 99901 | 2096 |
| 290436 | 99901 | 2097 |
...

因为这是一个程序计划, >


  • 您不能使用LIMIT,因为这将抵消计数器。您必须下载所有这些数据。

  • 你不能真正排序。这个 ORDER BY 子句工作,因为它不排序,而是使用索引。 c />使用filesort 时,计数器值将会一直关闭。



这是最接近于什么是NoSQL(读:过程)数据库将作为一个执行计划的解决方案。



我们可以稳定NoSQL里面子查询,然后分离出我们感兴趣的部分,虽然:

  root @ localhost [kris] set @count = 0; 
select * from(
select *,@count:= @count + 1 as rank
from score
where score> = 99901
order by score desc
)as t
where player_id = 479269;
查询OK,0行受影响(0.00秒)
+ ----------- + ------- + ------ +
| player_id |得分| rank |
+ ----------- + ------- + ------ +
| 479269 | 99901 | 2094 |
+ ----------- + ------- + ------ +
集合中的1行(0.00秒)

root @ localhost [kris]> set @count = 0;
select * from(
select *,@count:= @count + 1 as rank
from score
where score> = 99901
order by score desc
)as t
其中排名在2090和2100之间;
查询OK,0行受影响(0.00秒)
+ ----------- + ------- + ------ +
| player_id |得分| rank |
+ ----------- + ------- + ------ +
| 1387171 | 99901 | 2090 |
| 1286378 | 99901 | 2091 |
| 666050 | 99901 | 2092 |
| 633419 | 99901 | 2093 |
| 479269 | 99901 | 2094 |
| 329168 | 99901 | 2095 |
| 299189 | 99901 | 2096 |
| 290436 | 99901 | 2097 |
+ ----------- + ------- + ------ +
集合中的8行(0.01秒)

子查询将实现前一个结果集为一个名为t的专用表,然后我们可以在外部查询中访问。因为它是一个ad-hoc表,在MySQL中它将没有索引。这限制了在外部查询中可能有效的可能性。



请注意这两个查询如何满足您的时序约束。这是计划:

  root @ localhost [kris] set @count = 0;解释select * from(select *,@count:= @count + 1 as score from score where score> = 99901 order by score desc)as t where rank 2090和2100\G 
查询OK,0受影响的行(0.00秒)

*************************** 1.行****** *********************
id:1
select_type:PRIMARY
table:< derived2>
type:ALL
possible_keys:NULL
key:NULL
key_len:NULL
ref:NULL
rows:2097
Extra:
*************************** 2.行**************** ***********
id:2
select_type:DERIVED
table:score
type:range
possible_keys:score
key:score
key_len:4
ref:NULL
rows:3750
Extra:Using where;使用索引
集合中的2行(0.00秒)

DERIVED 查询和外部 BETWEEN 约束)对于排名不佳的玩家会变慢,然后严重违反您的时间约束。

  root @ localhost [kris] set @count = 0; select * from(select *,@count:= @count + 1 as score from rank from score where score> = 0 order by score desc)as t; 
...
2097152集合中的行(3.56秒)

描述性方法的时间是稳定的(仅取决于表大小):

  root @ localhost [kris]从分数中选择p。*,count(*)作为rank 
作为p连接分数, s.score
其中p.player_id = 1134026;
+ ------- + ------- + --------- +
| player_id |得分| rank |
+ ----------- + ------- + --------- +
| 1134026 | 0 | 2097135 |
+ ----------- + ------- + --------- +
集合中的一行(0.53秒)

您的电话。


I'm working on a server for an online game which should be able to handle millions of players. Now the game needs leaderboards and wants to be able to show a players current position and possibly other players near the current players position as well as the positions of the players friends.

Now I've done this stuff before in MySQL and I know how it's technically possible, however I figured since this is a common practice for a lot of online games there must be existing libraries or databases particularly for this purpose?

Can anyone advice me what database is the best for these types of queries and possibly any pre-existing libraries that already do a lot of this work? A third party service with API access would be fine too.

Hope to get some good advice, thanks!

Edit:

To clarify, I need a database which can hold millions of entries (so far MySQL is good for that) with which I can easily get ranked results. For example if I get a specific row from the "leaderboard" table I need to know which rank that row has. This query has to be under 500ms regardless of the size of the db.

Alternatively a way to update the table with the current ranking information would be fine too long as this update query does not lock the whole table and the update query runs in under 30 seconds.

Any ideas as to what database / mechanism or third party service to use would be much appreciated!

解决方案

A single disk seek is about 15ms, maybe a little less with server grade disks. A response time of less than 500ms limits you to about 30 random disk accesses. That is not a lot.

On my tiny laptop, I have a development database with

root@localhost [kris]> select @@innodb_buffer_pool_size/1024/1024 as pool_mb;
+--------------+
| pool_mb      |
+--------------+
| 128.00000000 |
+--------------+
1 row in set (0.00 sec)

and a slow laptop disk. I created a score table with

root@localhost [kris]> show create table score\G
*************************** 1. row ***************************
       Table: score
Create Table: CREATE TABLE `score` (
  `player_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `score` int(11) NOT NULL,
  PRIMARY KEY (`player_id`),
  KEY `score` (`score`)
) ENGINE=InnoDB AUTO_INCREMENT=2490316 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

with random integer scores and sequential player_id values. We have

root@localhost [kris]> select count(*)/1000/1000 as mrows from score\G
*************************** 1. row ***************************
mrows: 2.09715200
1 row in set (0.39 sec)

The database maintains the pair (score, player_id) in score order in the index score, as data in an InnoDB index is stored in a BTREE, and the row pointer (data pointer) is the primary key value, so that the definition KEY (score) ends up being KEY(score, player_id) internally. We can prove that by looking at the query plan for a score retrieval:

root@localhost [kris]> explain select * from score where score = 17\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: score
         type: ref
possible_keys: score
          key: score
      key_len: 4
          ref: const
         rows: 29
        Extra: Using index
1 row in set (0.00 sec)

As you can see, the key: score is being used with Using index, meaning that no data access is necessary.

The ranking query for a given constant player_id takes precisely 500ms on my laptop:

root@localhost [kris]>  select p.*, count(*) as rank 
    from score as p join score as s on p.score < s.score 
   where p.player_id = 479269\G
*************************** 1. row ***************************
player_id: 479269
    score: 99901
     rank: 2074
1 row in set (0.50 sec)

With more memory and on a faster box it can be quicker, but it is still a comparatively expensive operation, because the plan sucks:

root@localhost [kris]> explain select p.*, count(*) as rank from score as p join score as s on p.score < s.score where p.player_id = 479269;
+----+-------------+-------+-------+---------------+---------+---------+-------+---------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows    | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+-------+---------+--------------------------+
|  1 | SIMPLE      | p     | const | PRIMARY,score | PRIMARY | 4       | const |       1 |                          |
|  1 | SIMPLE      | s     | index | score         | score   | 4       | NULL  | 2097979 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+-------+---------+--------------------------+
2 rows in set (0.00 sec)

As you can see, the second table in the plan is an index scan, so the query slows down linearly with the number of players.

If you want a full leaderboard, you need to leave off the where clause, and then you get two scans and quadratic execution times. So this plan implodes completely.

Time to go procedural here:

root@localhost [kris]> set @count = 0; 
    select *, @count := @count + 1 as rank from score where score >= 99901 order by score desc ;
...
|   2353218 | 99901 | 2075 |
|   2279992 | 99901 | 2076 |
|   2264334 | 99901 | 2077 |
|   2239927 | 99901 | 2078 |
|   2158161 | 99901 | 2079 |
|   2076159 | 99901 | 2080 |
|   2027538 | 99901 | 2081 |
|   1908971 | 99901 | 2082 |
|   1887127 | 99901 | 2083 |
|   1848119 | 99901 | 2084 |
|   1692727 | 99901 | 2085 |
|   1658223 | 99901 | 2086 |
|   1581427 | 99901 | 2087 |
|   1469315 | 99901 | 2088 |
|   1466122 | 99901 | 2089 |
|   1387171 | 99901 | 2090 |
|   1286378 | 99901 | 2091 |
|    666050 | 99901 | 2092 |
|    633419 | 99901 | 2093 |
|    479269 | 99901 | 2094 |
|    329168 | 99901 | 2095 |
|    299189 | 99901 | 2096 |
|    290436 | 99901 | 2097 |
...

Because this is a procedural plan, it is unstable:

  • You cannot use LIMIT, because that will offset the counter. Instead you have to download all this data.
  • You cannot really sort. This ORDER BY clause works, because it does not sort, but uses an index. As soon as you see using filesort, the counter values will be wildly off.

It is the solution that comes closest to what a NoSQL (read: procedural) database will do as an execution plan, though.

We can stabilize the NoSQL inside a subquery and then slice out the part that is of interest to us, though:

root@localhost [kris]> set @count = 0; 
    select * from ( 
        select *, @count := @count + 1 as rank 
          from score 
         where score >= 99901 
      order by score desc 
    ) as t 
    where player_id = 479269;
Query OK, 0 rows affected (0.00 sec)
+-----------+-------+------+
| player_id | score | rank |
+-----------+-------+------+
|    479269 | 99901 | 2094 |
+-----------+-------+------+
1 row in set (0.00 sec)

root@localhost [kris]> set @count = 0; 
    select * from ( 
        select *, @count := @count + 1 as rank 
          from score 
         where score >= 99901 
      order by score desc 
    ) as t 
    where rank between 2090 and 2100;
Query OK, 0 rows affected (0.00 sec)
+-----------+-------+------+
| player_id | score | rank |
+-----------+-------+------+
|   1387171 | 99901 | 2090 |
|   1286378 | 99901 | 2091 |
|    666050 | 99901 | 2092 |
|    633419 | 99901 | 2093 |
|    479269 | 99901 | 2094 |
|    329168 | 99901 | 2095 |
|    299189 | 99901 | 2096 |
|    290436 | 99901 | 2097 |
+-----------+-------+------+
8 rows in set (0.01 sec)

The subquery will materialize the former result set as an ad-hoc table named t, which we then can access in the outer query. Because it is an ad-hoc table, in MySQL it will have no index. This limits what is possible efficiently in the outer query.

Note how both queries satisfy your timing constraint, though. Here is the plan:

root@localhost [kris]> set @count = 0; explain select * from ( select *, @count := @count + 1 as rank from score where score >= 99901 order by score desc ) as t where rank between 2090 and 2100\G
Query OK, 0 rows affected (0.00 sec)

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2097
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: score
         type: range
possible_keys: score
          key: score
      key_len: 4
          ref: NULL
         rows: 3750
        Extra: Using where; Using index
2 rows in set (0.00 sec)

Both query components (the inner, DERIVED query and the outer BETWEEN constraint) will get slower for badly ranked players, though, and then grossly violate your timing constraints.

root@localhost [kris]> set @count = 0; select * from ( select *, @count := @count + 1 as rank from score where score >= 0 order by score desc ) as t;
...
2097152 rows in set (3.56 sec)

The execution time for the descriptive approach is stable (dependent only on table size):

root@localhost [kris]> select p.*, count(*) as rank 
   from score as p join score as s on p.score < s.score 
   where p.player_id = 1134026;
+-----------+-------+---------+
| player_id | score | rank    |
+-----------+-------+---------+
|   1134026 |     0 | 2097135 |
+-----------+-------+---------+
1 row in set (0.53 sec)

Your call.

这篇关于排名与数百万条目的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆