Database design for answer table in quiz app


Problem description



I am designing a quiz app and I am stuck on how to design the answers table.

Assume I have the following tables:

User(user_id,...other columns)
Question(question_id,user_id,...other columns)
QuestionAnswers(question_id,answer_id,...other columns)

Now what to do about the UserAnswers table? The structure I have in mind is this:

UserAnswers(user_id,question_id,answer_id,...other columns)

The structure I have made works well at first, but performance starts degrading once the table reaches 10 million rows. In my app, if there are 10,000 questions and 1,000 users in the system, and each user answers every question, I will easily reach 10 million rows (10,000 × 1,000). As users and questions grow, the table size will grow dramatically.

What is a better approach to store these answers?

Moreover, I designed the system in MySQL. Do you think the same table structure would work better in some other DBMS?

mysql> explain select count(*) from user_answer where question_id = 9845;
+----+-------------+-------------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
| id | select_type | table       | partitions | type | possible_keys | key         | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | user_answer | NULL       | ref  | question_id   | question_id | 4       | const |  645 |   100.00 | Using index |
+----+-------------+-------------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)


mysql> explain select count(*) from user_answer;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)


mysql> select count(*) from user_answer;
+----------+
| count(*) |
+----------+
| 20042126 |
+----------+
1 row in set (11 min 30.33 sec)

Solution

A general indexing concept is what is left-most in a key: a composite index can only be used starting from its left-most column. Take, for example, the following key (whether or not it is the primary key is not the focus here):

key(a,b,c)

For queries such as

select region from myTable where c='Turkey'

the above key is not used, because its left-most column a is not in the query. You may end up with a full table scan.

For queries such as

select region from myTable where a=17 and c='Turkey'

the key is used only up to the left-most part that the query covers, which is a, because b is not in the query. So the key is useful, but not entirely: it quickly gets you to the segment of rows with a=17, but from there it has to apply c='Turkey' as a filter (a WHERE) to each row in that segment.
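To make the "segment" idea concrete, here is a minimal Python sketch that models key(a,b,c) as a sorted list of (a, b, c, row_id) tuples, which is roughly how a B-tree orders its entries. It is a simplified model, not how MySQL is implemented, and the sample rows and row ids are made up. Binary search on the left-most column finds the a=17 segment directly, but inside that segment entries are ordered by b rather than c, so every entry in it has to be checked for c='Turkey'. A lookup on c alone gets no help from the ordering at all.

```python
from bisect import bisect_left

# Simplified model of key(a, b, c): entries sorted by (a, b, c), each
# carrying a hypothetical row id. The sample rows are made up.
index = sorted([
    (17, 1, "Turkey", 101),
    (17, 2, "France", 102),
    (17, 3, "Turkey", 103),
    (18, 1, "Turkey", 104),
    (16, 5, "Turkey", 105),
])


def lookup_a_and_c(index, a, c):
    """WHERE a = ? AND c = ?: seek on a, then filter the segment on c."""
    # Binary search on the left-most column finds the contiguous a-segment.
    lo = bisect_left(index, (a,))
    hi = bisect_left(index, (a + 1,))  # assumes a is an integer
    segment = index[lo:hi]
    # b is unconstrained, so c is not ordered within the segment:
    # every entry in the segment has to be examined.
    matches = [row_id for (_, _, cc, row_id) in segment if cc == c]
    return matches, len(segment)


def lookup_c_only(index, c):
    """WHERE c = ?: the (a, b, c) ordering does not help, so check everything."""
    matches = [row_id for (_, _, cc, row_id) in index if cc == c]
    return matches, len(index)


print(lookup_a_and_c(index, 17, "Turkey"))  # ([101, 103], 3)
print(lookup_c_only(index, "Turkey"))       # ([105, 101, 103, 104], 5)
```

Each function returns the matching row ids together with the number of index entries it had to examine: 3 of 5 for the a-and-c lookup, all 5 for the c-only lookup.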

To put the a=17 and c='Turkey' case another way: the query does not fully use the index to get to c. Because b is not part of the query, the index cannot magically jump over b to reach c. But at least the index is partially used.
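The same behavior can be observed in a real query planner. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, because it runs without a server; the plan wording is SQLite's own, and idx_abc is a hypothetical index name. With key(a,b,c), the c-only query should come out as a plain table scan, the a-and-c query as an index search using only (a=?), and a query on all three columns as a search using (a=? AND b=? AND c=?). In MySQL, the key and key_len columns of EXPLAIN (as in the output in the question) carry comparable information about which index is used and how much of it.

```python
import sqlite3

# SQLite stands in for MySQL here; idx_abc is a hypothetical index name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (a INTEGER, b INTEGER, c TEXT, region TEXT)")
conn.execute("CREATE INDEX idx_abc ON myTable (a, b, c)")


def plan(sql):
    """Return SQLite's query-plan text for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


queries = {
    "c only": "SELECT region FROM myTable WHERE c = 'Turkey'",
    "a and c": "SELECT region FROM myTable WHERE a = 17 AND c = 'Turkey'",
    "a, b and c": "SELECT region FROM myTable WHERE a = 17 AND b = 2 AND c = 'Turkey'",
}

for label, sql in queries.items():
    print(f"{label:>10}: {plan(sql)}")
```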

Because of this left-most behavior, when a composite index is built on thin columns such as ints, I often create a second composite index "going the other way", as shown in this answer for junction tables:

unique key(studentId,courseId,term), -- no duplicates allowed for the combo (note student can re-take it next term)
key (courseId,studentId),

Ignore term for this discussion. The point is that those are thin ints (relatively low overhead). The second key still requires overhead, so it comes at a cost, but a cost I am willing to pay: queries going in the other direction, meaning queries that involve courseId without studentId, are now covered.
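A minimal sketch of that junction-table layout, again using Python's sqlite3 module as a stand-in for MySQL. The table name enrollment and the index names uq_student_course_term and idx_course_student are hypothetical. A lookup by studentId should be served by the unique key, a lookup by courseId alone by the flipped key, and the unique key still rejects an exact duplicate (studentId, courseId, term) while allowing a retake in another term.

```python
import sqlite3

# SQLite stands in for MySQL; the table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment ("
    "studentId INTEGER NOT NULL, "
    "courseId INTEGER NOT NULL, "
    "term INTEGER NOT NULL)"
)
# No duplicates allowed for the combo (a student can retake a course next term).
conn.execute(
    "CREATE UNIQUE INDEX uq_student_course_term "
    "ON enrollment (studentId, courseId, term)"
)
# The second composite index, "going the other way".
conn.execute("CREATE INDEX idx_course_student ON enrollment (courseId, studentId)")


def plan(sql):
    """Return SQLite's query-plan text for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


by_student = "SELECT courseId, term FROM enrollment WHERE studentId = 5"
by_course = "SELECT studentId FROM enrollment WHERE courseId = 7"
print(plan(by_student))
print(plan(by_course))

conn.execute("INSERT INTO enrollment VALUES (5, 7, 1)")
conn.execute("INSERT INTO enrollment VALUES (5, 7, 2)")  # retake in a later term
try:
    conn.execute("INSERT INTO enrollment VALUES (5, 7, 1)")  # exact duplicate
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("duplicate rejected:", rejected)  # duplicate rejected: True
```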

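Note that my composite flip above is not a stellar one. It has often been pointed out to me that, as shown, it causes unnecessary overhead. Any query on both studentId and courseId can already use the left-most part of the first key, so the second key only has to serve queries on courseId alone; it should just be on courseId (a non-composite). If, for whatever reason, I had wedged term into second place in the first key, a query on studentId and courseId could only use the studentId part of that key, and then the flip would be a valid example.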

A better example would be

key (a,b,c),
key (c,b)

Among other things, the above would be useful for queries filtering on c alone, and on b and c together. But not on b alone, because neither key has b as its left-most column.
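Using the same SQLite stand-in for MySQL, a minimal sketch of the better example adds a hypothetical idx_cb index on (c, b) next to a hypothetical idx_abc on (a, b, c). The c-only query and the b-and-c query should both be served by idx_cb, while the b-only query still falls back to a table scan, since neither index starts with b.

```python
import sqlite3

# SQLite stands in for MySQL; idx_abc and idx_cb are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (a INTEGER, b INTEGER, c TEXT, region TEXT)")
conn.execute("CREATE INDEX idx_abc ON myTable (a, b, c)")
conn.execute("CREATE INDEX idx_cb ON myTable (c, b)")


def plan(sql):
    """Return SQLite's query-plan text for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


queries = {
    "c only": "SELECT region FROM myTable WHERE c = 'Turkey'",
    "b and c": "SELECT region FROM myTable WHERE b = 2 AND c = 'Turkey'",
    "b only": "SELECT region FROM myTable WHERE b = 2",
}

for label, sql in queries.items():
    print(f"{label:>7}: {plan(sql)}")
```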

The Takeaway:

Resist the urge to splatter new indexes into your schema in the mistaken belief that they will be used. This applies especially to non-left-most columns that are not picked up by actual, frequent queries, and even more so to wider columns such as varchar(100) repeated in several orderings across multiple indexes. All such indexes may do is slow down inserts and updates while offering, in many cases, zero performance gain for actual queries. So scrutinize it all.

All index choices come at a cost. Only you can determine what is right for your system.
